Architect AI: Hackathon Submission Details

Inspiration

The inspiration came from a singular, painful reality: diagrams die the moment they are drawn.

In modern software development, architectural diagrams are critical for aligning teams, but they suffer from three major flaws:

  1. High Barrier to Entry: Tools like Draw.io or Lucidchart require manual drag-and-drop, making it tedious to iterate on complex ideas.
  2. Disconnect from Reality: Static diagrams fail to capture the dynamic flow of data, making it hard for non-technical stakeholders to understand how a system actually works.
  3. Synchronization Hell: Keeping diagrams updated as code evolves is a constant battle.

We asked: What if you could just talk to your architecture? What if an AI could not only draw the system but explain it to you like a movie director?

What it does

Architect AI is an intelligent system design engine that transforms natural language into professional, interactive cloud architecture diagrams. It’s not just a drawing tool; it’s an active collaborator.

  • Conversational Design: Users describe their system (e.g., "I need a scalable e-commerce backend with Redis caching and an event bus"), and Architect AI instantly scaffolds the entire diagram with correct icons and connections.
  • Auto-Pilot Mode: A hands-free mode where the AI proactively suggests and implements improvements to your architecture in real-time, simulating a pair-programming session with a senior architect.
  • Cinematic Replay Engine: The standout feature. It turns a static diagram into a video-like walkthrough. The AI generates a "Director's Script," and the engine plays it back, panning and zooming across the canvas while narrating the technical data flow or a simplified user story.
  • Smart Export: Users can export their designs as high-fidelity images or download the entire chat history as a technical specification document.

How we built it

We built Architect AI on the Google Gemini 3 Flash model (via the Google AI Studio SDK) and Firebase.

  1. Gemini 3 Flash: This is the brain of the operation. We chose Flash for its incredible speed and massive context window. It handles three critical tasks:

    • Intent Parsing: It breaks down complex user prompts into structured JSON instructions for our React Flow canvas (adding nodes, creating edges, updating labels).
    • System Logic: It understands cloud patterns (AWS/GCP/Azure) to ensure meaningful connections (e.g., placing a Cache before a Database).
    • Cinematic Scripting: We use a specialized prompt to force Gemini to act as a "Technical Director," analyzing the visual graph and outputting a timed script for our replay engine.
  2. Tool Calling & The Design Lifecycle: We didn't just ask Gemini for text; we gave it a set of precision architectural tools that let it manipulate the UI state with surgical accuracy. (Sketches of the tool declarations and the proposal lifecycle follow this list.)

    • Proposing vs. Committing: When Gemini calls propose_node, the system enters a "Pending State." The node isn't just dropped on the canvas; it's staged as an activeProposal. This allows the user to ask questions about the choice before it's finalized. Once confirmed via handleConfirm, the node's status flips to COMMITTED, and it's permanently added to the system graph.
    • Validation & Spatial Intelligence: The propose_node tool includes relative_to_id and position_hint parameters. This forces the AI to think spatially, anchoring new components to existing ones.
    • Connection Integrity: The propose_connection tool requires valid from_id and to_id parameters. This enforces a strictly logical build order—the AI cannot connect components that haven't been committed yet, mimicking a real-world engineering workflow.
  3. Real-time Layout Engine (Dagre): While React Flow handles the rendering, layoutService.ts uses the Dagre graph library to calculate optimal coordinates. It parses the AI's spatial hints and the system topology to generate non-overlapping, hierarchical layouts (typically left-to-right), ensuring that even 50-node microservice architectures stay readable. (A layout sketch follows this list.)

  4. The Cinematic Replay Engine (Gemini 3 Flash + Narrative Logic):

    • Trigger: When the user clicks "Cinematic View," the system captures a canvas snapshot via html2canvas and parses the current graph state.
    • Generation: videoService.ts sends the entire system JSON to Gemini 3 Flash. A specialized prompt forces the model to act as a "Technical Director," generating a dual-mode narrative script (see the script-generation sketch after this list).
    • Dual-Mode Scripts: Gemini produces a technicalScript (deep architectural insights) and a simpleScript (metaphor-driven domain stories) for every single node and edge in the history.
    • Playback: The UI then reconciles these scripts with the project's history, animating each component's entry while displaying the AI-generated narrative in a stylish CRT-themed HUD.
  5. Firebase Firestore: We implemented a real-time persistence layer. Every committed architectural change is synced to a unique session in Firestore, ensuring persistence across reloads and multi-device access. (A persistence sketch follows this list.)
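
To make the tool layer concrete, here is a minimal sketch of how those tools can be declared for Gemini function calling, assuming the @google/genai SDK. The tool and parameter names (propose_node, propose_connection, relative_to_id, position_hint, from_id, to_id) are the ones described above; the field schemas, model id, and surrounding code are illustrative.

```ts
import { GoogleGenAI, Type, FunctionDeclaration } from "@google/genai";

// Tool declarations matching the names in the write-up; schemas are illustrative.
const propose_node: FunctionDeclaration = {
  name: "propose_node",
  description: "Stage a new architecture component as a pending proposal.",
  parameters: {
    type: Type.OBJECT,
    properties: {
      label: { type: Type.STRING },          // e.g. "Redis Cache"
      relative_to_id: { type: Type.STRING }, // anchor to an existing node
      position_hint: { type: Type.STRING, description: "left | right | above | below" },
    },
    required: ["label", "relative_to_id", "position_hint"],
  },
};

const propose_connection: FunctionDeclaration = {
  name: "propose_connection",
  description: "Propose an edge between two already-committed components.",
  parameters: {
    type: Type.OBJECT,
    properties: {
      from_id: { type: Type.STRING },
      to_id: { type: Type.STRING },
    },
    required: ["from_id", "to_id"],
  },
};

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

// Intent parsing: the model answers with structured tool calls, and each call
// becomes an instruction for the React Flow canvas.
export async function parseIntent(userPrompt: string) {
  const response = await ai.models.generateContent({
    model: "gemini-flash-latest", // placeholder id; the app targets Gemini 3 Flash
    contents: userPrompt,
    config: { tools: [{ functionDeclarations: [propose_node, propose_connection] }] },
  });
  return response.functionCalls ?? [];
}
```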
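
The proposal lifecycle itself reduces to a small state machine. activeProposal, the COMMITTED status, and handleConfirm are names from the write-up; the surrounding types and state shape are a simplified sketch, not our exact store.

```ts
type NodeStatus = "PENDING" | "COMMITTED";

interface ArchNode {
  id: string;
  label: string;          // e.g. "Redis Cache"
  status: NodeStatus;
  relativeToId?: string;  // spatial anchor (relative_to_id)
  positionHint?: string;  // spatial hint (position_hint)
}

interface DesignState {
  nodes: ArchNode[];
  activeProposal: ArchNode | null; // staged node awaiting confirmation
}

// When Gemini calls propose_node, the node is staged, not committed.
function proposeNode(state: DesignState, node: Omit<ArchNode, "status">): DesignState {
  return { ...state, activeProposal: { ...node, status: "PENDING" } };
}

// When the user accepts the proposal (handleConfirm in the app), the status
// flips to COMMITTED and the node joins the permanent system graph.
function handleConfirm(state: DesignState): DesignState {
  if (!state.activeProposal) return state;
  const committed: ArchNode = { ...state.activeProposal, status: "COMMITTED" };
  return { nodes: [...state.nodes, committed], activeProposal: null };
}
```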
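
The layout pass from step 3, sketched with the dagre package against React Flow's node and edge shapes. The fixed node dimensions and spacing values are simplifying assumptions.

```ts
import dagre from "dagre";
import type { Node, Edge } from "reactflow";

const NODE_WIDTH = 180;  // assumed uniform dimensions, for layout only
const NODE_HEIGHT = 64;

// Compute non-overlapping, hierarchical (left-to-right) coordinates.
export function layoutGraph(nodes: Node[], edges: Edge[]): Node[] {
  const g = new dagre.graphlib.Graph();
  g.setGraph({ rankdir: "LR", nodesep: 40, ranksep: 80 });
  g.setDefaultEdgeLabel(() => ({}));

  for (const node of nodes) g.setNode(node.id, { width: NODE_WIDTH, height: NODE_HEIGHT });
  for (const edge of edges) g.setEdge(edge.source, edge.target);

  dagre.layout(g);

  // Dagre returns center coordinates; React Flow positions are top-left.
  return nodes.map((node) => {
    const { x, y } = g.node(node.id);
    return { ...node, position: { x: x - NODE_WIDTH / 2, y: y - NODE_HEIGHT / 2 } };
  });
}
```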
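
For the replay engine in step 4, the key is pinning the "Director's Script" to a strict response schema so it always parses. A minimal sketch, again assuming the @google/genai SDK: visual_target, technicalScript, and simpleScript are the fields described above, while durationMs and the prompt wording are illustrative.

```ts
import { GoogleGenAI, Type } from "@google/genai";

// One timed entry per node/edge, carrying both narrative modes.
const scriptSchema = {
  type: Type.ARRAY,
  items: {
    type: Type.OBJECT,
    properties: {
      visual_target: { type: Type.STRING },   // node or edge id to pan/zoom to
      durationMs: { type: Type.NUMBER },      // illustrative timing field
      technicalScript: { type: Type.STRING }, // deep architectural voiceover
      simpleScript: { type: Type.STRING },    // metaphor-driven story mode
    },
    required: ["visual_target", "technicalScript", "simpleScript"],
  },
};

export async function generateDirectorScript(systemGraphJson: string) {
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
  const response = await ai.models.generateContent({
    model: "gemini-flash-latest", // placeholder id; the app targets Gemini 3 Flash
    contents:
      "You are a Technical Director. For every node and edge in this system " +
      "graph, write one timed cinematic script entry.\n\n" + systemGraphJson,
    config: {
      responseMimeType: "application/json", // forces raw JSON output
      responseSchema: scriptSchema,         // forces the schema above
    },
  });
  return JSON.parse(response.text ?? "[]");
}
```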
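
The persistence layer from step 5 maps naturally onto Firebase's v9 modular API. A minimal sketch; the sessions/{sessionId} document path and payload shape are illustrative.

```ts
import { initializeApp } from "firebase/app";
import { doc, getFirestore, onSnapshot, setDoc } from "firebase/firestore";

// Config values come from the Firebase console (placeholders here).
const db = getFirestore(initializeApp({ apiKey: "...", projectId: "..." }));

// Sync every committed architectural change to a unique session document.
export async function syncSession(sessionId: string, nodes: unknown[], edges: unknown[]) {
  await setDoc(
    doc(db, "sessions", sessionId),
    { nodes, edges, updatedAt: Date.now() },
    { merge: true },
  );
}

// Restore (and live-update) the canvas on reload or from another device.
export function subscribeToSession(sessionId: string, onChange: (data: unknown) => void) {
  return onSnapshot(doc(db, "sessions", sessionId), (snap) => {
    if (snap.exists()) onChange(snap.data());
  });
}
```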

Challenges we ran into

  • Visual Hallucination: Early on, the AI would suggest connections that looked good textually but created visual chaos (overlapping nodes). We solved this by integrating a collision detection algorithm and the Dagre layout engine to "sanitize" the AI's placement logic.
  • Prompt Engineering for "Directors": Getting an LLM to write a script that matches the timing of a visual animation was tough. We had to refine the system prompts to enforce strict JSON schemas for the "Cinematic Script," separating the technical voiceover from the visual_target.
  • Rate Limiting: With "Auto-Pilot" mode firing rapid requests, we initially hit API limits. We implemented a smart queuing system with exponential backoff to handle the model's throughput limits gracefully (sketched below).
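
A minimal sketch of that queuing strategy; the retry count and delay curve are illustrative, not the exact values we shipped.

```ts
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a model call with exponential backoff plus jitter: 1s, 2s, 4s, ...
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      await sleep(2 ** attempt * 1000 + Math.random() * 250);
    }
  }
}

// Serialize Auto-Pilot requests so rapid-fire suggestions never burst the API.
let queue: Promise<unknown> = Promise.resolve();
export function enqueueModelCall<T>(fn: () => Promise<T>): Promise<T> {
  const next = queue.then(() => withBackoff(fn));
  queue = next.catch(() => {}); // keep the chain alive after a failure
  return next;
}
```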

Accomplishments that we're proud of

  • The "Cinematic" Feel: Watching the application "play" the diagram—highlighting a Database as it explains data persistence—feels magical. It turns a technical chore into a storytelling experience.
  • Zero-UI Friction: You can build a complex 50-node microservice architecture without touching a mouse. The conversational interface is robust enough to handle corrections ("No, move the cache to the edge") naturally.

What we learned

  • Latency is UX: For a tool that "draws" with you, speed is everything. Gemini 3 Flash was the only model fast enough to make the "Auto-Pilot" feel like a real-time conversation rather than a turn-based game.
  • Structured Output is Key: Reliable JSON schema enforcement from the model was the difference between a broken app and a production-ready tool.

What's next for Architect AI

  • Code Generation: The next logical step is to turn these diagrams into actual Infrastructure-as-Code (Terraform/Pulumi).
  • Visual Input: Using Gemini 3 Pro's vision capabilities to let users upload a hand-drawn whiteboard sketch and have Architect AI convert it into a digital diagram instantly.
  • Collaborative Multiplayer: Leveraging our Firestore backend to allow multiple users to edit the canvas simultaneously with the AI.

Built With

Dagre · Firebase Firestore · Google Gemini 3 Flash (Google AI Studio SDK) · html2canvas · React Flow · TypeScript