Inspiration
Designing cloud infrastructure is genuinely intimidating, and getting feedback on it is even harder. Junior developers are constantly stuck between two bad options: wait days for a senior engineer to carve out time for a review, hoping not to get humiliated over a rookie mistake, or paste their work into a generic AI tool that demands a clunky screenshot-upload routine and still manages to hallucinate half its advice. We wanted to tear that whole process down and rebuild it as something faster, smarter, and honestly a lot more fun — an interactive feedback loop that feels less like a linter and more like having a brutally honest mentor looking over your shoulder in real time.
What it does
ArchBot is an embedded, voice-enabled AI copilot that lives right inside your browser and plays the role of a hardened, deeply sarcastic senior cloud architect. It keeps a constant eye on your draw.io canvas, and the moment you hit save, it launches into a full semantic analysis of your infrastructure diagram. Rather than handing you a dry warning about a missing load balancer or an exposed database endpoint, it streams back a real-time, emotionally expressive audio roast — catching architectural mistakes the instant they're made, during the design phase, before they ever get close to production. Teams save hours of painful rework, and the learning actually sticks because, well, nobody forgets getting roasted.
How we built it
Under the hood, ArchBot runs on a tightly orchestrated multi-modal pipeline. The frontend is built with React and Vite and embeds a live draw.io iframe. The moment a user saves their diagram, the frontend takes a screenshot and ships it off to a FastAPI backend. From there, the Google Cloud Vision API sweeps through the image extracting labels and on-screen text, which get merged with draw.io's raw XML data to give the system a genuinely rich understanding of the visual logic: not just the structure, but everything rendered on screen. That enriched metadata is then fed into Snowflake Cortex, where a RAG pipeline cross-references the design against thousands of pages of real cloud architecture documentation in milliseconds. The grounded, authoritative engineering analysis that comes out of that is routed to an ElevenLabs Conversational AI agent through a webhook, and the sarcastic feedback streams back to the user over WebSockets almost before they've lifted their finger off the save key.
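To make that flow concrete, here is a heavily condensed sketch of the backend half of the pipeline. The endpoint names, payload shapes, Cortex URL, model name, and response parsing are illustrative assumptions rather than our exact code, and the RAG retrieval step is collapsed into a single completion call for brevity.

```python
# Condensed sketch of the FastAPI orchestration layer. Endpoint names,
# the Cortex URL/model, and response shapes are assumptions, not the
# exact production code.
import httpx
from fastapi import FastAPI, File, Form, UploadFile
from google.cloud import vision

app = FastAPI()
vision_client = vision.ImageAnnotatorClient()

# Placeholder account URL for the Cortex LLM REST API.
CORTEX_URL = "https://<account>.snowflakecomputing.com/api/v2/cortex/inference:complete"

latest_analysis = {"text": "No diagram analyzed yet."}


@app.post("/analyze")
async def analyze(diagram_xml: str = Form(...), screenshot: UploadFile = File(...)):
    # 1. Pull labels and on-screen text out of the saved screenshot.
    image = vision.Image(content=await screenshot.read())
    labels = [lab.description for lab in vision_client.label_detection(image=image).label_annotations]
    ocr = vision_client.text_detection(image=image).text_annotations
    on_screen_text = ocr[0].description if ocr else ""

    # 2. Merge the visual signal with draw.io's raw XML and ask Cortex for
    #    a grounded review (documentation retrieval omitted for brevity).
    prompt = (
        "Review this cloud architecture diagram.\n"
        f"draw.io XML:\n{diagram_xml}\n"
        f"Vision labels: {labels}\nOn-screen text: {on_screen_text}"
    )
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(
            CORTEX_URL,
            headers={"Authorization": "Bearer <token>"},  # placeholder auth
            json={"model": "mistral-large", "messages": [{"role": "user", "content": prompt}]},
        )
    # Response shape assumed here; consult the Cortex REST docs.
    latest_analysis["text"] = resp.json()["choices"][0]["message"]["content"]
    return {"status": "ok"}


@app.post("/elevenlabs-webhook")
async def elevenlabs_webhook():
    # The ElevenLabs Conversational AI agent calls this tool endpoint to
    # fetch the grounded analysis it then delivers as the voice roast.
    return {"analysis": latest_analysis["text"]}
```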
Challenges we ran into
The hardest problem we faced was the fundamental gap between a visual medium and a text-based language model. draw.io's XML is useful, but it's incomplete — it misses unlabeled icons, freehand annotations, and anything that only exists visually on the canvas. We solved this by layering Google Cloud Vision on top of the XML parse, letting the two sources of information fill in each other's blind spots and creating a much more robust enrichment pipeline. The other major challenge was coordinating three distinct external APIs simultaneously — Cloud Vision, Snowflake Cortex, and ElevenLabs — while keeping latency low enough that the audio streams back in a way that feels instant and alive rather than sluggish and mechanical. That required careful backend orchestration and a lot of deliberate state management to get right.
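Here is a simplified sketch of that two-source merge, assuming uncompressed draw.io XML (real exports are often base64-deflated) and illustrative function names:

```python
# Sketch of the enrichment merge: structural labels from the draw.io XML
# plus visual terms from Cloud Vision, each covering the other's blind
# spots. Assumes the XML is uncompressed.
import xml.etree.ElementTree as ET


def extract_xml_labels(drawio_xml: str) -> set[str]:
    """Collect the 'value' attribute (the node label) of every mxCell."""
    root = ET.fromstring(drawio_xml)
    return {
        value.strip()
        for cell in root.iter("mxCell")
        if (value := cell.get("value")) and value.strip()
    }


def merge_sources(xml_labels: set[str], vision_labels: list[str], ocr_text: str) -> dict:
    """Union the structural (XML) and visual (Vision) signals."""
    visual_terms = set(vision_labels) | set(ocr_text.split())
    return {
        "components": sorted(xml_labels | visual_terms),
        # Terms Vision saw that the XML never declared: likely unlabeled
        # icons or freehand annotations a pure XML parse would miss.
        "visual_only": sorted(visual_terms - xml_labels),
    }
```

On the latency side, the main trick was refusing to serialize anything that didn't have to be: the Vision label and OCR requests, for example, can run concurrently (e.g., with asyncio.gather) while the XML parse happens locally.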
Accomplishments that we're proud of
We're genuinely proud of how far left we managed to shift the infrastructure review. Catching flaws at the diagramming stage, before a single line of deployment code is written, is a meaningful change to how teams can work. Technically, getting the Snowflake Cortex RAG pipeline to work cleanly with visual metadata from Cloud Vision was a real accomplishment — it means ArchBot isn't guessing, it's actually reasoning against established cloud standards. But honestly, what we're most proud of is the persona. We took something that's traditionally dry and stressful, and made it into an experience people genuinely laugh through. That combination of entertainment and education is harder to pull off than it sounds.
What we learned
Building ArchBot taught us that personality is a legitimate product feature, not a nice-to-have. Negative reinforcement, when it's delivered with the right tone, turns out to be a surprisingly effective teaching tool. On the technical side, we came away with a much deeper understanding of multi-modal AI orchestration — how to use Snowflake's REST APIs for zero-ops RAG, how to build image parsing pipelines that degrade gracefully when Vision hits its limits, and how to manage real-time WebSocket audio in a way that makes an AI feel less like a system and more like a personality.
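As a concrete (if simplified) illustration of that graceful degradation, here is a best-effort Vision wrapper in the spirit of our pipeline; the function name and timeout are ours for this sketch, not verbatim code:

```python
# If Cloud Vision errors out or times out, fall back to an XML-only
# review instead of failing the whole request. Illustrative sketch.
from google.api_core.exceptions import GoogleAPICallError
from google.cloud import vision


def visual_terms_or_empty(screenshot: bytes) -> list[str]:
    """Best-effort Vision labels; an empty list signals 'XML-only mode'."""
    try:
        client = vision.ImageAnnotatorClient()
        image = vision.Image(content=screenshot)
        response = client.label_detection(image=image, timeout=5.0)
        return [label.description for label in response.label_annotations]
    except GoogleAPICallError:
        # Degrade gracefully: the roast gets less visually aware, but the
        # user still hears feedback grounded in the diagram's XML.
        return []
```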
What's next for ArchBot
We want to take ArchBot well beyond draw.io, adding support for tools like Excalidraw and Lucidchart so more teams can use it without changing how they work. We're also planning an Auto-Fix mode where ArchBot doesn't just tell you what's wrong — it generates corrected XML and updates your canvas on the spot. And further down the road, we'd love to introduce swappable personas, so when you eventually tire of the sarcastic senior dev, you can have your architecture torn apart by a panicked security auditor or a penny-pinching cloud finance manager instead.
Built With
- AI/LLM: Snowflake Cortex
- Vision/OCR: Google Cloud Vision API
- Voice/Text-to-Speech: ElevenLabs
- Database/Data: Snowflake
- Diagram: draw.io, Mermaid
- JavaScript Frontend: React, Vite
- Backend: FastAPI, Uvicorn, httpx, WebSockets (ws)
- Languages: Python
