Whiteboard architecture sessions are where critical design decisions happen — yet they rarely get expert review in real time. Flaws in security, scalability, or reliability often surface weeks later when fixing them is 10x more expensive. We asked: what if an AI senior architect could join every whiteboard session, watching and listening, just like a real colleague?
What it does
Whiteboard Architect is a real-time AI architecture reviewer. Point your camera at a whiteboard, talk through your design, and Archie — an AI senior cloud architect — provides instant voice feedback.
- Sees your diagrams through the camera and understands them as you draw
- Listens to your explanations via microphone
- Responds with real-time voice feedback on security, scalability, reliability, cost, and operations
- Interrupts naturally — barge-in support lets you cut in mid-sentence, just like a real conversation
- Annotates your whiteboard with visual markers highlighting issues and components
- Generates diagrams — converts hand-drawn sketches into clean SVG diagrams
- Records findings as structured review notes with severity levels and recommendations
How we built it
We built a multi-model architecture using three Gemini models, each chosen for the task it serves best:
| Model | Role |
|---|---|
| Gemini 2.5 Flash Native Audio | Live bidirectional voice + vision conversation |
| Gemini 3.1 Flash Lite | Background whiteboard analysis & English→Japanese translation |
| Gemini 2.0 Flash | SVG diagram generation from sketches |
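The routing above can be sketched as a simple task-to-model map. This is an illustrative sketch only: the `MODEL_ROUTES` dict and `route_model` helper are hypothetical names, and the model ID strings are assumptions, not the project's actual identifiers.

```python
# Hypothetical sketch of the per-task model routing described in the table.
# Model ID strings are illustrative assumptions.
MODEL_ROUTES = {
    "live_conversation": "gemini-2.5-flash-native-audio",  # voice + vision
    "perception": "gemini-3.1-flash-lite",                 # whiteboard analysis
    "translation": "gemini-3.1-flash-lite",                # EN -> JA display text
    "diagram_svg": "gemini-2.0-flash",                     # sketch -> SVG
}

def route_model(task: str) -> str:
    """Return the model ID for a task, defaulting to the live model."""
    return MODEL_ROUTES.get(task, MODEL_ROUTES["live_conversation"])
```

Keeping the routing in one table makes it easy to swap a model for a single task without touching the rest of the pipeline.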
The backend (FastAPI + Google ADK) runs four parallel async tasks per WebSocket connection:
- Upstream — routes client audio/video to Gemini Live API via `LiveRequestQueue`
- Downstream — streams Gemini responses back to the client
- Recovery — monitors session health and detects false barge-ins
- Perception — runs periodic deep analysis of the whiteboard
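The four-tasks-per-connection pattern can be sketched with plain `asyncio`. Everything here is a toy stand-in: the queue names and task bodies are illustrative, and the real implementation routes through ADK's `LiveRequestQueue` and a live WebSocket rather than in-memory queues.

```python
import asyncio

# Toy sketch of four concurrent tasks per connection, as described above.
async def run_session(upstream_q: asyncio.Queue, downstream_q: asyncio.Queue):
    log = []

    async def upstream():
        item = await upstream_q.get()          # client audio/video frame
        log.append(f"upstream:{item}")

    async def downstream():
        await downstream_q.put("audio-chunk")  # model response to client
        log.append("downstream:sent")

    async def recovery():
        await asyncio.sleep(0)                 # session health check tick
        log.append("recovery:ok")

    async def perception():
        await asyncio.sleep(0)                 # periodic whiteboard analysis
        log.append("perception:done")

    # All four run concurrently for the lifetime of the connection.
    await asyncio.gather(upstream(), downstream(), recovery(), perception())
    return log

q_in: asyncio.Queue = asyncio.Queue()
q_in.put_nowait("mic-frame")
result = asyncio.run(run_session(q_in, asyncio.Queue()))
```

Running the tasks under one `gather` means a failure in any of them tears down the whole session cleanly instead of leaving orphaned loops.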
The frontend (Next.js 16) uses custom React hooks for AudioWorklet-based capture (PCM16 16kHz), gapless audio playback (PCM 24kHz), and canvas-based video capture (JPEG @ 1fps).
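The PCM16 framing used by the capture path is simple to illustrate: float samples in [-1, 1] become little-endian signed 16-bit integers. This sketch is in Python for readability only; in the app this conversion happens inside the browser's AudioWorklet, and the function name is hypothetical.

```python
import struct

# Illustrative helper: convert float audio samples in [-1, 1] to the
# little-endian signed 16-bit (PCM16) frames the capture path streams.
def float_to_pcm16(samples: list) -> bytes:
    clamped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clamped]
    return struct.pack(f"<{len(ints)}h", *ints)
```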
Infrastructure is fully automated with Terraform deploying to Google Cloud Run, Firestore, and Cloud Storage.
Challenges we ran into
- Live API tool-calling bug — The `-12-2025` model variant triggers WebSocket 1008 errors when tools are invoked. We built an automatic model probing system that tests candidates at startup and falls back to working variants.
- Barge-in responsiveness — Achieving natural interruption required a dual approach: server-side detection via the Gemini Live API plus client-side RMS-based VAD to immediately clear audio buffers.
- Language quality — The native audio model produces significantly better responses in English. Rather than forcing Japanese, we adopted an English-first approach with a dedicated translation service for bilingual display.
- Background analysis coordination — Running a separate perception model alongside the live conversation required careful state management to inject analysis context naturally without disrupting the flow.
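The client-side RMS-based VAD mentioned above reduces to a few lines. A minimal sketch, in Python for illustration (the app does this in the browser), with a made-up threshold; real tuning depends on mic gain and noise floor.

```python
import math

# Sketch of RMS-based voice activity detection: a frame counts as speech
# when its root-mean-square energy exceeds a tuned threshold.
# The 0.05 default is an illustrative assumption, not the app's value.
def is_speech(frame: list, threshold: float = 0.05) -> bool:
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold
```

On a positive detection the client would immediately clear its playback buffer, so the barge-in takes effect without waiting for the server's interruption signal.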
Accomplishments that we're proud of
- Truly real-time multimodal interaction — audio, video, and text stream bidirectionally with minimal latency
- Native barge-in powered by ADK + Live API — no custom interruption logic needed
- A perception layer that continuously understands whiteboard structure and auto-generates visual annotations
- Graceful degradation — the app works without any cloud services beyond a Gemini API key
What we learned
- `LiveRequestQueue` is the key abstraction for multiplexing audio/video to Gemini while receiving responses simultaneously
- ADK dramatically simplifies agent development — session management, tool execution, and Live API integration come built-in
- Multi-model architectures outperform single-model approaches when tasks have different latency/capability requirements
- English-first with translation produces better quality than forcing non-English in system prompts
What's next
- Multi-language support beyond English/Japanese
- Collaborative multi-user review sessions
- Architecture Decision Record (ADR) generation from review notes
- Historical snapshot comparison across sessions
Built With
- audioworklet-api
- fastapi
- gemini-2.0-flash
- gemini-3.1-flash-lite
- gemini-live-api
- google-adk
- google-cloud
- google-cloud-firestore
- google-cloud-run
- next.js
- python
- react
- tailwind-css
- terraform
- typescript
- websocket
