Whiteboard architecture sessions are where critical design decisions happen — yet they rarely get expert review in real time. Flaws in security, scalability, or reliability often surface weeks later when fixing them is 10x more expensive. We asked: what if an AI senior architect could join every whiteboard session, watching and listening, just like a real colleague?

What it does

Whiteboard Architect is a real-time AI architecture reviewer. Point your camera at a whiteboard, talk through your design, and Archie — an AI senior cloud architect — provides instant voice feedback.

  • Sees your diagrams through the camera and understands them as you draw
  • Listens to your explanations via microphone
  • Responds with real-time voice feedback on security, scalability, reliability, cost, and operations
  • Interrupts naturally — barge-in support lets you cut in mid-sentence, just like a real conversation
  • Annotates your whiteboard with visual markers highlighting issues and components
  • Generates diagrams — converts hand-drawn sketches into clean SVG diagrams
  • Records findings as structured review notes with severity levels and recommendations

How we built it

We built a multi-model architecture using three Gemini models, each chosen for its task:

Model                            Role
Gemini 2.5 Flash Native Audio    Live bidirectional voice + vision conversation
Gemini 3.1 Flash Lite            Background whiteboard analysis & English→Japanese translation
Gemini 2.0 Flash                 SVG diagram generation from sketches

The backend (FastAPI + Google ADK) runs four parallel async tasks per WebSocket connection:

  1. Upstream — routes client audio/video to Gemini Live API via LiveRequestQueue
  2. Downstream — streams Gemini responses back to the client
  3. Recovery — monitors session health and detects false barge-ins
  4. Perception — runs periodic deep analysis of the whiteboard
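
The four tasks above can be sketched as parallel coroutines sharing one connection. This is a minimal illustration of the pattern, not the actual ADK/Live API code — the queues, sleep intervals, and log entries are placeholders:

```python
import asyncio

async def run_session(inbound: asyncio.Queue, outbound: asyncio.Queue, log: list) -> None:
    """Run the four per-connection tasks until the client stream ends (None sentinel)."""
    stop = asyncio.Event()

    async def upstream():
        # Route client audio/video chunks toward the model.
        while (chunk := await inbound.get()) is not None:
            log.append(("up", chunk))
        stop.set()

    async def downstream():
        # Stream model responses back to the client.
        while not stop.is_set():
            await asyncio.sleep(0.01)
            await outbound.put("response")

    async def recovery():
        # Periodically check session health / flag false barge-ins.
        while not stop.is_set():
            await asyncio.sleep(0.05)
            log.append(("health", "ok"))

    async def perception():
        # Periodic deep analysis of the latest whiteboard frame.
        while not stop.is_set():
            await asyncio.sleep(0.05)
            log.append(("analysis", "frame"))

    await asyncio.gather(upstream(), downstream(), recovery(), perception())
```

Running all four under one `asyncio.gather` means a failure in any task tears down the whole session cleanly when the WebSocket closes.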

The frontend (Next.js 16) uses custom React hooks for AudioWorklet-based capture (PCM16 16kHz), gapless audio playback (PCM 24kHz), and canvas-based video capture (JPEG @ 1fps).
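
The PCM16 wire format the capture hook produces can be illustrated in Python (the real conversion runs in a browser AudioWorklet in TypeScript; this sketch only shows the encoding): float32 samples in [-1, 1] are clipped and packed as little-endian 16-bit integers.

```python
import struct

def float_to_pcm16(samples: list[float]) -> bytes:
    """Clip float samples to [-1, 1] and pack as little-endian int16 PCM."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Each sample becomes two bytes, so a 16 kHz mono stream costs 32 KB/s upstream.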

Infrastructure is fully automated with Terraform deploying to Google Cloud Run, Firestore, and Cloud Storage.

Challenges we ran into

  • Live API tool-calling bug — The -12-2025 model variant triggers WebSocket 1008 errors when tools are invoked. We built an automatic model probing system that tests candidates at startup and falls back to working variants.
  • Barge-in responsiveness — Achieving natural interruption required a dual approach: server-side detection via Gemini Live API plus client-side RMS-based VAD to immediately clear audio buffers.
  • Language quality — The native audio model produces significantly better responses in English. Rather than forcing Japanese, we adopted an English-first approach with a dedicated translation service for bilingual display.
  • Background analysis coordination — Running a separate perception model alongside the live conversation required careful state management to inject analysis context naturally without disrupting the flow.
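
The client-side half of the barge-in fix boils down to a frame-level RMS energy check. A minimal sketch in Python (the real check runs in the browser; the threshold here is illustrative, not the value the app uses):

```python
import math

def is_speech(frame: list[float], threshold: float = 0.02) -> bool:
    """Return True when the frame's RMS energy exceeds the barge-in threshold.

    `threshold` is an illustrative placeholder, not the app's tuned value.
    """
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold
```

When this fires, the client clears its playback buffer immediately instead of waiting for the server-side interruption signal, which is what makes the barge-in feel instant.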

Accomplishments that we're proud of

  • Truly real-time multimodal interaction — audio, video, and text stream bidirectionally with minimal latency
  • Native barge-in powered by ADK + Live API — no custom interruption logic needed
  • A perception layer that continuously understands whiteboard structure and auto-generates visual annotations
  • Graceful degradation — the app works without any cloud services beyond a Gemini API key
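
The graceful-degradation idea can be sketched as a storage selector that falls back to an in-memory store when no cloud project is configured. The names here (`make_note_store`, `MemoryNoteStore`, the env key check) are hypothetical illustrations, not the project's actual code:

```python
class MemoryNoteStore:
    """In-memory fallback used when no cloud project is configured."""
    def __init__(self):
        self.notes = []

    def save(self, note: dict) -> None:
        self.notes.append(note)

def make_note_store(env: dict):
    """Use Firestore when a cloud project is configured, else fall back.

    The Firestore branch is elided in this sketch; only the fallback runs here.
    """
    if env.get("GOOGLE_CLOUD_PROJECT"):
        raise NotImplementedError("Firestore-backed store (requires cloud credentials)")
    return MemoryNoteStore()
```

With this shape, a Gemini API key alone is enough to run the app; cloud services only add persistence.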

What we learned

  • LiveRequestQueue is the key abstraction for multiplexing audio/video to Gemini while receiving responses simultaneously
  • ADK dramatically simplifies agent development — session management, tool execution, and Live API integration come built-in
  • Multi-model architectures outperform single-model approaches when tasks have different latency/capability requirements
  • English-first with translation produces better quality than forcing non-English in system prompts
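
The multi-model split described above reduces to a simple task-to-model routing table. The mapping function is a sketch, and the model identifier strings are assumptions based on the roles in the table, not verified API model IDs:

```python
# Illustrative routing table; identifiers are assumed, not verified model IDs.
MODEL_FOR_TASK = {
    "live_conversation": "gemini-2.5-flash-native-audio",
    "whiteboard_analysis": "gemini-3.1-flash-lite",
    "translation": "gemini-3.1-flash-lite",
    "diagram_generation": "gemini-2.0-flash",
}

def model_for(task: str) -> str:
    """Resolve a task to its model, defaulting to the live conversation model."""
    return MODEL_FOR_TASK.get(task, MODEL_FOR_TASK["live_conversation"])
```

Keeping the routing explicit makes the latency/capability trade-off visible: the low-latency native-audio model only handles conversation, while slower tasks go to cheaper models.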

What's next

  • Multi-language support beyond English/Japanese
  • Collaborative multi-user review sessions
  • Architecture Decision Record (ADR) generation from review notes
  • Historical snapshot comparison across sessions

Built With

  • audioworklet-api
  • fastapi
  • gemini-2.0-flash
  • gemini-3.1-flash-lite
  • gemini-live-api
  • google-adk
  • google-cloud
  • google-cloud-firestore
  • google-cloud-run
  • next.js
  • python
  • react
  • tailwind-css
  • terraform
  • typescript
  • websocket