BoardyBoo


Inspiration

When I was a child, my mom sat with me every evening to help me with my homework. But as I grew older, she had to focus on my younger brothers, and private tutoring was simply too expensive. So, like many students, I had to figure things out alone. Nowadays, students fill that gap with AI. But here's the thing: studies show that 88% of students use AI for homework, yet score 17% worse on tests. They're getting answers from static, text-based chatbots. They're not actually learning; they're just copy-pasting.

What it does

BoardyBoo is a true Live Agent that breaks the "text box" paradigm. It behaves exactly like a real teacher standing at a whiteboard with you:

🗣️ Speaks Naturally: Bidirectional voice streaming via Gemini 2.5 Flash Native Audio allows students to interrupt the tutor mid-sentence to ask questions.
👁️ Sees Your Work: Using camera input, the tutor "sees" physical homework on your desk or tracks the Excalidraw canvas in real time.
🎨 Draws & Illustrates: The tutor draws animated diagrams, equations, mind maps, and math plots on the whiteboard while it speaks to you. It also generates custom educational illustrations on the fly using Gemini 3 Pro.
🔍 Grounded Responses: Before teaching a concept, the tutor uses Google Search to verify facts, which greatly reduces the risk of hallucinations.
🧠 Context-Aware Memory: BoardyBoo remembers past sessions. Using Firestore, the Tutor Agent retrieves the student's historical mastery levels at the start of the session, making the "Live" experience deeply personalized rather than disjointed and turn-based.
📅 Manages Your Learning: A 4-agent hierarchy automatically tracks mastery, builds personalized weekly study plans, schedules sessions via Google Calendar API, and emails progress reports.
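The memory step can be illustrated with a minimal sketch. It assumes that mastery is stored per topic as a score from 0 to 1. `MasteryRecord` and `build_session_context` are hypothetical names, not BoardyBoo's actual schema. The Firestore read is omitted, so the function receives records that have already been fetched.

```python
from dataclasses import dataclass


@dataclass
class MasteryRecord:
    """One topic's mastery as it might be stored in Firestore (hypothetical schema)."""
    topic: str
    level: float  # 0.0 (new) to 1.0 (mastered); this scale is an assumption
    sessions: int


def build_session_context(student_name: str, records: list[MasteryRecord]) -> str:
    """Turn past mastery records into a short preamble for the tutor's instructions."""
    if not records:
        return f"{student_name} is a new student with no prior sessions."
    # Weakest topics first, so the tutor sees what needs review before anything else.
    ordered = sorted(records, key=lambda r: r.level)
    lines = [f"Returning student: {student_name}."]
    for r in ordered:
        lines.append(f"- {r.topic}: mastery {r.level:.0%} over {r.sessions} session(s)")
    lines.append(f"Suggested focus: review {ordered[0].topic} first.")
    return "\n".join(lines)
```

The resulting text would be prepended to the Tutor Agent's instructions at session start.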

Visually Immersive Learning

Live Mathematical Plotting: Plots functions live via tool calls, computing the axis bounds and handling discontinuities dynamically.
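One way to handle discontinuities is to sample the function and break the curve wherever it is undefined or jumps sharply. The sketch below assumes a uniform sampling grid and a fixed jump threshold. Both are illustrative choices, not necessarily what `plot_function` does. The y-range is clipped so that an asymptote does not flatten the rest of the plot.

```python
import math
from typing import Callable


def sample_segments(
    f: Callable[[float], float],
    x_min: float,
    x_max: float,
    n: int = 400,
    jump_threshold: float = 10.0,
) -> list[list[tuple[float, float]]]:
    """Sample f on [x_min, x_max] and split the curve wherever it is discontinuous."""
    segments: list[list[tuple[float, float]]] = []
    current: list[tuple[float, float]] = []
    prev_y = None
    for i in range(n + 1):
        x = x_min + (x_max - x_min) * i / n
        try:
            y = f(x)
        except (ValueError, ZeroDivisionError, OverflowError):
            y = math.nan  # undefined here, e.g. 1/x at 0 or log of a negative
        if not math.isfinite(y):
            if current:
                segments.append(current)
            current, prev_y = [], None
            continue
        if prev_y is not None and abs(y - prev_y) > jump_threshold:
            segments.append(current)  # large jump between samples: start a new segment
            current = []
        current.append((x, y))
        prev_y = y
    if current:
        segments.append(current)
    return segments


def y_bounds(segments, clip: float = 10.0, pad: float = 0.1) -> tuple[float, float]:
    """Y-axis range for the plot, clipped so asymptotes don't dominate the view."""
    ys = [y for seg in segments for _, y in seg]
    lo, hi = max(min(ys), -clip), min(max(ys), clip)
    if hi - lo < 1e-9:
        lo, hi = lo - 1.0, hi + 1.0  # flat function: give the axis some height
    margin = (hi - lo) * pad
    return lo - margin, hi + margin
```

Each returned segment would be drawn as its own polyline, so no line is drawn across an asymptote or a jump.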

Dynamic Flowcharts & Visual Learning: Simplifies complex concepts by drawing mind maps and step-by-step protocol diagrams on the canvas.

Distinct, Customizable Personas: BoardyBoo isn't a generic assistant. Students choose from distinct personas, ranging from a strict professor to a friendly mentor. Each persona has its own vocabulary, teaching style, and Gemini 2.5 voice, so students can pick the one that best suits their learning needs.

Agentic Scheduling via Google Calendar: The Calendar Agent connects directly to the student's Google Calendar to find slots and book study sessions.
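Finding a slot reduces to a gap search over busy intervals. This minimal sketch assumes that one day's busy periods have already been fetched, for example via the Calendar API's free/busy query. `find_free_slots` is a hypothetical helper, not one of the Calendar Agent's listed tools.

```python
from datetime import datetime, timedelta


def find_free_slots(
    busy: list[tuple[datetime, datetime]],
    day_start: datetime,
    day_end: datetime,
    duration: timedelta,
) -> list[tuple[datetime, datetime]]:
    """Return free windows inside [day_start, day_end) that are long enough for `duration`."""
    slots = []
    cursor = day_start
    for start, end in sorted(busy):
        if start >= day_end:
            break  # remaining events fall outside the window
        if start - cursor >= duration:
            slots.append((cursor, start))
        cursor = max(cursor, end)  # max() also handles overlapping events
    if day_end - cursor >= duration:
        slots.append((cursor, day_end))
    return slots
```

The agent could then offer the first window to the student and book it with `create_event`.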

How we built it

We architected a strictly typed, fully async, low-latency system on Google Cloud and Gemini. It is tailored for a multimodal experience and designed for stability:

Graceful Error Recovery: We implemented structured error payloads and ADK error classification. This allows the root Tutor Agent to gracefully apologize and retry tool calls (like calendar scheduling) if a transient API issue occurs, rather than unexpectedly crashing the voice stream.
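A minimal sketch of the structured-error idea follows. It assumes that tools return plain dicts to the model. `ApiError`, the set of transient status codes, and the `hint` field are hypothetical, and ADK's own error classification is not shown. The key point is that a failure becomes data the model can respond to, rather than an exception that tears down the voice stream.

```python
TRANSIENT_HTTP_CODES = {408, 429, 500, 502, 503, 504}


class ApiError(Exception):
    """Hypothetical client error carrying an HTTP status code."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def _error_payload(retryable: bool, message: str) -> dict:
    return {
        "status": "error",
        "error_type": "transient" if retryable else "permanent",
        "retryable": retryable,
        "message": message,
        # Plain-language hint the model can act on instead of failing silently.
        "hint": ("Apologize briefly and retry once." if retryable
                 else "Apologize and ask the student how to proceed."),
    }


def call_tool_safely(fn, *args, **kwargs) -> dict:
    """Run a tool and always return a dict; never raise into the live stream."""
    try:
        return {"status": "ok", "result": fn(*args, **kwargs)}
    except ApiError as exc:
        return _error_payload(exc.code in TRANSIENT_HTTP_CODES, f"HTTP {exc.code}: {exc}")
    except TimeoutError as exc:
        return _error_payload(True, f"Timed out: {exc}")
    except Exception as exc:  # last resort: report the error instead of crashing the session
        return _error_payload(False, f"{type(exc).__name__}: {exc}")
```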

System Architecture:

AI-Assisted Vibe Coding: The rapid development of this complex multimodal app was made possible by Google Antigravity acting as our agentic pair-programmer, generating core backend logic, while we used Google's Stitch MCP server to rapidly iterate and scaffold the premium React frontend design.
Backend (Cloud Run + FastAPI): A lightning-fast WebSocket server handles bidirectional PCM audio framing and JSON payloads.
Frontend (Cloud Run + Next.js): Built with Next.js 15, React 19, Excalidraw, and Framer Motion. We wrote custom AudioWorklet processors with ring buffers to achieve sub-20ms mic capture and playback latency.
Google Cloud Services (GCP): Deployed straight to Google Cloud Run with custom deploy.sh scripts that automate the deployment. We used Firestore for persistent student study plans and mastery data, Cloud Storage for canvas snapshots, and Firebase Auth for user sessions.
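Two pieces of the audio path can be sketched in Python for illustration; the actual AudioWorklet code runs in the browser as JavaScript. The first re-chunks incoming PCM into fixed-duration frames. The second is a fixed-capacity ring buffer for playback. The 16 kHz 16-bit mono input format, the 20 ms frame size, and the drop-oldest overflow policy are all assumptions.

```python
SAMPLE_RATE_HZ = 16_000  # assumed mic rate sent to the model (16-bit mono PCM)
BYTES_PER_SAMPLE = 2


class PcmFramer:
    """Re-chunk arbitrary-sized PCM byte chunks into fixed-duration frames."""

    def __init__(self, frame_ms: int = 20, sample_rate: int = SAMPLE_RATE_HZ) -> None:
        self.frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
        self._pending = bytearray()

    def push(self, chunk: bytes) -> list[bytes]:
        """Buffer a chunk and return every complete frame now available."""
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            frames.append(bytes(self._pending[: self.frame_bytes]))
            del self._pending[: self.frame_bytes]
        return frames


class RingBuffer:
    """Fixed-capacity FIFO of audio samples, as a playback worklet might use."""

    def __init__(self, capacity: int) -> None:
        self._data = [0.0] * capacity
        self._capacity = capacity
        self._read = 0
        self._size = 0

    def write(self, samples) -> int:
        """Append samples; on overflow, overwrite the oldest to bound latency. Returns count dropped."""
        dropped = 0
        for s in samples:
            self._data[(self._read + self._size) % self._capacity] = s
            if self._size == self._capacity:
                self._read = (self._read + 1) % self._capacity  # oldest sample was overwritten
                dropped += 1
            else:
                self._size += 1
        return dropped

    def read(self, n: int) -> list[float]:
        """Pop n samples, padding with silence on underrun instead of blocking."""
        out = []
        for _ in range(n):
            if self._size == 0:
                out.append(0.0)
            else:
                out.append(self._data[self._read])
                self._read = (self._read + 1) % self._capacity
                self._size -= 1
        return out
```

Because the ring buffer never holds more than `capacity` samples, its size bounds the added latency. At 48 kHz, a capacity of 960 samples caps buffered audio at 20 ms. An AudioWorklet's `process()` callback typically handles 128 frames at a time, which a `read(128)` call would mirror.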

Agent Architecture (Google ADK)

We built a 4-agent hierarchy. The root Tutor Agent handles voice, canvas, and grounding via Gemini 2.5 Flash Native Audio, and it passes context to specialized sub-agents.

Agent	Purpose	Key Tools
🎨 Tutor Agent (root)	Voice conversation, live whiteboard drawing, teaching, image generation	7 canvas tools, plot, image gen, Google Search grounding
📋 Planner Agent	Creates personalised weekly study plans	study plan CRUD, progress data
📅 Calendar Agent	Schedules sessions on Google Calendar	5 calendar tools (CRUD + availability)
📊 Progress Agent	Quizzes, mastery tracking, email reports	quiz results, mastery updates, Gmail

The Tutor Agent is the root of the tree and delegates to the 3 specialised sub-agents via ADK's transfer_to_agent. All agents use Gemini 2.5 Flash Native Audio for natural voice with StreamingMode.BIDI via Google ADK's runner.run_live().
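The hierarchy and the delegation step can be pictured with plain data structures. This illustrative sketch does not use ADK's API. `AgentSpec` and `transfer` are hypothetical stand-ins for ADK agents and `transfer_to_agent`. The tool-to-agent assignment is our reading of the agent and tool tables.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Illustrative stand-in for an agent definition (not ADK's class)."""
    name: str
    purpose: str
    tools: list[str]
    sub_agents: list["AgentSpec"] = field(default_factory=list)


def build_tree() -> AgentSpec:
    planner = AgentSpec("planner", "weekly study plans", ["create_study_plan", "get_student_history"])
    calendar = AgentSpec("calendar", "Google Calendar scheduling",
                         ["list_events", "create_event", "update_event", "delete_event", "check_availability"])
    progress = AgentSpec("progress", "quizzes, mastery, reports", ["save_quiz_results", "update_mastery"])
    return AgentSpec("tutor", "voice, whiteboard, teaching",
                     ["draw_line", "plot_function", "generate_image", "google_search"],
                     sub_agents=[planner, calendar, progress])


def transfer(root: AgentSpec, target: str, state: dict) -> AgentSpec:
    """Hand control to a named agent; session state is shared, not copied."""
    candidates = {root.name: root, **{a.name: a for a in root.sub_agents}}
    if target not in candidates:
        raise ValueError(f"unknown agent {target!r}; expected one of {sorted(candidates)}")
    state["active_agent"] = target  # every agent reads and writes the same state dict
    return candidates[target]
```

The design point this illustrates is that the session state is shared rather than copied. That sharing is what lets the conversation continue while a different agent takes over.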

Core Tools Used by the Agents

Tool Category	Description	Tools
🎨 Canvas Mechanics	Core tools for manipulating the Excalidraw whiteboard	`draw_line`, `draw_rectangle`, `draw_circle`, `add_text`, `remove_element`, `clear_canvas`, `highlight_area`
🖼️ Media & Vision	Tools for generating graphics and seeing user input	`generate_image` (Gemini 3 Pro), `capture_canvas_snapshot`, `analyze_camera_feed`
📈 Math & Logic	Specialised teaching tools	`plot_function`, `google_search` (grounding)
📅 Scheduling	Google Calendar integration via Calendar MCP	`list_events`, `create_event`, `update_event`, `delete_event`, `check_availability`
📊 Progress (Firestore)	Student data persistence	`update_mastery`, `save_quiz_results`, `create_study_plan`, `get_student_history`
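A few of the canvas tools can be sketched as plain Python functions with type hints and docstrings. This is the style ADK uses to expose functions as tools. The element dicts are a simplified subset of Excalidraw's element fields. The in-memory `CANVAS` store and the default color are assumptions. A real frontend would still have to expand these dicts into complete Excalidraw elements.

```python
import uuid

# Hypothetical in-memory element store for one session.
CANVAS: dict[str, dict] = {}
DEFAULT_COLOR = "#1e1e1e"


def _add(element: dict) -> dict:
    element["id"] = uuid.uuid4().hex
    CANVAS[element["id"]] = element
    # Only the id goes back to the model; the browser receives the full element.
    return {"status": "ok", "element_id": element["id"]}


def draw_rectangle(x: float, y: float, width: float, height: float, color: str = DEFAULT_COLOR) -> dict:
    """Draw a rectangle whose top-left corner is at (x, y)."""
    return _add({"type": "rectangle", "x": x, "y": y, "width": width,
                 "height": height, "strokeColor": color})


def draw_line(x1: float, y1: float, x2: float, y2: float, color: str = DEFAULT_COLOR) -> dict:
    """Draw a straight line from (x1, y1) to (x2, y2)."""
    # Excalidraw stores line points relative to the element's own x/y.
    return _add({"type": "line", "x": x1, "y": y1,
                 "points": [[0, 0], [x2 - x1, y2 - y1]], "strokeColor": color})


def add_text(x: float, y: float, text: str, font_size: int = 20) -> dict:
    """Write text with its top-left corner at (x, y)."""
    return _add({"type": "text", "x": x, "y": y, "text": text, "fontSize": font_size})


def remove_element(element_id: str) -> dict:
    """Delete one element by id, returning an error the model can react to."""
    if CANVAS.pop(element_id, None) is None:
        return {"status": "error", "message": f"No element with id {element_id}"}
    return {"status": "ok"}


def clear_canvas() -> dict:
    """Remove every element from the board."""
    CANVAS.clear()
    return {"status": "ok"}
```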

Challenges we ran into

Gemini Live API Payload Limits: Large serialized Excalidraw element arrays consistently caused the WebSocket to close with codes 1007 and 1008. We solved this with a "Canvas Bridge Pattern": the large UI state is stored server-side, and only a small UUID is passed back to the LLM. This preserves full visual richness without breaking the stream.
The "AI Pause": We noticed a delay between when the AI decided to draw and when the drawing actually appeared. We implemented an Early Canvas Push—pre-executing drawing tools on functionCall events before the ADK roundtrip completed. This made the drawing experience feel instantaneous.
Audio Streaming Latency: Synchronizing complex canvas commands with uninterrupted bidirectional audio required significant effort, leading us to build a custom browser AudioWorklet.
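The first two fixes can be sketched together. `CanvasBridge` stores the heavy element array server-side and returns only a short reference to the model. `relay_events` pushes a canvas command to the browser as soon as a drawing function call appears, then suppresses the matching function response so nothing is drawn twice. The event dict shape and both names are hypothetical; they are not ADK's `Event` type.

```python
import uuid

CANVAS_TOOLS = {"draw_line", "draw_rectangle", "draw_circle", "add_text", "highlight_area"}


class CanvasBridge:
    """Keep heavy canvas payloads server-side; the model only sees a short reference."""

    def __init__(self) -> None:
        self._scenes: dict[str, list[dict]] = {}

    def stash(self, elements: list[dict], summary: str) -> dict:
        scene_id = str(uuid.uuid4())
        self._scenes[scene_id] = elements
        # This small dict is the tool result sent back to the model.
        return {"status": "ok", "scene_id": scene_id,
                "element_count": len(elements), "summary": summary}

    def fetch(self, scene_id: str) -> list[dict]:
        # The WebSocket layer calls this to ship the full scene to the browser.
        return self._scenes[scene_id]


async def relay_events(events, send_to_browser) -> None:
    """Early canvas push: draw as soon as a canvas function call appears.

    `events` is an async iterator of dicts such as
    {"function_call": {"id": ..., "name": ..., "args": {...}}} or
    {"function_response": {"id": ..., ...}}; this shape is hypothetical.
    """
    pushed: set[str] = set()
    async for event in events:
        call = event.get("function_call")
        if call and call["name"] in CANVAS_TOOLS:
            if call["id"] not in pushed:
                pushed.add(call["id"])
                await send_to_browser({"type": "canvas", "tool": call["name"], "args": call["args"]})
            continue
        response = event.get("function_response")
        if response and response.get("id") in pushed:
            continue  # already drawn early; avoid drawing twice
        await send_to_browser(event)
```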

Accomplishments that we're proud of

We are incredibly proud of our Progressive Animation System. Instead of canvas shapes instantly popping onto the screen, BoardyBoo groups elements into staggered slices. Lines interpolate and grow gradually while the voice track plays, mirroring how a real teacher writes on a chalkboard.
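The animation has two parts: staggering the slices and growing each line along its length. Both can be sketched as follows. The slice size and stagger interval are hypothetical parameters, and synchronization with the voice track is not shown.

```python
import math


def stagger_slices(elements: list, slice_size: int, stagger_ms: int) -> list[tuple[int, list]]:
    """Group elements into slices; slice k starts k * stagger_ms after the first."""
    return [
        (i // slice_size * stagger_ms, elements[i:i + slice_size])
        for i in range(0, len(elements), slice_size)
    ]


def partial_line(points: list, progress: float) -> list:
    """Return the part of a polyline drawn after `progress` (0..1) of its total length."""
    progress = min(max(progress, 0.0), 1.0)
    lengths = [math.dist(a, b) for a, b in zip(points, points[1:])]
    remaining = progress * sum(lengths)
    drawn = [points[0]]
    for (a, b), seg in zip(zip(points, points[1:]), lengths):
        if remaining <= 0:
            break
        if remaining >= seg:
            drawn.append(b)  # this segment is fully drawn
            remaining -= seg
        else:
            t = remaining / seg  # partial segment: interpolate the pen position
            drawn.append((a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t))
            break
    return drawn
```

On each animation frame, the renderer would call `partial_line(points, elapsed / duration)` and redraw the returned prefix.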

We're also proud that we strictly followed the challenge criteria: deploying a robust, automated stack to GCP, correctly leveraging Native Audio/ADK, and making an agent that truly isn't just a text box.

What we learned

We learned that ADK's transfer_to_agent is magical. It allowed us to keep the low-latency audio stream continuous while shifting complex logic behind the scenes from the Tutor to the Planner or Calendar agent. We also learned how vital it is to push visual side-effects early; executing UI updates before the LLM confirms the completion roundtrip drastically improves the perceived latency of multimodal agents.

What's next for BoardyBoo

We want to integrate more specialized agents into the hierarchy—like a Physics Agent capable of running actual 2D simulations on the whiteboard, or a Chemistry Agent that builds 3D molecules. We also plan to build multiplayer study rooms where multiple students can talk to each other and the AI simultaneously on the same canvas.