Inspiration

Every study tool we've ever used is fundamentally passive. You read a summary. You watch a video. You stare at a PDF. The real learning happens when a teacher stands at a whiteboard, writes the equation in front of you, asks "why does this term go to zero?", and waits for you to answer.

We wanted to build that — not simulate it with flashcards, but actually replicate the real-time feedback loop of a great teacher. One that listens to you, challenges you, writes math live on the board, and never loses its patience.

What It Does

Whiteboard is a voice-native AI tutoring system. You speak. The tutor speaks back. Equations appear on the board in real time as the tutor explains them, rendered from LaTeX. If you cut in while the tutor is mid-sentence, it stops and listens.

Core loop:

  • Upload your lecture PDFs or past exam papers to a Space (a subject container)
  • Start a session in Socratic mode (tutor questions you) or Exam Crammer mode (rapid-fire problem solving)
  • Speak freely — always-on mic with VAD (Voice Activity Detection) handles silence detection, pre-roll buffering, and interruption
  • The tutor responds with speech + a live whiteboard that animates LaTeX in real time
  • Session ends with a structured summary: topics covered, MCQ score, weak areas flagged

What makes it different:

  • You can interrupt mid-explanation — the tutor pauses instantly
  • The whiteboard renders math character-by-character as the tutor speaks
  • Weak areas are tracked across sessions and surfaced back in future ones
  • Two distinct pedagogical modes with different prompt strategies

How We Built It

Frontend — React + TypeScript + Tailwind CSS + Framer Motion
Single-page app with animated screen transitions, a live KaTeX whiteboard renderer, and a fully custom voice pipeline in the browser.
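
A live KaTeX renderer can only animate math if every intermediate string it receives parses. The sketch below shows one way to cut a LaTeX string into renderable prefixes. It is a simplified heuristic, not the project's actual chunking logic: safePrefixes is a hypothetical name, and the rules (cut only at brace depth zero, never right after ^ or _, never just before a "{" that may be a pending argument) do not cover environments, \left...\right pairs, or unbraced arguments such as \frac12.

```typescript
// Hypothetical helper: returns prefixes of `tex` that end at "safe" cut points,
// so each one can be handed to the renderer as an animation step.
function safePrefixes(tex: string): string[] {
  const prefixes: string[] = [];
  let depth = 0; // current {...} nesting depth
  let i = 0;
  while (i < tex.length) {
    const ch = tex.charAt(i);
    if (ch === "\\") {
      // Consume a whole control sequence (\frac, \alpha) or an escaped symbol (\{).
      let j = i + 1;
      if (/[a-zA-Z]/.test(tex.charAt(j))) {
        while (/[a-zA-Z]/.test(tex.charAt(j))) j++;
      } else {
        j = Math.min(j + 1, tex.length);
      }
      i = j;
    } else {
      if (ch === "{") depth++;
      else if (ch === "}") depth = Math.max(0, depth - 1);
      i++;
    }
    const last = tex.charAt(i - 1);
    const next = tex.charAt(i);
    const dangling = last === "^" || last === "_"; // script marker still waiting for its operand
    const argumentFollows = next === "{";          // e.g. between \frac{a} and {b}
    const endsInSpace = last.trim() === "";        // skip steps that only add whitespace
    if (depth === 0 && !dangling && !argumentFollows && !endsInSpace) {
      prefixes.push(tex.slice(0, i));
    }
  }
  return prefixes;
}
```

Each returned prefix could then be passed to KaTeX in turn. Because the heuristic is incomplete, rendering with KaTeX's throwOnError option set to false, or skipping any prefix that fails to parse, would be a sensible safety net.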

Voice pipeline — Gemini Live API (WebSocket)
Real-time bidirectional audio via @google/genai. The browser streams microphone PCM to Gemini Live, receives audio chunks back, and plays them through the Web Audio API. A custom VAD layer (RMS + silence threshold) handles end-of-utterance detection with pre-roll buffering to avoid clipping the first syllable.
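
To make the VAD description concrete, here is a minimal sketch of RMS-based end-of-utterance detection with a pre-roll buffer. It works on plain Float32Array frames so it stays independent of any browser API. The names (rms, VadConfig, UtteranceDetector) and every threshold value are illustrative assumptions, not the values used in Whiteboard.

```typescript
// Root-mean-square energy of one audio frame (samples expected in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

interface VadConfig {
  rmsThreshold: number;    // frames at or above this energy count as speech
  silenceFrames: number;   // consecutive quiet frames that end an utterance
  minSpeechFrames: number; // utterances with fewer loud frames are dropped as noise
  preRollFrames: number;   // quiet frames kept from just before speech onset
}

class UtteranceDetector {
  private readonly config: VadConfig;
  private preRoll: Float32Array[] = [];
  private utterance: Float32Array[] = [];
  private speechFrames = 0;
  private quietRun = 0;
  private speaking = false;

  constructor(config: VadConfig) {
    this.config = config;
  }

  // Feed one frame. Returns the frames of a finished utterance, or null otherwise.
  push(frame: Float32Array): Float32Array[] | null {
    const loud = rms(frame) >= this.config.rmsThreshold;
    if (!this.speaking) {
      if (!loud) {
        // Keep a short rolling window of recent quiet audio.
        this.preRoll.push(frame);
        if (this.preRoll.length > this.config.preRollFrames) this.preRoll.shift();
        return null;
      }
      // Speech onset: prepend the pre-roll so the first syllable is not clipped.
      this.speaking = true;
      this.utterance = [...this.preRoll, frame];
      this.preRoll = [];
      this.speechFrames = 1;
      this.quietRun = 0;
      return null;
    }
    this.utterance.push(frame);
    if (loud) {
      this.speechFrames++;
      this.quietRun = 0;
      return null;
    }
    this.quietRun++;
    if (this.quietRun < this.config.silenceFrames) return null;
    // Enough trailing silence: close the utterance and reset.
    const finished = this.utterance;
    const longEnough = this.speechFrames >= this.config.minSpeechFrames;
    this.speaking = false;
    this.utterance = [];
    this.speechFrames = 0;
    this.quietRun = 0;
    return longEnough ? finished : null;
  }
}
```

In a browser, push would be called once per captured microphone frame, and a non-null result would be the audio to send upstream. The three tuning knobs map directly onto the trade-off between feeling jumpy and missing short answers.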

Backend — FastAPI + SQLite
Manages spaces, documents, sessions, turns, and summaries. PDF text is extracted server-side and injected as RAG context into each tutor turn. Session state, weak areas, and MCQ scores are persisted across runs.

AI Tutoring + Reasoning — DigitalOcean Gradient Serverless Inference
Each tutor turn is generated via DO's OpenAI-compatible inference endpoint. The system prompt encodes the pedagogical mode (Socratic vs. Crammer), injects retrieved document context, and structures the response into speech, board, and topic fields.
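
As a rough illustration of that structure, the sketch below builds a mode-specific system prompt and parses a reply into the three fields. The field names speech, board, and topic come from the description above. Everything else is an assumption: the JSON wire format, the prompt wording, and the helper names buildSystemPrompt and parseTutorTurn are hypothetical.

```typescript
type TutorMode = "socratic" | "crammer";

interface TutorTurn {
  speech: string; // text to be spoken aloud
  board: string;  // LaTeX for the whiteboard (may be empty)
  topic: string;  // topic label for session tracking
}

// Hypothetical prompt builder: the wording is illustrative only.
function buildSystemPrompt(mode: TutorMode, context: string): string {
  const style =
    mode === "socratic"
      ? "Ask guiding questions and do not reveal the final answer directly."
      : "Pose short exam-style problems in quick succession and check each answer.";
  return [
    "You are a voice tutor.",
    style,
    'Reply with a JSON object with string fields "speech", "board" (LaTeX), and "topic".',
    "Course material:",
    context,
  ].join("\n");
}

// Parse the model's reply, degrading to speech-only if it is not the expected JSON.
function parseTutorTurn(raw: string): TutorTurn {
  const speechOnly: TutorTurn = { speech: raw.trim(), board: "", topic: "" };
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return speechOnly;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return speechOnly;
  const record = data as Record<string, unknown>;
  const field = (key: string): string => {
    const value = record[key];
    return typeof value === "string" ? value : "";
  };
  return { speech: field("speech"), board: field("board"), topic: field("topic") };
}
```

Falling back to speech-only output means a malformed reply still produces audio instead of a dead turn. Whether the real system does the same is not stated here.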

Knowledge Base — DigitalOcean Knowledge Base
Uploaded PDFs are optionally synced to a DO Knowledge Base via presigned upload + data source indexing. At query time, the KB is hit first; local PDF extraction serves as the fallback.
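
The KB-first, local-fallback order can be sketched as a small async function. The two retrievers are passed in as plain functions, so this makes no claim about the DigitalOcean API. Retriever and retrieveContext are hypothetical names, and treating an empty KB result as a reason to fall back is an assumption.

```typescript
// A retriever takes a query and resolves to a list of text passages.
type Retriever = (query: string) => Promise<string[]>;

interface RetrievedContext {
  source: "kb" | "local";
  passages: string[];
}

// Try the knowledge base first; use locally extracted PDF text if the KB is
// not configured, fails, or returns nothing.
async function retrieveContext(
  query: string,
  knowledgeBase: Retriever | null,
  localExtract: Retriever,
): Promise<RetrievedContext> {
  if (knowledgeBase !== null) {
    try {
      const passages = await knowledgeBase(query);
      if (passages.length > 0) return { source: "kb", passages };
    } catch {
      // KB unreachable or not yet indexed: fall through to the local text.
    }
  }
  return { source: "local", passages: await localExtract(query) };
}
```

Returning the source alongside the passages makes it easy to record which path served each turn.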

Agent routing — DigitalOcean ADK
Student turns can optionally be routed through a DO ADK agent endpoint. trace_id headers are captured and stored with each session turn for observability.

Challenges We Ran Into

  • VAD timing — balancing pre-roll buffer, silence threshold, and minimum utterance length so it feels natural without being jumpy or missing short answers.
  • LaTeX streaming — animating LaTeX character-by-character required custom chunking logic since KaTeX needs valid syntax at each render step.
  • Perspective grid sync — getting the horizontal grid lines to land exactly on the vertical lines required deriving both from the same vanishing point rather than computing them independently.
  • Echo suppression — the tutor's own speech was being picked up by the mic, transcribed, and fed back as a user message. Solved by muting mic capture while the tutor is speaking and applying a cooldown after audio ends.
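
The echo suppression fix in the last item can be sketched as a small gate that decides whether mic audio should be accepted. EchoGate is a hypothetical name and the cooldown length is an assumed parameter. Timestamps are passed in explicitly so the logic does not depend on a particular clock. Note that a hard gate like this also blocks barge-in while the tutor is speaking. How the project reconciles it with interruption is not described above.

```typescript
// Decides whether microphone audio should be accepted at a given moment.
class EchoGate {
  private readonly cooldownMs: number;
  private tutorSpeaking = false;
  private lastPlaybackEndMs = Number.NEGATIVE_INFINITY;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Call when tutor audio starts playing.
  playbackStarted(): void {
    this.tutorSpeaking = true;
  }

  // Call when the last tutor audio chunk has finished playing.
  playbackEnded(nowMs: number): void {
    this.tutorSpeaking = false;
    this.lastPlaybackEndMs = nowMs;
  }

  // True when mic frames should be captured: the tutor is silent and the
  // cooldown since the end of playback has elapsed.
  micOpen(nowMs: number): boolean {
    return !this.tutorSpeaking && nowMs - this.lastPlaybackEndMs >= this.cooldownMs;
  }
}
```

A capture loop would check micOpen before feeding each frame to the VAD, so residual speaker output never reaches the transcript.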

Accomplishments That We're Proud Of

  • A fully working real-time voice conversation loop with sub-second perceived latency
  • True interruption behavior — student can cut off the tutor mid-sentence
  • A math whiteboard that renders live, not after the fact
  • Two genuinely distinct tutoring modes with different prompt architectures
  • Clean session analytics: topic tracking, MCQ scoring, weak area detection across sessions
  • Full DigitalOcean Gradient integration: inference + KB retrieval + KB autosync + ADK routing + trace ID passthrough

What We Learned

  • Gemini Live's WebSocket protocol is powerful but requires careful state management around session lifecycle, token refresh, and audio chunk sequencing
  • Voice-first UX is hard to get right — most of the real engineering work was in the mic pipeline, not the AI layer
  • Socratic tutoring requires a fundamentally different prompt design from answer generation: the model has to resist giving answers and redirect the student instead

What's Next for Whiteboard

  • Multi-subject session routing with automatic space detection from student speech
  • Adaptive difficulty — track per-concept mastery and adjust question difficulty dynamically
  • Collaborative mode — two students, one tutor, shared whiteboard
  • Mobile app with offline mode for commute studying
  • Analytics dashboard: mastery curves, time-per-topic heatmaps, predicted exam readiness score
