## Inspiration
Every student deserves a patient tutor, but most can't afford one. We asked: what if AI could sit beside a student at their desk, watch them work through a problem on a whiteboard, and guide them with a voice like a real coach? That question became CoachBoard.
## What it does
CoachBoard is a voice-driven AI math tutor built around an interactive whiteboard. Students draw problems, write equations, or upload homework (PDF or photo), then simply talk. The coach listens, sees the board, reads the homework, and responds in natural spoken language, just like a human tutor would.
- Speak your question: Web Speech API captures the student's voice continuously
- Show your work: the tldraw whiteboard lets students draw, annotate, and sketch freely
- Upload homework: PDFs are parsed for embedded text, falling back to OCR when none is found; photos are OCR'd via Tesseract.js
- Get coached, not just answered: DeepSeek's LLM responds with Socratic guidance, spoken aloud via speech synthesis
The AI sees both the visual board (exported as a PNG) and structured text extracted from shapes and homework, so it can follow the student's reasoning even when handwriting or symbols aren't pixel-perfect.
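The homework-extraction fallback described above (embedded PDF text first, then OCR) can be sketched as a small helper. The names here are illustrative rather than CoachBoard's actual code, and the pdf.js/Tesseract.js calls are abstracted as injected extractor functions, shown synchronously for brevity:

```typescript
// Hypothetical sketch of the text-extraction fallback; in the real app the
// extractors would wrap async pdf.js (page.getTextContent) and Tesseract.js
// (recognize) calls, and the board PNG goes to the vision model regardless.
type Extractor = () => string | null;

function extractHomeworkText(
  getEmbeddedPdfText: Extractor, // embedded PDF text layer, if any
  runOcr: Extractor              // OCR fallback for scans and photos
): string {
  // Prefer embedded PDF text: exact and cheap when it exists.
  const embedded = (getEmbeddedPdfText() ?? "").trim();
  if (embedded.length > 0) return embedded;

  // Otherwise fall back to OCR output.
  return (runOcr() ?? "").trim();
}
```

The model is still told to cross-check this text against the board image, since OCR of math notation is unreliable.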
## How we built it
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Whiteboard | tldraw v3 |
| AI Coach | DeepSeek deepseek-chat with vision |
| PDF parsing | pdf.js (embedded text) + Tesseract.js (OCR fallback) |
| Voice I/O | Web Speech API + speechSynthesis |
| Styling | Tailwind CSS 4 |
The `/api/agent/analyze` route accepts the board PNG (base64), structured board text, homework text, the student's message, and conversation history, then returns a JSON response containing what to say plus optional whiteboard annotations.
Math-friendly speech preprocessing (`expandMathSpeech`) ensures the coach sounds natural when reading equations aloud.
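A minimal sketch of what such preprocessing might look like; the real `expandMathSpeech` rule set is presumably larger, and these substitutions are only illustrative:

```typescript
// Expand common math notation into words so speechSynthesis reads it
// naturally ("2x + 3 = 7" -> "2x plus 3 equals 7"). Illustrative sketch.
function expandMathSpeech(text: string): string {
  return text
    .replace(/\bsqrt\(/g, "the square root of (")
    .replace(/\^2\b/g, " squared")
    .replace(/\^3\b/g, " cubed")
    .replace(/\*/g, " times ")
    .replace(/\//g, " divided by ")
    .replace(/=/g, " equals ")
    .replace(/\+/g, " plus ")
    // Only treat "-" as minus between operands, so hyphenated words survive.
    .replace(/(?<=[\w)])\s*-\s*(?=[\d(])/g, " minus ")
    .replace(/\s+/g, " ")
    .trim();
}
```

Running the synthesized voice over raw symbols ("two x caret two") is jarring, so this pass happens just before `speechSynthesis.speak`.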
## Challenges we ran into
- Vision vs. text tradeoffs: OCR and PDF text extraction are imperfect for math notation, so we built a multi-layer pipeline: embedded PDF text, first-page OCR, then board PNG vision, with the model instructed to cross-check all sources.
- Voice UX: tuning the silence detection (about 2 seconds) and selecting the right speechSynthesis voice required careful iteration to feel natural.
- tldraw SSR: the whiteboard must run client-only; getting dynamic imports and the export pipeline working reliably took significant debugging.
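The silence-detection logic can be sketched as a small helper: each interim speech-recognition result "pokes" the detector, and once no speech has arrived for the threshold, the utterance is treated as complete. Timestamps are passed in explicitly so the logic stays testable; the real version would hang off Web Speech API events and a timer, and this class is an illustration, not CoachBoard's actual code:

```typescript
// Track when the student last spoke; after thresholdMs of quiet,
// the captured utterance is considered finished and sent to the coach.
class SilenceDetector {
  private lastSpeechAt: number | null = null;

  constructor(private readonly thresholdMs = 2000) {}

  // Call on every interim recognition result (speech is still arriving).
  poke(nowMs: number): void {
    this.lastSpeechAt = nowMs;
  }

  // True once the student has been quiet for the full threshold.
  utteranceComplete(nowMs: number): boolean {
    return this.lastSpeechAt !== null &&
      nowMs - this.lastSpeechAt >= this.thresholdMs;
  }
}
```

The 2-second default matches the tuning described above: shorter cutoffs clip students mid-thought, longer ones make the coach feel unresponsive.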
## What we learned
Building an accessible AI tutor is as much a product design challenge as a technical one. The hardest part wasn't calling an API. It was making the interaction feel safe and encouraging for a student who might be frustrated or stuck. Prompt engineering for a pedagogical voice, one that guides rather than just gives answers, mattered enormously.
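An illustrative system prompt in that pedagogical spirit might look like the following; this is an example of the style, not CoachBoard's actual prompt:

```typescript
// Hypothetical system prompt capturing the "guide, don't answer" voice.
const COACH_SYSTEM_PROMPT = `
You are a patient math coach speaking aloud to a student.
- Never give the final answer outright; ask one guiding question at a time.
- Cross-check the board image, board text, and homework text before responding.
- If the student seems frustrated, acknowledge it and shrink the next step.
- Keep each reply short enough to speak in under fifteen seconds.
`.trim();
```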
## What's next
- Multi-page homework OCR
- Handwriting recognition improvements
- Student session history and progress tracking
- Teacher dashboard to review flagged misconceptions
## Built With
- deepseek
- next.js
- pdf.js
- react
- tailwind-css
- tesseract.js
- tldraw
- typescript
- webspeechapi
- zod