## Inspiration
Every student deserves a patient tutor, but most can't afford one. We asked: what if AI could sit beside a student at their desk, watch them work through a problem on a whiteboard, and guide them with a voice like a real coach? That question became CoachBoard.
## What it does
CoachBoard is a voice-driven AI math tutor built around an interactive whiteboard. Students draw problems, write equations, or upload homework (PDF or photo), then simply talk. The coach listens, sees the board, reads the homework, and responds in natural spoken language, just like a human tutor would.
- Speak your question: Web Speech API captures the student's voice continuously
- Show your work: the tldraw whiteboard lets students draw, annotate, and sketch freely
- Upload homework: PDFs are parsed for embedded text, falling back to OCR when none is found; photos are OCR'd via Tesseract.js
- Get coached, not just answered: DeepSeek's LLM responds with Socratic guidance, spoken aloud via speech synthesis
The AI sees both the visual board (exported as a PNG) and structured text extracted from shapes and homework, so it can follow the student's reasoning even when handwriting or symbols aren't pixel-perfect.
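The homework-extraction fallback described above (embedded PDF text first, then OCR) can be sketched as a small helper. The names here are illustrative rather than CoachBoard's actual code, and the pdf.js/Tesseract.js calls are abstracted as injected extractor functions, shown synchronously for brevity:

```typescript
// Hypothetical sketch of the text-extraction fallback; in the real app the
// extractors would wrap async pdf.js (page.getTextContent) and Tesseract.js
// (recognize) calls, and the board PNG goes to the vision model regardless.
type Extractor = () => string | null;

function extractHomeworkText(
  getEmbeddedPdfText: Extractor, // embedded PDF text layer, if any
  runOcr: Extractor              // OCR fallback for scans and photos
): string {
  // Prefer embedded PDF text: exact and cheap when it exists.
  const embedded = (getEmbeddedPdfText() ?? "").trim();
  if (embedded.length > 0) return embedded;

  // Otherwise fall back to OCR output.
  return (runOcr() ?? "").trim();
}
```

The model is still told to cross-check this text against the board image, since OCR of math notation is unreliable.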
## How we built it
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Whiteboard | tldraw v3 |
| AI Coach | DeepSeek deepseek-chat with vision |
| PDF parsing | pdf.js (embedded text) + Tesseract.js (OCR fallback) |
| Voice I/O | Web Speech API + speechSynthesis |
| Styling | Tailwind CSS 4 |
The `/api/agent/analyze` route accepts the board PNG (base64), structured board text, homework text, the student's message, and conversation history, then returns a JSON response containing what to say plus optional whiteboard annotations.
Math-friendly speech preprocessing (`expandMathSpeech`) ensures the coach sounds natural when reading equations aloud.
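A minimal sketch of what such preprocessing might look like; the real `expandMathSpeech` rule set is presumably larger, and these substitutions are only illustrative:

```typescript
// Expand common math notation into words so speechSynthesis reads it
// naturally ("2x + 3 = 7" -> "2x plus 3 equals 7"). Illustrative sketch.
function expandMathSpeech(text: string): string {
  return text
    .replace(/\bsqrt\(/g, "the square root of (")
    .replace(/\^2\b/g, " squared")
    .replace(/\^3\b/g, " cubed")
    .replace(/\*/g, " times ")
    .replace(/\//g, " divided by ")
    .replace(/=/g, " equals ")
    .replace(/\+/g, " plus ")
    // Only treat "-" as minus between operands, so hyphenated words survive.
    .replace(/(?<=[\w)])\s*-\s*(?=[\d(])/g, " minus ")
    .replace(/\s+/g, " ")
    .trim();
}
```

Running the synthesized voice over raw symbols ("two x caret two") is jarring, so this pass happens just before `speechSynthesis.speak`.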
## Challenges we ran into
- Vision vs. text tradeoffs: OCR and PDF text extraction are imperfect for math notation, so we built a multi-layer pipeline: embedded PDF text, first-page OCR, then board PNG vision, with the model instructed to cross-check all sources.
- Voice UX: tuning the silence detection (about 2 seconds) and selecting the right speechSynthesis voice required careful iteration to feel natural.
- tldraw SSR: the whiteboard must run client-only; getting dynamic imports and the export pipeline working reliably took significant debugging.
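The silence-detection logic can be sketched as a small helper: each interim speech-recognition result "pokes" the detector, and once no speech has arrived for the threshold, the utterance is treated as complete. Timestamps are passed in explicitly so the logic stays testable; the real version would hang off Web Speech API events and a timer, and this class is an illustration, not CoachBoard's actual code:

```typescript
// Track when the student last spoke; after thresholdMs of quiet,
// the captured utterance is considered finished and sent to the coach.
class SilenceDetector {
  private lastSpeechAt: number | null = null;

  constructor(private readonly thresholdMs = 2000) {}

  // Call on every interim recognition result (speech is still arriving).
  poke(nowMs: number): void {
    this.lastSpeechAt = nowMs;
  }

  // True once the student has been quiet for the full threshold.
  utteranceComplete(nowMs: number): boolean {
    return this.lastSpeechAt !== null &&
      nowMs - this.lastSpeechAt >= this.thresholdMs;
  }
}
```

The 2-second default matches the tuning described above: shorter cutoffs clip students mid-thought, longer ones make the coach feel unresponsive.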
## What we learned
Building an accessible AI tutor is as much a product design challenge as a technical one. The hardest part wasn't calling an API. It was making the interaction feel safe and encouraging for a student who might be frustrated or stuck. Prompt engineering for a pedagogical voice, one that guides rather than just gives answers, mattered enormously.
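An illustrative system prompt in that pedagogical spirit might look like the following; this is an example of the style, not CoachBoard's actual prompt:

```typescript
// Hypothetical system prompt capturing the "guide, don't answer" voice.
const COACH_SYSTEM_PROMPT = `
You are a patient math coach speaking aloud to a student.
- Never give the final answer outright; ask one guiding question at a time.
- Cross-check the board image, board text, and homework text before responding.
- If the student seems frustrated, acknowledge it and shrink the next step.
- Keep each reply short enough to speak in under fifteen seconds.
`.trim();
```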
## What's next
- Multi-page homework OCR
- Handwriting recognition improvements
- Student session history and progress tracking
- Teacher dashboard to review flagged misconceptions
## Built With
- deepseek
- next.js
- pdf.js
- react
- tailwind-css
- tesseract.js
- tldraw
- typescript
- webspeechapi
- zod