Inspiration
Education shouldn't be passive. Traditional learning tools make you read walls of text. We wanted to build something that actively teaches: it speaks the explanation aloud, draws it live, and tests you, all from a single voice prompt.
What it does
ChalkAI transforms any topic into a full multimodal learning experience in real time:
- 🎤 Voice Input — speak your topic naturally
- ✍️ Live Streaming — explanation appears word-by-word as Gemini generates it
- 🎨 AI Diagrams — Gemini generates SVG educational diagrams that appear inline between paragraphs
- 🔊 Audio Narration — Web Speech API reads the explanation aloud simultaneously
- 🧠 Smart Quiz — auto-generated multiple choice quiz tests your understanding
How we built it
- Frontend: React + Vite with Web Speech API for voice I/O, deployed on Netlify
- Backend: Python FastAPI with streaming endpoints, deployed on Google Cloud Run
- AI: Google Gemini 2.5 Flash via Google GenAI SDK for text generation, SVG diagram generation, and quiz generation
- Streaming: Server-Sent Events stream explanation tokens in real time
- Interleaved Output: Gemini generates SVG code inline, rendered directly in the browser between paragraphs
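As a rough illustration of the streaming layer, here is a minimal sketch of how tokens could be framed as Server-Sent Events. It uses only the standard library; the function names and the `token` / `done` event names are assumptions made for this example, not ChalkAI's actual code.

```python
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one payload as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_explanation(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE message per token, then a final 'done' event.

    The event names "token" and "done" are hypothetical.
    """
    for token in tokens:
        yield sse_event(token, event="token")
    yield sse_event("", event="done")


if __name__ == "__main__":
    for message in stream_explanation(["Photosynthesis", " converts", " light"]):
        print(message, end="")
```

In a FastAPI backend, a generator like `stream_explanation` could plausibly be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; the token source here is a plain list standing in for the model's streamed output.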
Challenges we ran into
- Getting truly interleaved multimodal output: streaming text while simultaneously triggering diagram generation required a careful async architecture
- Gemini API regional quota limits required creative model selection and fallback strategies
- Generating consistent, beautiful SVG diagrams with Gemini required extensive prompt engineering to ensure clean layouts with no overlapping elements
- Implementing a smooth TTS queue system so audio narration doesn't interrupt itself mid-sentence
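The fallback strategy mentioned above can be sketched in a few lines. This is a hedged illustration, not the project's implementation: `QuotaExceeded`, `generate_with_fallback`, the `call_model` callback, and the `"fallback-model"` name are all hypothetical stand-ins, and a real version would catch whatever quota error the SDK actually raises.

```python
from typing import Callable, List, Sequence, Tuple


class QuotaExceeded(Exception):
    """Hypothetical error standing in for a quota or rate-limit failure."""


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, output) from the first success.

    Only quota errors trigger a fallback; any other exception propagates.
    """
    errors: List[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except QuotaExceeded as exc:
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models exhausted: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        # Simulate the primary model being out of quota.
        if model == "gemini-2.5-flash":
            raise QuotaExceeded("regional quota reached")
        return f"[{model}] answer to: {prompt}"

    print(generate_with_fallback(
        "Explain photosynthesis",
        ["gemini-2.5-flash", "fallback-model"],
        fake_call,
    ))
```

Passing the model call in as a function keeps the retry logic independent of any particular SDK and makes it easy to test with a fake.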
Accomplishments that we're proud of
- Built a complete See + Hear + Speak educational experience in under 24 hours
- Gemini generates actual SVG vector diagrams — not external images — making it fully self-contained
- The interleaved experience feels genuinely magical: text streams, diagrams fade in between paragraphs, audio narrates simultaneously
- Fully deployed on Google Cloud Run at zero cost using the free tier
What we learned
- Gemini 2.5 Flash is remarkably capable at generating structured SVG code when given precise constraints
- Streaming APIs require careful state management on the frontend to handle concurrent text + image + audio outputs
- Prompt engineering for consistent visual output is as important as the underlying model capability
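To make the state-management point concrete, here is a minimal sketch of an incremental splitter that separates a streamed response into text and SVG segments, even when a tag is cut across chunk boundaries. It is written in Python for illustration only (the real handling would live in the React frontend), and the class and function names are assumptions for this example.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # ("text" or "svg", content)

OPEN, CLOSE = "<svg", "</svg>"


def _partial_suffix(buf: str, tag: str) -> int:
    """Length of the longest proper prefix of `tag` that `buf` ends with."""
    for k in range(min(len(tag) - 1, len(buf)), 0, -1):
        if buf.endswith(tag[:k]):
            return k
    return 0


class SegmentSplitter:
    """Accumulates streamed chunks and emits complete text/SVG segments."""

    def __init__(self) -> None:
        self._buf = ""
        self._in_svg = False

    def feed(self, chunk: str) -> List[Segment]:
        self._buf += chunk
        out: List[Segment] = []
        while True:
            if self._in_svg:
                end = self._buf.find(CLOSE)
                if end == -1:
                    break  # SVG not finished yet; wait for more chunks
                end += len(CLOSE)
                out.append(("svg", self._buf[:end]))
                self._buf = self._buf[end:]
                self._in_svg = False
            else:
                start = self._buf.find(OPEN)
                if start == -1:
                    # Hold back a tail that might be the start of "<svg".
                    keep = _partial_suffix(self._buf, OPEN)
                    cut = len(self._buf) - keep
                    if cut:
                        out.append(("text", self._buf[:cut]))
                    self._buf = self._buf[cut:]
                    break
                if start > 0:
                    out.append(("text", self._buf[:start]))
                self._buf = self._buf[start:]
                self._in_svg = True
        return out

    def flush(self) -> List[Segment]:
        """Emit leftover text at end of stream; an unfinished SVG is dropped."""
        rest, was_svg = self._buf, self._in_svg
        self._buf, self._in_svg = "", False
        return [("text", rest)] if rest and not was_svg else []


if __name__ == "__main__":
    splitter = SegmentSplitter()
    for chunk in ["Plants use light. <sv", 'g width="10"><circle/></s', "vg> Done."]:
        print(splitter.feed(chunk))
    print(splitter.flush())
```

Text segments can be appended to the page and queued for narration as soon as they arrive, while a diagram is only rendered once its closing tag has been seen, which avoids injecting half-formed SVG markup.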
What's next for ChalkAI
- Topic Mind Map — interactive SVG knowledge graph connecting related topics
- Multi-language support — explain topics in any language using Gemini's multilingual capabilities
- Personalized learning paths — adapt explanation depth based on quiz performance
- Google Classroom integration — teachers generate lessons for entire classes instantly