Inspiration

Education shouldn't be passive. Traditional learning tools make you read walls of text. We wanted to build something that actually teaches: it speaks the explanation aloud, draws it live, and tests you, all from a single voice prompt.

What it does

ChalkAI transforms any topic into a full multimodal learning experience in real time:

  • 🎤 Voice Input — speak your topic naturally
  • ✍️ Live Streaming — explanation appears word-by-word as Gemini generates it
  • 🎨 AI Diagrams — Gemini generates SVG educational diagrams that appear inline between paragraphs
  • 🔊 Audio Narration — Web Speech API reads the explanation aloud simultaneously
  • 🧠 Smart Quiz — an auto-generated multiple-choice quiz tests your understanding

How we built it

  • Frontend: React + Vite with Web Speech API for voice I/O, deployed on Netlify
  • Backend: Python FastAPI with streaming endpoints, deployed on Google Cloud Run
  • AI: Google Gemini 2.5 Flash via Google GenAI SDK for text generation, SVG diagram generation, and quiz generation
  • Streaming: Server-Sent Events stream explanation tokens in real time
  • Interleaved Output: Gemini generates SVG code inline, rendered directly in the browser between paragraphs
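
To make the streaming and interleaved-output steps concrete, here is a minimal sketch using only the Python standard library. It frames payloads as Server-Sent Events and splits model output into text and inline SVG segments. The names `sse_event`, `split_segments`, and `to_sse` and the JSON payload shape are illustrative assumptions, not ChalkAI's actual code.

```python
import json
import re
from typing import Iterator, Optional

# Matches a complete inline <svg>...</svg> block, including newlines inside it.
SVG_BLOCK = re.compile(r"(<svg\b.*?</svg>)", re.DOTALL | re.IGNORECASE)


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event.

    Each line of the payload gets its own "data:" field, and a blank
    line terminates the event, as the SSE wire format requires.
    """
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def split_segments(text: str) -> Iterator[tuple[str, str]]:
    """Yield ("text", ...) and ("svg", ...) segments in document order."""
    # With one capturing group, re.split puts the matched SVG blocks
    # at the odd indices of the result.
    for index, part in enumerate(SVG_BLOCK.split(text)):
        if not part.strip():
            continue
        kind = "svg" if index % 2 == 1 else "text"
        yield kind, part.strip()


def to_sse(text: str) -> Iterator[str]:
    """Turn mixed text/SVG output into a sequence of SSE frames."""
    for kind, content in split_segments(text):
        # JSON-encoding keeps every payload on a single data: line.
        yield sse_event(json.dumps({"kind": kind, "content": content}), event=kind)
```

On the backend, a generator like `to_sse` could be passed to a streaming response. On the frontend, the event name tells the client whether to append text or render a diagram.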

Challenges we ran into

  • Getting truly interleaved multimodal output — text streaming while simultaneously triggering diagram generation required careful async architecture
  • Gemini API regional quota limits required creative model selection and fallback strategies
  • Generating consistent, beautiful SVG diagrams with Gemini required extensive prompt engineering to ensure clean layouts with no overlapping elements
  • Implementing a smooth TTS queue system so audio narration doesn't interrupt itself mid-sentence
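
The first two challenges can be sketched with plain `asyncio`. This is a hedged illustration under assumptions, not the project's implementation: `QuotaExceeded`, `with_fallback`, and `interleave` are hypothetical names, and the real Gemini calls are stood in for by caller-supplied coroutines.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error for a model whose quota is exhausted."""


async def with_fallback(
    models: Sequence[str],
    call: Callable[[str], Awaitable[str]],
) -> str:
    """Try each model in order, moving on only when quota is exhausted."""
    last_error = None
    for model in models:
        try:
            return await call(model)
        except QuotaExceeded as exc:
            last_error = exc
    raise RuntimeError("all models exhausted") from last_error


async def interleave(
    paragraphs: AsyncIterator[str],
    make_diagram: Callable[[str], Awaitable[str]],
) -> AsyncIterator[tuple[str, str]]:
    """Yield text paragraphs with a diagram after each one.

    The diagram for a paragraph is started as a background task, so it
    is generated while the next paragraph is still streaming in.
    """
    pending = None
    async for paragraph in paragraphs:
        if pending is not None:
            # Emit the previous paragraph's diagram before the new text.
            yield ("svg", await pending)
        yield ("text", paragraph)
        pending = asyncio.create_task(make_diagram(paragraph))
    if pending is not None:
        yield ("svg", await pending)
```

The design choice here is to await each diagram only when the next paragraph is ready to be emitted, so slow diagram generation delays the following paragraph at most, never the text already on screen.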

Accomplishments that we're proud of

  • Built a complete See + Hear + Speak educational experience in under 24 hours
  • Gemini generates actual SVG vector diagrams — not external images — making it fully self-contained
  • The interleaved experience feels genuinely magical: text streams, diagrams fade in between paragraphs, audio narrates simultaneously
  • Fully deployed on Google Cloud Run at zero cost, using the free tier

What we learned

  • Gemini 2.5 Flash is remarkably capable at generating structured SVG code when given precise constraints
  • Streaming APIs require careful state management on the frontend to handle concurrent text + image + audio outputs
  • Prompt engineering for consistent visual output is as important as the underlying model capability
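
Prompt constraints can be paired with a cheap structural check before a generated diagram reaches the browser. The sketch below uses only the standard library. The specific rules (a `viewBox`, no `<script>`, an element cap) and the name `validate_svg` are assumptions chosen for illustration, not the constraints ChalkAI actually uses.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as "{http://...}svg"."""
    return tag.rsplit("}", 1)[-1]


def validate_svg(markup: str, max_elements: int = 200) -> list[str]:
    """Return a list of problems found; an empty list means the SVG passed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if local_name(root.tag) != "svg":
        problems.append("root element is not <svg>")
    if "viewBox" not in root.attrib:
        problems.append("missing viewBox")

    # Only real elements have string tags; skip anything else defensively.
    elements = [el for el in root.iter() if isinstance(el.tag, str)]
    if any(local_name(el.tag) == "script" for el in elements):
        problems.append("contains a script element")
    if len(elements) > max_elements:
        problems.append(f"too many elements ({len(elements)} > {max_elements})")
    return problems
```

A failed check could trigger a retry with a stricter prompt, or the diagram could simply be skipped so that the text keeps streaming.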

What's next for ChalkAI

  • Topic Mind Map — interactive SVG knowledge graph connecting related topics
  • Multi-language support — explain topics in any language using Gemini's multilingual capabilities
  • Personalized learning paths — adapt explanation depth based on quiz performance
  • Google Classroom integration — teachers generate lessons for entire classes instantly

Built With

  • fastapi
  • google-cloud-run
  • google-gemini-2.5-flash
  • google-genai-sdk
  • python
  • react
  • vite
  • web-speech-api