Inspiration

260 million children worldwide lack access to quality education. In Bangladesh alone, private tutoring costs $500–$1,500 per year, which puts it out of reach for many families. Existing AI tools are text-based chatbots: smart but cold. Students don't just need answers; they need a teacher they can see, hear, and interact with naturally. We asked: what if every child had a personal AI teacher that feels real?

What it does

EduMind is a 3D AI virtual teacher that speaks, emotes, and lip-syncs in real time. Students have natural voice conversations with a lifelike avatar — just like sitting in a real classroom.

Key capabilities:

  • Live Voice Conversation — Full-duplex voice via Gemini Live API with real-time lip-synced 3D avatar
  • AI Image Generation — Educational diagrams with Bengali text support via Gemini 3 Pro Image, explained by the teacher using Vision API
  • Smart Quiz System — Adaptive multiple-choice questions (MCQs) with Bayesian Knowledge Tracing (85% mastery prediction accuracy)
  • Deep Research — Google Search-grounded comprehensive reports on any topic
  • Curriculum Mode — RAG-powered lessons from NCTB/CBSE/Cambridge syllabi
  • Dual Avatars — Male and female teachers with 8 emotions and 8 hand gestures

All five learning modes above work in both text chat and live voice conversation.
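
The quiz system's mastery estimate is described as Bayesian Knowledge Tracing (BKT). As a rough illustration of how such an estimate can be updated after each answer, here is a minimal sketch of the standard BKT update rule. The function name `updateMastery` and all parameter values (`pLearn`, `pSlip`, `pGuess`) are assumptions chosen for the example, not EduMind's actual settings.

```javascript
// Standard Bayesian Knowledge Tracing update for a single skill.
// All parameter values below are illustrative assumptions.
const DEFAULT_PARAMS = {
  pLearn: 0.2,  // chance of learning the skill during an attempt
  pSlip: 0.1,   // chance of answering wrong despite knowing the skill
  pGuess: 0.25, // chance of guessing right (one in four for a 4-option MCQ)
};

function updateMastery(pKnown, correct, params = DEFAULT_PARAMS) {
  const { pLearn, pSlip, pGuess } = params;
  // Posterior P(known | observed answer) via Bayes' rule.
  const posterior = correct
    ? (pKnown * (1 - pSlip)) / (pKnown * (1 - pSlip) + (1 - pKnown) * pGuess)
    : (pKnown * pSlip) / (pKnown * pSlip + (1 - pKnown) * (1 - pGuess));
  // Account for the chance that the student learned during this attempt.
  return posterior + (1 - posterior) * pLearn;
}

// Example: start at 0.3 and observe three correct answers in a row.
let mastery = 0.3;
for (const correct of [true, true, true]) {
  mastery = updateMastery(mastery, correct);
}
console.log(mastery.toFixed(3));
```

A correct answer raises the estimate and a wrong answer lowers it, with the slip and guess probabilities controlling how strongly each observation counts. An adaptive quiz could pick the next question's difficulty from the current estimate.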

How we built it

Multi-Model Orchestration with Gemini 3:

  • Gemini 3 Flash (gemini-3-flash-preview) — Powers chat, quiz generation, research, and curriculum with 1M token context
  • Gemini 3 Pro Image (gemini-3-pro-image-preview) — Generates educational diagrams with accurate Bengali text rendering
  • Gemini 2.5 Flash Native Audio (gemini-2.5-flash-native-audio-preview) — Real-time bidirectional voice via Live API with tool calling
  • Google Cloud TTS — Fallback text-to-speech in 70+ languages

The Core Innovation, Live API Lip Sync: We built a custom _autoLipsyncFromPCM() engine that analyzes the Gemini Live API's PCM16 audio stream in 25 ms segments, calculates the RMS amplitude of each segment, and maps it to viseme mouth shapes (aa/O/E/I). This drives real-time lip sync on a Three.js 3D avatar during live voice conversations. To our knowledge, no other platform does this.
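
The amplitude-to-viseme idea can be sketched in a few lines. This is not the actual _autoLipsyncFromPCM() code: the function names, the 24 kHz sample rate, and the amplitude thresholds are all assumptions made for illustration. Only the 25 ms segment length, the RMS measure, and the four viseme names come from the description above.

```javascript
// Amplitude-based viseme selection from PCM16 audio (illustrative sketch).
const SAMPLE_RATE = 24000; // assumed sample rate in Hz
const SEGMENT_MS = 25;     // segment length stated in the write-up

// Root-mean-square amplitude of samples[start..end), normalized to 0..1.
function rms(samples, start, end) {
  let sumSquares = 0;
  for (let i = start; i < end; i++) {
    const s = samples[i] / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / Math.max(1, end - start));
}

// Hypothetical thresholds: louder segments open the mouth wider.
function visemeForAmplitude(level) {
  if (level < 0.02) return null; // near silence: mouth closed
  if (level < 0.08) return "I";
  if (level < 0.16) return "E";
  if (level < 0.3) return "O";
  return "aa";
}

// Split an Int16Array chunk into 25 ms segments and pick a viseme for each.
function visemesFromPCM(samples, sampleRate = SAMPLE_RATE) {
  const segmentLength = Math.round((sampleRate * SEGMENT_MS) / 1000);
  const frames = [];
  for (let start = 0; start < samples.length; start += segmentLength) {
    const end = Math.min(start + segmentLength, samples.length);
    const level = rms(samples, start, end);
    frames.push({
      timeMs: (start * 1000) / sampleRate,
      viseme: visemeForAmplitude(level),
      level,
    });
  }
  return frames;
}

// Example: 50 ms of silence followed by 50 ms of loud signal.
const demo = new Int16Array(2400);
demo.fill(16384, 1200);
console.log(visemesFromPCM(demo).map((f) => f.viseme));
```

Each frame carries a timestamp so the avatar can apply the mouth shape when the matching audio is actually played, not when the chunk arrives.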

Live API Tool Calling: During voice conversations, the AI teacher can execute tools (generate images, create quizzes, run deep research) without interrupting the conversation flow.
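
One way to keep the conversation flowing while tools run is to start each tool without awaiting it and forward the result whenever it is ready. The sketch below shows that pattern only. The handler names, the `{ id, name, args }` call shape, and the `sendResponse` callback are assumptions for illustration and do not reproduce the Live API client or EduMind's code.

```javascript
// Non-blocking tool dispatch (illustrative sketch).
// Placeholder handlers; a real app would call its image and quiz backends here.
const toolHandlers = {
  generate_image: async ({ prompt }) => ({
    status: "ok",
    description: `diagram for: ${prompt}`,
  }),
  create_quiz: async ({ topic }) => ({
    status: "ok",
    questions: [`Sample question about ${topic}`],
  }),
};

// Start every requested tool immediately and return one promise per call.
// The caller does not need to await them, so audio playback can continue.
// `sendResponse` is a caller-supplied callback that forwards each result.
function dispatchToolCalls(functionCalls, sendResponse, handlers = toolHandlers) {
  return functionCalls.map(async (call) => {
    let response;
    try {
      const handler = handlers[call.name];
      if (!handler) throw new Error(`Unknown tool: ${call.name}`);
      response = await handler(call.args ?? {});
    } catch (err) {
      // Report failures as data instead of throwing into the audio loop.
      response = { status: "error", message: String(err.message ?? err) };
    }
    const result = { id: call.id, name: call.name, response };
    sendResponse(result);
    return result;
  });
}
```

Returning failures as ordinary responses lets the teacher explain that a tool did not work, instead of the session stalling on an unhandled error.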

Tech Stack: Three.js + TalkingHead (custom fork), Vanilla JS, Firebase Auth/Firestore, Stripe, Vercel Edge Functions.

Challenges we faced

  • Lip sync from raw PCM — Gemini Live API returns raw audio with no viseme/phoneme data. We had to build amplitude-to-viseme mapping from scratch using RMS analysis on 25ms audio segments.
  • Tool calling during streaming audio — Coordinating image generation and quiz overlays while the avatar is speaking required careful state management.
  • Bengali text in AI images — Most image models fail at non-Latin scripts. Gemini 3 Pro Image handles Bengali accurately.
  • Audio buffering — Managing AudioWorklet streaming with proper buffering to prevent gaps or overlaps in avatar speech.
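
The buffering problem in the last item comes down to deciding when each incoming audio chunk should start playing. A minimal sketch of one common approach, a running "next start time" cursor, is shown below. The name `createChunkScheduler` and the 50 ms lead time are assumptions; the project itself reports using an AudioWorklet, which would typically hold a sample buffer instead, so treat this as an illustration of the timing logic only.

```javascript
// Gapless chunk scheduling (illustrative sketch).
// Keeps a cursor so consecutive chunks neither overlap nor leave gaps.
function createChunkScheduler(leadTimeSec = 0.05) {
  let nextStartTime = 0;
  return {
    // `now` is the audio clock in seconds (e.g. AudioContext.currentTime).
    // Returns the time at which this chunk should begin playing.
    schedule(now, durationSec) {
      // After an underrun the cursor is in the past, so restart slightly ahead of now.
      const startTime = Math.max(nextStartTime, now + leadTimeSec);
      nextStartTime = startTime + durationSec;
      return startTime;
    },
    // Forget queued timing, e.g. when the student interrupts the teacher.
    reset() {
      nextStartTime = 0;
    },
  };
}

// Example: two back-to-back 100 ms chunks, then one that arrives late.
const scheduler = createChunkScheduler();
console.log(scheduler.schedule(0.0, 0.1));
console.log(scheduler.schedule(0.01, 0.1));
console.log(scheduler.schedule(1.0, 0.1));
```

The same start times can drive lip-sync frames, which keeps mouth movement aligned with what is actually heard.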

Accomplishments we're proud of

  • To our knowledge, the first platform to achieve real-time lip sync with the Gemini Live API on a 3D avatar
  • 5 learning modes all working in both text and live voice
  • Under 1,200 ms latency from student speech to avatar response
  • Adaptive learning with 85% mastery prediction accuracy

What we learned

  • Gemini Live API's native audio streaming is incredibly powerful for building conversational AI
  • Gemini 3 Pro Image's ability to render Bengali text accurately opens education to non-English speakers
  • Real-time lip sync from PCM audio is achievable with simple amplitude analysis — no need for complex phoneme detection

What's next for EduMind

  • Mobile apps (iOS/Android) — Q1 2026
  • On-device AI for 100% offline capability — Q4 2026
  • Global expansion to 10+ countries — 2027
  • Target: 1 million students by 2027

Built With

  • css3
  • firebase-authentication
  • firestore
  • gemini-3-flash-api
  • gemini-3-pro-image-api
  • gemini-live-api
  • google-cloud-tts
  • indexeddb
  • javascript
  • node.js
  • stripe
  • talkinghead
  • three.js
  • vercel
  • web-speech-api
  • websocket