Inspiration
260 million children worldwide lack access to quality education. In Bangladesh alone, private tutoring costs $500–$1,500/year, which puts it out of reach for many families. Existing AI tools are text-based chatbots: smart but cold. Students don't just need answers; they need a teacher they can see, hear, and interact with naturally. We asked: What if every child had a personal AI teacher that feels real?
What it does
EduMind is a 3D AI virtual teacher that speaks, emotes, and lip-syncs in real time. Students have natural voice conversations with a lifelike avatar — just like sitting in a real classroom.
Key capabilities:
- Live Voice Conversation — Full-duplex voice via Gemini Live API with real-time lip-synced 3D avatar
- AI Image Generation — Educational diagrams with Bengali text support via Gemini 3 Pro Image, explained by the teacher using Vision API
- Smart Quiz System — Adaptive MCQs with Bayesian Knowledge Tracing (85% mastery prediction accuracy)
- Deep Research — Google Search-grounded comprehensive reports on any topic
- Curriculum Mode — RAG-powered lessons from NCTB/CBSE/Cambridge syllabi
- Dual Avatars — Male and female teachers with 8 emotions and 8 hand gestures
All five learning modes work in both text chat and live voice conversation.
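As an illustration of the Bayesian Knowledge Tracing behind the quiz system, here is a minimal sketch of the standard BKT update. The parameter values and the function names `bktUpdate` and `traceMastery` are assumptions made for this example; the project's actual parameters are not stated in this write-up.

```javascript
// Hypothetical parameters; the values EduMind actually uses are not given here.
const DEFAULT_PARAMS = {
  pInit: 0.2,   // P(L0): prior probability the skill is already mastered
  pLearn: 0.15, // P(T): probability of learning the skill on each attempt
  pSlip: 0.1,   // P(S): probability of a wrong answer despite mastery
  pGuess: 0.25, // P(G): probability of a right answer without mastery
};

// One BKT step: Bayes update on the observed answer, then the learning transition.
function bktUpdate(pMastery, isCorrect, params = DEFAULT_PARAMS) {
  const { pLearn, pSlip, pGuess } = params;
  const posterior = isCorrect
    ? (pMastery * (1 - pSlip)) /
      (pMastery * (1 - pSlip) + (1 - pMastery) * pGuess)
    : (pMastery * pSlip) /
      (pMastery * pSlip + (1 - pMastery) * (1 - pGuess));
  return posterior + (1 - posterior) * pLearn;
}

// Fold a sequence of true/false answers into a final mastery estimate.
function traceMastery(answers, params = DEFAULT_PARAMS) {
  return answers.reduce((p, ok) => bktUpdate(p, ok, params), params.pInit);
}
```

A quiz engine could call `traceMastery` after each answer and move to harder questions once the estimate crosses a chosen mastery threshold.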
How we built it
Multi-Model Orchestration with Gemini 3:
- Gemini 3 Flash (gemini-3-flash-preview) — Powers chat, quiz generation, research, and curriculum with 1M token context
- Gemini 3 Pro Image (gemini-3-pro-image-preview) — Generates educational diagrams with accurate Bengali text rendering
- Gemini 2.5 Flash Native Audio (gemini-2.5-flash-native-audio-preview) — Real-time bidirectional voice via Live API with tool calling
- Google Cloud TTS — Fallback text-to-speech in 70+ languages
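One way to express this division of labor is a small routing table from task to model. The model IDs are the ones listed above; the task keys and the `modelForTask` helper are hypothetical names chosen for this sketch, not the project's actual code.

```javascript
// Task keys are illustrative; model IDs come from the list above.
const MODEL_BY_TASK = Object.freeze({
  chat: "gemini-3-flash-preview",
  quiz: "gemini-3-flash-preview",
  research: "gemini-3-flash-preview",
  curriculum: "gemini-3-flash-preview",
  image: "gemini-3-pro-image-preview",
  live_voice: "gemini-2.5-flash-native-audio-preview",
});

// Look up the model for a task, failing loudly on unknown tasks.
function modelForTask(task) {
  const model = MODEL_BY_TASK[task];
  if (!model) throw new Error(`No model configured for task "${task}"`);
  return model;
}
```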
The Core Innovation — Live API Lip Sync:
We built a custom _autoLipsyncFromPCM() engine that analyzes the Gemini Live API's PCM16 audio stream in 25ms segments, calculates each segment's RMS amplitude, and maps it to viseme mouth shapes (aa/O/E/I). The result is real-time lip sync on a Three.js 3D avatar during live voice conversations. To our knowledge, no other platform does this.
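The amplitude-to-viseme idea can be sketched in a few lines. This is not the actual _autoLipsyncFromPCM() implementation: the 24 kHz sample rate, the RMS thresholds, and the names `segmentRms` and `visemesFromPcm` are assumptions made for illustration.

```javascript
const SAMPLE_RATE = 24000; // assumption: 24 kHz mono PCM16 output audio
const SEGMENT_MS = 25;     // segment length described above

// Hypothetical thresholds on normalized RMS (0..1), checked loudest first.
const VISEME_THRESHOLDS = [
  { min: 0.30, viseme: "aa" },
  { min: 0.15, viseme: "O" },
  { min: 0.07, viseme: "E" },
  { min: 0.02, viseme: "I" },
];

// Root-mean-square amplitude of samples[start, end), normalized to 0..1.
function segmentRms(samples, start, end) {
  let sumSquares = 0;
  for (let i = start; i < end; i++) {
    const x = samples[i] / 32768;
    sumSquares += x * x;
  }
  return Math.sqrt(sumSquares / (end - start));
}

// Turn an Int16Array of PCM samples into a timeline of viseme cues.
function visemesFromPcm(samples, sampleRate = SAMPLE_RATE) {
  const segLen = Math.round((sampleRate * SEGMENT_MS) / 1000);
  const timeline = [];
  for (let start = 0; start < samples.length; start += segLen) {
    const end = Math.min(start + segLen, samples.length);
    const rms = segmentRms(samples, start, end);
    const match = VISEME_THRESHOLDS.find((t) => rms >= t.min);
    timeline.push({
      timeMs: (start * 1000) / sampleRate,
      viseme: match ? match.viseme : null, // null means mouth closed
      rms,
    });
  }
  return timeline;
}
```

Each timeline entry could then drive the avatar's mouth morph target at the matching playback time, with some smoothing between segments to avoid jitter.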
Live API Tool Calling: During voice conversations, the AI teacher can execute tools (generate images, create quizzes, run deep research) without interrupting the conversation flow.
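A minimal sketch of how such tool calls might be dispatched without blocking audio playback. It assumes each incoming call carries `id`, `name`, and `args`, and that each reply echoes `id` and `name` alongside a `response` object. The handler names and their return values are hypothetical placeholders.

```javascript
// Hypothetical handlers; real ones would call the image, quiz, and research services.
const toolHandlers = new Map([
  ["generate_image", async ({ prompt }) => ({ status: "queued", prompt })],
  ["create_quiz", async ({ topic, count = 5 }) => ({ status: "queued", topic, count })],
  ["deep_research", async ({ query }) => ({ status: "queued", query })],
]);

// Run all requested tools concurrently and build one response per call.
// Errors are returned as data so a failing tool never breaks the voice session.
async function handleToolCalls(functionCalls, handlers = toolHandlers) {
  return Promise.all(
    functionCalls.map(async ({ id, name, args }) => {
      const handler = handlers.get(name);
      if (!handler) {
        return { id, name, response: { error: `Unknown tool: ${name}` } };
      }
      try {
        return { id, name, response: await handler(args ?? {}) };
      } catch (err) {
        return { id, name, response: { error: String(err?.message ?? err) } };
      }
    })
  );
}
```

Because the handlers run as promises, the audio stream and lip sync keep going while a tool works, and the results are sent back once they resolve.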
Tech Stack: Three.js + TalkingHead (custom fork), Vanilla JS, Firebase Auth/Firestore, Stripe, Vercel Edge Functions.
Challenges we faced
- Lip sync from raw PCM — Gemini Live API returns raw audio with no viseme/phoneme data. We had to build amplitude-to-viseme mapping from scratch using RMS analysis on 25ms audio segments.
- Tool calling during streaming audio — Coordinating image generation and quiz overlays while the avatar is speaking required careful state management.
- Bengali text in AI images — Most image models fail at non-Latin scripts. Gemini 3 Pro Image handles Bengali accurately.
- Audio buffering — Managing AudioWorklet streaming with proper buffering to prevent gaps or overlaps in avatar speech.
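The audio buffering challenge above comes down to timing: each chunk must start exactly where the previous one ends, and playback must restart cleanly after an underrun. The sketch below shows only that timing logic, independent of AudioWorklet. The `ChunkScheduler` class, the 24 kHz rate, and the 50 ms lead time are assumptions made for illustration.

```javascript
class ChunkScheduler {
  constructor(sampleRate = 24000, leadTimeSec = 0.05) {
    this.sampleRate = sampleRate;
    this.leadTimeSec = leadTimeSec; // hypothetical safety margin before playback
    this.nextStartSec = 0;          // where the previously scheduled chunk ends
  }

  // now: current audio clock in seconds (for example AudioContext.currentTime).
  // Returns the time at which this chunk should start playing.
  schedule(sampleCount, now) {
    // Queue back-to-back while audio is buffered; after an underrun,
    // restart slightly in the future instead of scheduling in the past.
    const startSec = Math.max(this.nextStartSec, now + this.leadTimeSec);
    this.nextStartSec = startSec + sampleCount / this.sampleRate;
    return startSec;
  }
}
```

The same start times can be reused to line up viseme cues with the audio, so the mouth and the voice stay in step.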
Accomplishments we're proud of
- To our knowledge, the first platform to achieve real-time lip sync with Gemini Live API on a 3D avatar
- 5 learning modes all working in both text and live voice
- Under 1,200 ms latency from student speech to avatar response
- Adaptive learning with 85% mastery prediction accuracy
What we learned
- Gemini Live API's native audio streaming is incredibly powerful for building conversational AI
- Gemini 3 Pro Image's ability to render Bengali text accurately opens education to non-English speakers
- Real-time lip sync from PCM audio is achievable with simple amplitude analysis — no need for complex phoneme detection
What's next for EduMind
- Mobile apps (iOS/Android) — Q1 2026
- On-device AI for 100% offline capability — Q4 2026
- Global expansion to 10+ countries — 2027
- Target: 1 million students by 2027
Built With
- css3
- firebase-authentication
- firestore
- gemini-3-flash-api
- gemini-3-pro-image-api
- gemini-live-api
- google-cloud-tts
- indexeddb
- javascript
- node.js
- stripe
- talkinghead
- three.js
- vercel
- web-speech-api
- websocket


