🎓 mento.ai - Your Personal AI Tutor That Actually Teaches
Problem Statement
Students worldwide face a learning crisis hiding in plain sight. Millions turn to YouTube tutorials, note-sharing apps, and AI chatbots for help — yet confusion persists.
- Tutors are expensive and inaccessible for most students
- AI chatbots return walls of text without real explanation or patience
- Crowded classrooms leave individual doubts unaddressed
- Existing EdTech apps like Doubtnut are paywalled and impersonal
The result? Students memorise without understanding. Confidence erodes. Learning gaps compound over time, not because students lack effort, but because they lack presence.
Solution Overview
mento.ai is an emotion-aware, 3D AI tutor that doesn't just answer questions; it teaches.
Using a lifelike 3D avatar powered by conversational video intelligence, mento.ai joins students in a real-time session that feels like a Google Meet call with a personal mentor. It sees you, listens to you, reads your emotional state, and adapts its explanation style to match your pace and your confusion, not a generic script.
Whether you're stuck on chemical reactions at midnight or need a concept broken down five different ways, mento.ai is always available, always patient, and always personal.
Key Features
- 🎥 Conversational Video Interface (CVI) — The AI tutor joins like a video call, creating human presence absent in text-based tools
- 😊 Emotion-Aware Responses — Real-time facial and voice analysis detects confusion or confidence and adjusts explanation depth accordingly
- 🧑‍🏫 Lifelike 3D Avatar — An expressive, animated tutor that responds naturally, making learning feel engaging rather than transactional
- 🧩 Adaptive Step-by-Step Teaching — Breaks down complex topics progressively, asking guiding questions rather than dumping answers
- 📚 Subject Library — Students can initiate learning sessions across any subject on demand
- 📊 Learning Dashboard — Tracks time spent, sessions completed, and subject-wise progress
- ⏰ Always Available — No scheduling, no waitlists, no paywalls. 24/7 personalised academic support
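The emotion-aware and step-by-step features above could connect roughly as follows. This is a minimal sketch, not the project's actual logic: the `EmotionSignal` shape, the `chooseTeachingMove` function, the instruction strings, and the 0.6 thresholds are all hypothetical, chosen only to illustrate how a detected state might select the tutor's next move.

```typescript
// Hypothetical fused face/voice reading; both scores are assumed to lie in [0, 1].
interface EmotionSignal {
  confusion: number;
  confidence: number;
}

type TeachingMove = "simplify" | "guide" | "advance";

// Assumed thresholds, for illustration only.
const CONFUSION_THRESHOLD = 0.6;
const CONFIDENCE_THRESHOLD = 0.6;

// Pick the next teaching move from the student's apparent state.
// Confusion is checked first, so a confused student is never pushed ahead.
function chooseTeachingMove(signal: EmotionSignal): TeachingMove {
  if (signal.confusion >= CONFUSION_THRESHOLD) {
    return "simplify"; // re-explain with a smaller step or a new analogy
  }
  if (signal.confidence >= CONFIDENCE_THRESHOLD) {
    return "advance"; // move on to the next step of the topic
  }
  return "guide"; // ask a guiding question before explaining further
}

// Hypothetical instruction text that could be appended to the tutor's prompt.
const MOVE_INSTRUCTIONS: Record<TeachingMove, string> = {
  simplify: "Re-explain the last step more simply, using a different example.",
  guide: "Ask one guiding question that leads the student to the next step.",
  advance: "Briefly confirm understanding, then introduce the next step.",
};

// Example: a clearly confused student gets the "simplify" instruction.
const exampleInstruction =
  MOVE_INSTRUCTIONS[chooseTeachingMove({ confusion: 0.8, confidence: 0.2 })];
```

Keeping the decision in a small pure function like this makes the adaptation rule easy to test separately from the face, voice, and language services.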
Technologies Used
| Layer | Stack |
|---|---|
| Frontend | React.js, Tailwind CSS |
| 3D Avatar | Three.js / Ready Player Me |
| Conversational AI | GPT-4o |
| Emotion Detection | Azure Face API |
| Text-to-Speech | ElevenLabs |
| Speech-to-Text | Deepgram |
| Video Intelligence | Tavus (CVI) |
| Backend | Node.js, Express |
| Database | MongoDB |
Target Users
- 🎒 High school and college students stuck on specific concepts
- 🌍 Self-learners lacking access to quality tutors due to cost or geography
- 🏘️ Students in underserved regions where quality education infrastructure is limited
- Anyone who has ever felt too embarrassed to ask the same question twice
Inspiration
The inspiration came from a real frustration: sitting in a classroom, too hesitant to raise your hand for the third time, then going home and watching YouTube videos that don't quite answer your specific doubt.
We asked: what if every student had access to a tutor who never got impatient, always explained things clearly, and could actually see when you were confused?
That question became mento.ai.
What It Does
mento.ai provides real-time, personalised tutoring through an emotion-aware 3D AI avatar. Students start a session, ask their doubts naturally, as they would with a human tutor, and receive step-by-step adaptive explanations. The system detects emotional cues to gauge understanding and adjusts its teaching in real time. A subject library and learning dashboard give students structure and measurable progress.
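The dashboard figures mentioned above (time spent, sessions completed, subject-wise progress) could be derived from stored session records along these lines. This is a sketch under assumptions: the `SessionRecord` and `DashboardSummary` shapes and the `summariseSessions` function are hypothetical and do not reflect the project's actual database schema.

```typescript
// Hypothetical record of one tutoring session.
interface SessionRecord {
  subject: string;
  minutes: number;
  completed: boolean;
}

interface DashboardSummary {
  totalMinutes: number;
  sessionsCompleted: number;
  minutesBySubject: Record<string, number>;
}

// Fold a list of sessions into the three dashboard figures.
function summariseSessions(sessions: SessionRecord[]): DashboardSummary {
  const summary: DashboardSummary = {
    totalMinutes: 0,
    sessionsCompleted: 0,
    minutesBySubject: {},
  };
  for (const session of sessions) {
    summary.totalMinutes += session.minutes;
    if (session.completed) {
      summary.sessionsCompleted += 1;
    }
    // Time counts towards the subject even if the session was not completed.
    const soFar = summary.minutesBySubject[session.subject] || 0;
    summary.minutesBySubject[session.subject] = soFar + session.minutes;
  }
  return summary;
}
```

For example, two Chemistry sessions of 30 and 15 minutes (the second abandoned) plus one completed 20-minute Maths session would give 65 total minutes, 2 completed sessions, and 45 minutes under Chemistry.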
How We Built It
We integrated a multi-modal AI stack:
- GPT-4o powers the reasoning and teaching logic
- ElevenLabs + Deepgram handle voice I/O
- Azure Face API captures real-time emotional signals
- Tavus brings the conversational video interface to life
- Three.js renders the expressive 3D avatar
- React + Node.js form the frontend-backend backbone
The biggest architectural challenge was synchronising emotion signals, speech, and avatar animation in real time with minimal latency.
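One part of that synchronisation problem is deciding which emotion reading belongs to which utterance. The sketch below assumes both streams carry millisecond timestamps on a shared clock. The types, the `nearestEmotion` and `tagUtterances` helpers, and the 1500 ms window are hypothetical, shown only to illustrate the idea.

```typescript
// Hypothetical timestamped outputs of the emotion and speech-to-text streams.
interface EmotionSample {
  atMs: number;
  label: string; // e.g. "confused", "neutral"
}

interface Utterance {
  atMs: number;
  text: string;
}

// Assumed limit: readings further than this from an utterance are treated as stale.
const MAX_SKEW_MS = 1500;

// Return the sample closest in time to `atMs`, or null if none is within the window.
function nearestEmotion(samples: EmotionSample[], atMs: number): EmotionSample | null {
  let best: EmotionSample | null = null;
  let bestGap = Infinity;
  for (const sample of samples) {
    const gap = Math.abs(sample.atMs - atMs);
    if (gap < bestGap) {
      best = sample;
      bestGap = gap;
    }
  }
  return bestGap <= MAX_SKEW_MS ? best : null;
}

// Attach an emotion label to each utterance, using "unknown" when nothing is close enough.
function tagUtterances(
  utterances: Utterance[],
  samples: EmotionSample[]
): { text: string; emotion: string }[] {
  return utterances.map(function (utterance) {
    const match = nearestEmotion(samples, utterance.atMs);
    return { text: utterance.text, emotion: match ? match.label : "unknown" };
  });
}
```

Returning "unknown" rather than the last known label is a deliberate choice in this sketch: a stale reading could make the tutor react to confusion that has already passed.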
Challenges We Ran Into
- ⚡ Latency in multi-modal pipelines — Synchronising facial emotion data, speech recognition, LLM inference, and avatar animation required careful async orchestration
- 🧠 Making AI feel human — Generating responses that teach rather than just answer required deep prompt engineering with pedagogical frameworks baked in
- 💡 Emotion model accuracy — Facial expression detection across varied lighting conditions needed calibration and fallback logic
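The fallback logic mentioned for emotion detection might look something like the following. This is a sketch under stated assumptions, not the calibration the team used: the `Detection` shape, the `stableEmotion` function, and both constants are hypothetical. It discards low-score detections, takes a majority vote over recent frames, and falls back to "unknown" when too few reliable frames remain (for example, in poor lighting).

```typescript
// Hypothetical per-frame output of an emotion detector.
interface Detection {
  label: string;
  score: number; // detector's own confidence, assumed to lie in [0, 1]
}

// Assumed values, for illustration only.
const MIN_SCORE = 0.5; // detections below this are ignored
const MIN_RELIABLE = 3; // fewer reliable frames than this triggers the fallback

// Majority vote over recent frames; ties go to the label seen first.
function stableEmotion(recent: Detection[]): string {
  const counts: Record<string, number> = {};
  let reliable = 0;
  for (const detection of recent) {
    if (detection.score < MIN_SCORE) {
      continue;
    }
    counts[detection.label] = (counts[detection.label] || 0) + 1;
    reliable += 1;
  }
  if (reliable < MIN_RELIABLE) {
    return "unknown"; // fallback: not enough evidence to act on
  }
  let best = "unknown";
  let bestCount = 0;
  for (const label of Object.keys(counts)) {
    const count = counts[label] || 0;
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```

When the result is "unknown", the tutor could simply ask the student whether the explanation made sense instead of guessing from an unreliable signal.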
Accomplishments That We're Proud Of
- ✅ Built a fully functional prototype with a working CVI session end-to-end
- ✅ Created an AI tutor that genuinely adapts to student confusion — probing and guiding, not just answering
- ✅ Designed a UI that feels welcoming and lowers the intimidation barrier for students
- ✅ Successfully integrated 5+ real-time APIs into a coherent educational experience
What We Learned
- Multi-modal AI systems require extremely careful state management across audio, video, and language streams
- Pedagogy matters as much as technology — how the AI asks questions is as important as how it answers them
- Real-world EdTech impact comes from removing friction, not just adding features
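To make the state-management lesson concrete, turn-taking across the audio, video, and language streams can be modelled as a small state machine. The states, event names, and `nextState` function below are hypothetical, a sketch of one possible design rather than the project's implementation.

```typescript
// Hypothetical conversation states and the events that move between them.
type TutorState = "listening" | "thinking" | "speaking";
type TutorEvent =
  | "studentFinishedSpeaking"
  | "replyReady"
  | "playbackFinished"
  | "studentInterrupted";

// Pure transition function: events that do not apply to the current state are ignored,
// so late or out-of-order signals from one stream cannot corrupt the session.
function nextState(state: TutorState, event: TutorEvent): TutorState {
  switch (state) {
    case "listening":
      return event === "studentFinishedSpeaking" ? "thinking" : state;
    case "thinking":
      if (event === "replyReady") return "speaking";
      if (event === "studentInterrupted") return "listening";
      return state;
    case "speaking":
      return event === "playbackFinished" || event === "studentInterrupted"
        ? "listening"
        : state;
  }
}

// Replaying one normal turn returns the session to "listening".
const normalTurn: TutorEvent[] = ["studentFinishedSpeaking", "replyReady", "playbackFinished"];
const finalState = normalTurn.reduce<TutorState>(nextState, "listening");
```

Routing every stream's signals through one transition function keeps the avatar, the voice output, and the language model agreeing on whose turn it is.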
What's Next for mento.ai
- 🌐 Multilingual support — Expanding to regional languages to reach non-English-speaking students
- 📖 Curriculum integration — Aligning sessions with school/college syllabi for structured learning paths
- 👥 Peer learning mode — Collaborative sessions where students learn together with AI facilitation
- 📱 Mobile app — Offline-first for students with limited connectivity
- 🤝 Institutional partnerships — Piloting with schools and NGOs in underserved regions
Built With
- api
- azure
- css
- deepgram
- elevenlabs
- express.js
- face
- framer
- microsoft
- mongodb
- motion
- node.js
- openai
- postgresql
- react
- speech
- tailwind
- tavus
- typescript
- vite
- web

