🎓 mento.ai - Your Personal AI Tutor That Actually Teaches


Problem Statement

Students worldwide face a learning crisis hiding in plain sight. Millions turn to YouTube tutorials, note-sharing apps, and AI chatbots for help — yet confusion persists.

  • Tutors are expensive and inaccessible for most students
  • AI chatbots return walls of text without real explanation or patience
  • Crowded classrooms leave individual doubts unaddressed
  • Existing EdTech apps like Doubtnut are paywalled and impersonal

The result? Students memorise without understanding. Confidence erodes. Learning gaps compound over time, not because students lack effort, but because no one is present to notice where they get stuck.


Solution Overview

mento.ai is an emotion-aware, 3D AI tutor that doesn't just answer questions; it teaches.

Using a lifelike 3D avatar powered by conversational video intelligence, mento.ai joins students in a real-time session that feels like a Google Meet call with a personal mentor. It sees you, listens to you, reads your emotional state, and adapts its explanation style to your pace and your points of confusion rather than following a generic script.

Whether you're stuck on chemical reactions at midnight or need a concept broken down five different ways, mento.ai is always available, always patient, and always personal.


Key Features

  • 🎥 Conversational Video Interface (CVI) — The AI tutor joins like a video call, creating human presence absent in text-based tools
  • 😊 Emotion-Aware Responses — Real-time facial and voice analysis detects confusion or confidence and adjusts explanation depth accordingly
  • 🧑‍🏫 Lifelike 3D Avatar — An expressive, animated tutor that responds naturally, making learning feel engaging rather than transactional
  • 🧩 Adaptive Step-by-Step Teaching — Breaks down complex topics progressively, asking guiding questions rather than dumping answers
  • 📚 Subject Library — Students can initiate learning sessions across any subject on demand
  • 📊 Learning Dashboard — Tracks time spent, sessions completed, and subject-wise progress
  • Always Available — No scheduling, no waitlists, no paywalls. 24/7 personalised academic support
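
The emotion-aware adjustment described above can be sketched as a small policy function. This is a minimal illustration, not mento.ai's actual logic: the `EmotionSignal` and `ExplanationPlan` shapes, the function name, and the thresholds are all assumptions made for the example.

```typescript
// Hypothetical shapes: none of these names come from the mento.ai codebase.
type EmotionSignal = {
  confusion: number;  // 0..1, assumed output of the emotion-detection layer
  confidence: number; // 0..1, how confident the student appears
};

type ExplanationPlan = {
  depth: "brief" | "standard" | "step-by-step";
  askGuidingQuestion: boolean;
};

// Map the latest emotion reading to an explanation style.
// The thresholds (0.6, 0.7) are illustrative assumptions, not tuned values.
function planExplanation(signal: EmotionSignal): ExplanationPlan {
  if (signal.confusion >= 0.6) {
    // Visible confusion: slow down and check understanding along the way.
    return { depth: "step-by-step", askGuidingQuestion: true };
  }
  if (signal.confidence >= 0.7) {
    // The student seems comfortable: keep it short and move on.
    return { depth: "brief", askGuidingQuestion: false };
  }
  return { depth: "standard", askGuidingQuestion: true };
}
```

The resulting plan could then be folded into the instructions sent to the language model, so that the tutor's wording follows the student's state.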

Technologies Used

Layer               | Stack
Frontend            | React.js, Tailwind CSS
3D Avatar           | Three.js / Ready Player Me
Conversational AI   | GPT-4o
Emotion Detection   | Azure Face API
Text-to-Speech      | ElevenLabs
Speech-to-Text      | Deepgram
Video Intelligence  | Tavus (CVI)
Backend             | Node.js, Express
Database            | MongoDB

Target Users

  • 🎒 High school and college students stuck on specific concepts
  • 🌍 Self-learners lacking access to quality tutors due to cost or geography
  • 🏘️ Students in underserved regions where quality education infrastructure is limited
  • Anyone who has ever felt too embarrassed to ask the same question twice

Inspiration

The inspiration came from a real frustration: sitting in a classroom, too hesitant to raise your hand for the third time, then going home and watching YouTube videos that don't quite answer your specific doubt.

We asked: what if every student had access to a tutor who never got impatient, always explained things clearly, and could actually see when you were confused?

That question became mento.ai.


What It Does

mento.ai provides real-time, personalised tutoring through an emotion-aware 3D AI avatar. Students start a session, ask doubts naturally, just like talking to a human tutor, and receive step-by-step adaptive explanations. The system detects emotional cues to gauge understanding and adjusts its teaching in real time. A subject library and learning dashboard give students structure and measurable progress.
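
The dashboard metrics mentioned above (time spent, sessions completed, subject-wise progress) could be derived from stored session records along these lines. The `SessionRecord` fields and the `summarise` function are hypothetical names chosen for this sketch; the actual data model is not described in this write-up.

```typescript
// Hypothetical session record; field names are assumptions for illustration.
type SessionRecord = {
  subject: string;
  minutes: number;
  completed: boolean;
};

type DashboardSummary = {
  totalMinutes: number;
  sessionsCompleted: number;
  minutesBySubject: Record<string, number>;
};

// Fold a list of sessions into the three figures the dashboard shows.
function summarise(sessions: SessionRecord[]): DashboardSummary {
  const summary: DashboardSummary = {
    totalMinutes: 0,
    sessionsCompleted: 0,
    minutesBySubject: {},
  };
  for (const s of sessions) {
    summary.totalMinutes += s.minutes;
    if (s.completed) summary.sessionsCompleted += 1;
    // Accumulate per-subject time, starting from 0 for a new subject.
    summary.minutesBySubject[s.subject] =
      (summary.minutesBySubject[s.subject] ?? 0) + s.minutes;
  }
  return summary;
}
```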


How We Built It

We integrated a multi-modal AI stack:

  • GPT-4o powers the reasoning and teaching logic
  • ElevenLabs + Deepgram handle voice I/O
  • Azure Face API captures real-time emotional signals
  • Tavus brings the conversational video interface to life
  • Three.js renders the expressive 3D avatar
  • React + Node.js form the frontend-backend backbone

The biggest architectural challenge was synchronising emotion signals, speech, and avatar animation in real time with minimal latency.
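
One common way to keep latency down in a pipeline like this is to run the independent perception steps concurrently and put a time budget on the optional one. The sketch below illustrates that pattern only. The `transcribe` and `detectEmotion` callbacks are stand-ins supplied by the caller, not real Deepgram or Azure SDK calls, and `perceiveTurn`, `withTimeout`, and `delay` are hypothetical helper names.

```typescript
type Emotion = "confused" | "neutral" | "confident";

// Helper that simulates a service call taking `ms` milliseconds.
const delay = <T>(ms: number, value: T): Promise<T> =>
  new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));

// Resolve with `fallback` if `task` has not settled within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function perceiveTurn(
  transcribe: () => Promise<string>,
  detectEmotion: () => Promise<Emotion>,
  emotionBudgetMs: number,
): Promise<{ transcript: string; emotion: Emotion }> {
  // Speech-to-text and emotion detection do not depend on each other,
  // so they run concurrently instead of back to back.
  const [transcript, emotion] = await Promise.all([
    transcribe(),
    // The transcript is required; the emotion reading is optional, so a slow
    // reading degrades to "neutral" rather than delaying the tutor's reply.
    withTimeout<Emotion>(detectEmotion(), emotionBudgetMs, "neutral"),
  ]);
  return { transcript, emotion };
}
```

With this shape, the turn takes roughly as long as transcription alone, or the emotion budget if that is longer, rather than the sum of both calls.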


Challenges We Ran Into

  • Latency in multi-modal pipelines — Synchronising facial emotion data, speech recognition, LLM inference, and avatar animation required careful async orchestration
  • 🧠 Making AI feel human — Generating responses that teach rather than just answer required deep prompt engineering with pedagogical frameworks baked in
  • 💡 Emotion model accuracy — Facial expression detection across varied lighting conditions needed calibration and fallback logic
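
The calibration and fallback logic for emotion detection could take a form like the following: ignore frames the detector is unsure about, take a majority vote over the rest, and report "unknown" when too few frames are usable. This is a sketch under assumptions. The `Reading` shape, the `smoothEmotion` name, and the default thresholds are hypothetical and are not taken from the project or from any emotion-detection API.

```typescript
type Reading = {
  label: "confused" | "neutral" | "confident";
  score: number; // detector confidence in 0..1 (assumed to be available)
};

type SmoothedEmotion = Reading["label"] | "unknown";

// Majority vote over recent frames, skipping low-confidence ones.
// minScore and minUsable are illustrative defaults, not calibrated values.
function smoothEmotion(
  frames: Reading[],
  minScore = 0.5,
  minUsable = 3,
): SmoothedEmotion {
  const usable = frames.filter((r) => r.score >= minScore);
  // Fallback: with too few trustworthy frames (for example in poor lighting),
  // report "unknown" so the tutor can ask the student directly.
  if (usable.length < minUsable) return "unknown";

  const counts: Record<Reading["label"], number> = {
    confused: 0,
    neutral: 0,
    confident: 0,
  };
  for (const r of usable) counts[r.label] += 1;

  // Ties go to "neutral" first, then to "confused".
  let best: Reading["label"] = "neutral";
  for (const label of ["confused", "neutral", "confident"] as const) {
    if (counts[label] > counts[best]) best = label;
  }
  return best;
}
```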

Accomplishments That We're Proud Of

  • ✅ Built a fully functional prototype with a working CVI session end-to-end
  • ✅ Created an AI tutor that genuinely adapts to student confusion — probing and guiding, not just answering
  • ✅ Designed a UI that feels welcoming and lowers the intimidation barrier for students
  • ✅ Successfully integrated 5+ real-time APIs into a coherent educational experience

What We Learned

  • Multi-modal AI systems require extremely careful state management across audio, video, and language streams
  • Pedagogy matters as much as technology — how the AI asks questions is as important as how it answers them
  • Real-world EdTech impact comes from removing friction, not just adding features
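
To make the state-management point concrete, one option is an explicit state machine for turn-taking, so that audio, video, and language events can only move the session along known transitions. The states and event names below are hypothetical and chosen for illustration; the write-up does not describe the actual session model.

```typescript
type TutorState = "idle" | "listening" | "thinking" | "speaking";

type TutorEvent =
  | { type: "SESSION_STARTED" }
  | { type: "STUDENT_FINISHED_SPEAKING" }
  | { type: "REPLY_READY" }
  | { type: "AVATAR_FINISHED_SPEAKING" }
  | { type: "STUDENT_INTERRUPTED" };

// Pure transition function: events that make no sense in the current
// state are ignored, so a late or duplicate event cannot corrupt the session.
function nextState(state: TutorState, event: TutorEvent): TutorState {
  switch (state) {
    case "idle":
      return event.type === "SESSION_STARTED" ? "listening" : state;
    case "listening":
      return event.type === "STUDENT_FINISHED_SPEAKING" ? "thinking" : state;
    case "thinking":
      if (event.type === "REPLY_READY") return "speaking";
      if (event.type === "STUDENT_INTERRUPTED") return "listening";
      return state;
    case "speaking":
      if (
        event.type === "AVATAR_FINISHED_SPEAKING" ||
        event.type === "STUDENT_INTERRUPTED"
      ) {
        return "listening";
      }
      return state;
  }
}
```

Because the function is pure, it can be unit-tested without any audio or video hardware and reused on both the React frontend and the Node.js backend.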

What's Next for mento.ai

  • 🌐 Multilingual support — Expanding to regional languages to reach non-English-speaking students
  • 📖 Curriculum integration — Aligning sessions with school/college syllabi for structured learning paths
  • 👥 Peer learning mode — Collaborative sessions where students learn together with AI facilitation
  • 📱 Mobile app — Offline-first for students with limited connectivity
  • 🤝 Institutional partnerships — Piloting with schools and NGOs in underserved regions
