About the Project — Tutor LM
Inspiration
The inspiration for Tutor LM came from a simple gap I kept noticing while studying: I had high-quality notes and powerful AI tools, but nothing that actually taught me the way a human tutor would.
Document-based AI tools are great at answering questions, but they are passive: they wait to be prompted, don’t check understanding, and don’t adapt explanations in real time. Human tutors, on the other hand, explain out loud, draw diagrams, pause to quiz you, and adjust when they see confusion. Tutor LM was built to bridge that gap.
The goal became clear: build a NotebookLM-style system, but embodied as a live tutor that can speak, see, and explain.
What I Learned
Through this project, I learned how to:
- Design multimodal learning experiences that combine voice, vision, and text
- Structure AI systems around pedagogical loops instead of simple Q&A
- Use large language models not just for responses, but for post-session reasoning and synthesis
- Translate unstructured conversations into structured educational artifacts like report cards and study guides
Most importantly, I learned that a learning product feels “complete” not when it has many features, but when each interaction has a clear beginning, middle, and end.
How I Built It
Tutor LM is built around two core phases: live tutoring and post-session intelligence.
During a live session, the Gemini Live API powers a tutor that:
- Listens to the student via voice
- Observes the student’s screen or physical notebook via camera
- Explains concepts verbally in real time
- Dynamically updates an AI Smart Board with equations, diagrams, and examples
- Launches micro-quizzes to check understanding during the explanation
After the session ends, Gemini 3 Flash is used as the high-intelligence content engine. It processes the entire session transcript and generates:
- JSON-structured report cards
- Comprehensive study notes
- Personalized learning paths based on detected strengths and gaps
This creates a closed-loop learning system where every conversation compounds into long-term understanding.
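As a rough illustration of what a JSON-structured report card could look like on the application side, here is a minimal Python sketch. The field names (`topic`, `score`, `strengths`, `gaps`) and the 0.8 mastery threshold are assumptions made for this example, not the schema Tutor LM actually uses.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReportCard:
    # Hypothetical fields, chosen only for illustration.
    topic: str
    score: float  # fraction of quiz questions answered correctly, 0.0 to 1.0
    strengths: list[str] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)


def parse_report_card(raw: str) -> ReportCard:
    """Parse a JSON string into a ReportCard, rejecting malformed data."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("report card must be a JSON object")
    topic = data.get("topic")
    score = data.get("score")
    if not isinstance(topic, str) or not topic:
        raise ValueError("topic must be a non-empty string")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    strengths = data.get("strengths", [])
    gaps = data.get("gaps", [])
    for name, value in (("strengths", strengths), ("gaps", gaps)):
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{name} must be a list of strings")
    return ReportCard(topic=topic, score=float(score), strengths=strengths, gaps=gaps)


def next_steps(card: ReportCard, mastery_threshold: float = 0.8) -> list[str]:
    """Turn detected gaps into a simple learning path: gaps first, then new material."""
    path = [f"Review: {gap}" for gap in card.gaps]
    if card.score >= mastery_threshold:  # assumed threshold
        path.append(f"Advance beyond: {card.topic}")
    return path
```

Parsing into a typed object before anything is shown to the student means a malformed model response fails loudly instead of producing a half-empty report card.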
Gemini Integration
Tutor LM is built around Gemini 3 as its core intelligence layer, with different Gemini capabilities applied at distinct stages of the learning experience.
During live tutoring sessions, Gemini enables real-time conversational understanding across voice, visual context, and on-screen content. The model interprets spoken questions, references visible study materials such as notes or PDFs, and generates step-by-step explanations that drive both spoken responses and dynamic updates to the AI Smart Board, including equations, diagrams, and worked examples.
After each live session ends, Gemini 3 Flash is used as the high-intelligence post-processing engine. The full session transcript—covering explanations, questions, quizzes, and corrections—is passed to Gemini 3 Flash for deep analysis and synthesis. The model transforms this unstructured interaction into structured educational outputs, including JSON-formatted report cards, comprehensive study notes, and personalized learning paths tailored to the student’s demonstrated understanding.
By separating real-time interaction from post-session reasoning, Tutor LM leverages Gemini 3 for both immediacy and depth. This architecture allows the tutor to feel responsive and conversational during lessons, while still producing accurate, structured, and pedagogically grounded learning artifacts afterward—making Gemini 3 central to both the teaching and memory of the system.
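One way to picture the hand-off between the two phases is a transcript that accumulates typed turns during the live session and is rendered into a single analysis prompt afterward. The sketch below is an assumption about how that could be organized: the `Turn` class, the four `kind` labels (taken from the transcript contents listed above), and the prompt wording are all hypothetical, and the actual model call is deliberately left out.

```python
from dataclasses import dataclass

# Turn kinds mirror the transcript contents described above.
KINDS = ("explanation", "question", "quiz", "correction")


@dataclass
class Turn:
    speaker: str  # "tutor" or "student"
    kind: str     # one of KINDS
    text: str


def render_transcript(turns: list[Turn]) -> str:
    """Render turns as numbered lines so the model can refer to them by index."""
    lines = []
    for i, turn in enumerate(turns, start=1):
        if turn.kind not in KINDS:
            raise ValueError(f"unknown turn kind: {turn.kind!r}")
        lines.append(f"{i}. [{turn.speaker}/{turn.kind}] {turn.text.strip()}")
    return "\n".join(lines)


def build_analysis_prompt(turns: list[Turn]) -> str:
    """Build the post-session prompt; the wording here is illustrative only."""
    return (
        "Analyze the tutoring session below and return a JSON report card, "
        "study notes, and a personalized learning path.\n\n"
        "Transcript:\n" + render_transcript(turns)
    )
```

Tagging each turn with a kind during the session keeps the post-session step simple, because quizzes and corrections are already labeled by the time the transcript is analyzed.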
Challenges Faced
One of the biggest challenges was avoiding feature overload. Early versions felt powerful but unfinished because too many systems were exposed to the user at once.
Another challenge was deciding when the AI should speak, quiz, or summarize. A tutor should not be purely reactive, but being too interruptive also breaks flow. Finding the right balance required multiple iterations on timing and session structure.
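The balance between reactive and interruptive can be framed as a small decision policy. The following Python sketch is one possible shape for it, not the logic Tutor LM ships with: the `SessionState` fields, the three-minute quiz interval, and the priority order are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    EXPLAIN = "explain"
    QUIZ = "quiz"
    SUMMARIZE = "summarize"


@dataclass
class SessionState:
    # Hypothetical signals the tutor might track during a session.
    student_is_speaking: bool
    seconds_since_last_quiz: float
    concepts_since_last_quiz: int
    session_ending: bool


def choose_action(
    state: SessionState,
    quiz_interval_s: float = 180.0,  # assumed minimum gap between quizzes
    min_concepts: int = 1,           # assumed: only quiz after new material
) -> Action:
    """Pick the tutor's next move; earlier checks take priority."""
    if state.student_is_speaking:
        return Action.WAIT  # never talk over the student
    if state.session_ending:
        return Action.SUMMARIZE
    if (
        state.seconds_since_last_quiz >= quiz_interval_s
        and state.concepts_since_last_quiz >= min_concepts
    ):
        return Action.QUIZ
    return Action.EXPLAIN
```

Keeping the thresholds as parameters makes the timing easy to tune across iterations without touching the rest of the session logic.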
Finally, translating natural conversations into structured, reliable outputs—like report cards and learning paths—required careful prompt design and validation to ensure consistency and educational usefulness.
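A common pattern for this kind of validation is to check the model's output against the expected shape and retry with the error message when it fails. The sketch below assumes a learning path is a JSON list of objects with `step` and `reason` keys, which is a hypothetical format; `generate` stands in for whatever function calls the model, so no specific SDK is implied.

```python
import json
from typing import Callable


def validate_learning_path(raw: str) -> list[dict]:
    """Check that raw is a non-empty JSON list of {"step": str, "reason": str}."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON list")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each entry must be a JSON object")
        for key in ("step", "reason"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"each entry needs a non-empty string '{key}'")
    return data


def generate_validated(
    generate: Callable[[str], str],  # placeholder for the model call
    prompt: str,
    max_attempts: int = 3,
) -> list[dict]:
    """Call generate until its output validates, feeding errors back into the prompt."""
    last_error = None
    attempt_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(attempt_prompt)
        try:
            return validate_learning_path(raw)
        except ValueError as err:
            last_error = err
            attempt_prompt = (
                f"{prompt}\n\nYour previous output was rejected: {err}. "
                "Return only valid JSON."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Because `generate` is just a callable, the retry loop can be exercised in tests with a fake function that returns canned strings.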
Final Thoughts
Tutor LM is not just an AI that answers questions—it is an AI that teaches, adapts, and remembers. By grounding live tutoring in the student’s own materials and closing every session with clear outcomes, Tutor LM turns studying into an active, guided learning experience.
What's next for Tutor LM
Integrate with Google applications and create a daily tutor that sets a target each day and breaks large chapters into daily live classes...