Inspiration

Students today increasingly turn to AI tools for homework help, but most systems simply provide answers instead of fostering understanding. We were inspired by the experience of a live tutor sitting beside a student, asking questions, drawing explanations on a whiteboard, and guiding the student to the solution rather than solving the problem for them.

Our goal was to recreate that interactive tutoring experience using AI. With recent advances in multimodal AI, and with real-time interaction now possible through Gemini Live, we saw an opportunity to build an AI tutor that can see homework, converse naturally with students, and visually explain concepts on a collaborative whiteboard.


What it does

AI Live Tutor is a real-time multimodal AI tutor designed to help students understand their homework rather than simply get answers.

Students can:

  • Show their homework using their device camera
  • Ask questions using natural voice conversation
  • Get guided hints and explanations
  • See the tutor draw steps, highlight concepts, and generate diagrams on a shared whiteboard

The system uses multiple specialized AI agents to:

  • Analyze homework problems using computer vision
  • Guide students using Socratic questioning
  • Visually explain solutions using a collaborative Excalidraw whiteboard
  • Review completed work and provide feedback
  • Generate reinforcement problems to strengthen understanding

The result feels like a session with a real tutor, powered by AI.


How we built it

We designed the system using a multi-agent architecture built on Google’s AI ecosystem.

Core components include:

Gemini Live API
Handles real-time conversational interaction, allowing students to speak naturally and interrupt the AI tutor.

Agent Development Kit (ADK)
Orchestrates multiple specialized agents that collaborate during the tutoring session.

AI Agents

  • Vision Agent – interprets homework from camera input
  • Conversation Agent – manages dialogue and interruptions
  • Tutor Agent – guides the student using Socratic reasoning
  • Canvas Agent – converts explanations into whiteboard actions
  • Diagram Agent – generates visual diagrams for concepts
  • Review Agent – evaluates student answers
  • Reinforcement Agent – creates follow-up practice questions
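One way to picture how these agents share a session is a simple event router: each agent subscribes to the kinds of session events it cares about, and every incoming event is fanned out to its subscribers. The sketch below is a minimal, hypothetical Python illustration of that idea. It does not use the ADK API, and the event kinds, agent names, and the `Orchestrator` and `SessionEvent` classes are assumptions made for the example, with stub functions standing in for model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical session event; not an ADK type.
@dataclass
class SessionEvent:
    kind: str      # e.g. "camera_frame", "student_speech", "answer_submitted" (illustrative)
    payload: str


# An agent is modeled here as a plain function from an event to a text result.
Agent = Callable[[SessionEvent], str]


@dataclass
class Orchestrator:
    """Minimal fan-out router: event kind -> names of agents subscribed to that kind."""
    agents: Dict[str, Agent] = field(default_factory=dict)
    routes: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, name: str, agent: Agent, *kinds: str) -> None:
        self.agents[name] = agent
        for kind in kinds:
            self.routes.setdefault(kind, []).append(name)

    def dispatch(self, event: SessionEvent) -> List[Tuple[str, str]]:
        # Call each subscribed agent in registration order and collect (name, result).
        return [(name, self.agents[name](event)) for name in self.routes.get(event.kind, [])]


# Usage with stub agents standing in for model calls.
orch = Orchestrator()
orch.register("vision", lambda e: f"parsed problem: {e.payload}", "camera_frame")
orch.register("tutor", lambda e: "What is the first step you would try?", "camera_frame", "student_speech")
orch.register("review", lambda e: f"checking answer {e.payload}", "answer_submitted")

results = orch.dispatch(SessionEvent("camera_frame", "2x + 3 = 7"))
```

Keeping the routing in one table, rather than inside each agent, means a new agent can be added by registering it for the events it needs, without changing the others.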

Interactive Whiteboard

We integrated Excalidraw to create a shared reasoning canvas where the AI can:

  • write equations
  • draw diagrams
  • highlight steps
  • visually demonstrate concepts

Cloud Infrastructure

The backend is deployed on Google Cloud Run, with session state stored in Firestore and images handled through Cloud Storage.
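A common way to split this kind of state, and an assumption here rather than the project's confirmed schema, is to keep binary data such as camera frames in Cloud Storage and store only a reference to each object in the Firestore session document. The hypothetical `TutorSession` record below sketches that shape in Python; every field name and the object naming scheme are illustrative.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TutorSession:
    """Hypothetical per-session document; field names are illustrative assumptions."""
    session_id: str
    problem_text: str = ""
    # Camera frames live in Cloud Storage; the document only keeps their gs:// URIs.
    image_uris: List[str] = field(default_factory=list)
    hints_given: int = 0
    transcript: List[str] = field(default_factory=list)

    def add_frame(self, bucket: str, frame_no: int) -> str:
        # Hypothetical object naming scheme: one folder per session, zero-padded frame numbers.
        uri = f"gs://{bucket}/sessions/{self.session_id}/frame-{frame_no:04d}.jpg"
        self.image_uris.append(uri)
        return uri

    def to_document(self) -> dict:
        # Plain strings, ints, and lists, which Firestore can store directly.
        return asdict(self)


session = TutorSession("abc123", problem_text="2x + 3 = 7")
session.add_frame("homework-frames", 1)
doc = session.to_document()
```

With the `google-cloud-firestore` client, a dict like `doc` could then be written with `client.collection("sessions").document(session.session_id).set(doc)`; the `sessions` collection name is again an assumption.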

The result is a real-time, multimodal tutoring system that combines vision, voice, reasoning, and visual teaching.


Challenges we ran into

One of the biggest challenges was orchestrating multiple AI agents in real time while maintaining a natural conversation flow.

Another challenge was translating conceptual explanations into visual whiteboard instructions. Instead of simply generating text, the AI had to produce structured drawing commands that the Excalidraw canvas could render dynamically.
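A minimal sketch of that translation step, assuming the Canvas Agent is prompted to emit a small JSON command vocabulary (the `write`, `box`, `highlight`, and `arrow` ops below are hypothetical). The output dicts loosely follow the element "skeleton" shape (`type`, `x`, `y`, `text`, `width`, `height`, `points`) that Excalidraw's `convertToExcalidrawElements` helper accepts on the frontend; the exact fields should be treated as an assumption and checked against the Excalidraw documentation.

```python
import json
from typing import Dict, List

# Hypothetical command vocabulary the Canvas Agent is prompted to use.
ALLOWED_OPS = {"write", "box", "highlight", "arrow"}


def commands_to_elements(raw: str) -> List[Dict]:
    """Turn model-emitted JSON drawing commands into Excalidraw-style element dicts."""
    elements: List[Dict] = []
    for cmd in json.loads(raw):
        op = cmd.get("op")
        if op not in ALLOWED_OPS:
            # Skip anything the canvas can't render instead of failing the whole drawing.
            continue
        x, y = cmd.get("x", 0), cmd.get("y", 0)
        if op == "write":
            elements.append({"type": "text", "x": x, "y": y, "text": cmd["text"]})
        elif op == "box":
            elements.append({"type": "rectangle", "x": x, "y": y,
                             "width": cmd["w"], "height": cmd["h"]})
        elif op == "highlight":
            # A filled rectangle behind a step, used as a highlighter.
            elements.append({"type": "rectangle", "x": x, "y": y,
                             "width": cmd["w"], "height": cmd["h"],
                             "backgroundColor": "#ffec99", "fillStyle": "solid"})
        elif op == "arrow":
            # Arrow points are relative to the element's own (x, y) origin.
            dx, dy = cmd["to_x"] - x, cmd["to_y"] - y
            elements.append({"type": "arrow", "x": x, "y": y, "points": [[0, 0], [dx, dy]]})
    return elements
```

Validating against a closed set of ops keeps the model's output constrained to things the canvas can actually draw, which matters more than expressiveness when the drawing has to keep pace with speech.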

Handling interruptible voice conversations was also complex. The system needed to manage partial responses, maintain context, and adapt when the student asked follow-up questions mid-explanation.
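The core of barge-in handling can be sketched with Python's `asyncio`: play the tutor's reply as a task, race it against an interrupt signal, and when the student cuts in, cancel playback and record only the part that was actually delivered. This is a minimal, hypothetical sketch; `SpeechTurn`, its timed chunks, and `speak_until_interrupted` stand in for streaming audio and are not part of the Gemini Live API.

```python
import asyncio
from typing import List


class SpeechTurn:
    """Hypothetical stand-in for one tutor reply streamed to the student in chunks."""

    def __init__(self, chunks: List[str], chunk_seconds: float = 0.01):
        self.chunks = chunks
        self.chunk_seconds = chunk_seconds
        self.delivered: List[str] = []

    async def play(self) -> None:
        for chunk in self.chunks:
            await asyncio.sleep(self.chunk_seconds)  # simulates the time to stream one chunk
            self.delivered.append(chunk)


async def speak_until_interrupted(turn: SpeechTurn, interrupt: asyncio.Event,
                                  history: List[str]) -> bool:
    """Play a turn; if the student barges in first, stop and keep only what was heard."""
    play_task = asyncio.create_task(turn.play())
    wait_task = asyncio.create_task(interrupt.wait())
    done, pending = await asyncio.wait({play_task, wait_task},
                                       return_when=asyncio.FIRST_COMPLETED)
    interrupted = play_task not in done
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish cleanly
    # Record only the delivered words so the next reply builds on what the student heard.
    note = " [interrupted]" if interrupted else ""
    history.append("tutor: " + " ".join(turn.delivered) + note)
    return interrupted
```

Storing the truncated utterance, rather than the full planned reply, is what lets the tutor answer a mid-explanation question without assuming the student heard steps that were never spoken.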

Finally, balancing AI assistance with real learning was critical. We designed the tutor around Socratic questioning so that it encourages students to reason through problems rather than giving away answers.
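One simple way to enforce this kind of policy in code, offered as a hypothetical sketch rather than the project's implementation, is a hint ladder: the tutor opens with a question, each incorrect attempt unlocks a more specific hint, and the ladder itself never states the final answer. The `HintLadder` class and its exact-match answer check are assumptions for illustration; a real tutor would more likely judge answers with the model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HintLadder:
    """Hypothetical escalation policy: each wrong attempt unlocks a more specific hint."""
    hints: List[str]   # ordered from most open-ended question to most specific nudge
    answer: str
    attempts: int = 0

    def next_prompt(self, student_answer: Optional[str] = None) -> str:
        if student_answer is not None:
            # Naive exact-match check; a real system might defer to the Review Agent instead.
            if student_answer.strip() == self.answer:
                return "Correct! Can you explain why that step works?"
            self.attempts += 1
        # Cap at the last hint: the ladder never escalates to revealing the answer.
        return self.hints[min(self.attempts, len(self.hints) - 1)]


ladder = HintLadder(
    hints=[
        "What are we trying to find in 2x + 3 = 7?",
        "What could you do to both sides to get rid of the + 3?",
        "Now you have 2x = 4. What undoes multiplying by 2?",
    ],
    answer="x = 2",
)
```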


Accomplishments that we're proud of

We successfully created a system that feels less like a chatbot and more like a live tutoring experience.

Key achievements include:

  • Building a real-time multimodal tutor using Gemini Live
  • Designing a multi-agent architecture that separates vision, tutoring, diagrams, and evaluation
  • Creating a visual reasoning whiteboard where the AI teaches concepts interactively
  • Implementing diagram generation for math and science explanations
  • Demonstrating how AI can teach students to think rather than just solve problems

Seeing the AI guide a student step by step while drawing explanations on the canvas was a major milestone.


What we learned

Through this project, we learned that effective AI tutoring requires more than powerful models.

Key insights include:

  • Visual explanations made concepts noticeably easier to follow than text-only responses
  • A multi-agent architecture helps separate reasoning, teaching, and visualization responsibilities
  • Real-time AI interaction requires careful management of context and latency
  • The most valuable AI education tools are those that guide thinking rather than provide answers

We also gained hands-on experience building systems with Gemini Live, agent orchestration, and multimodal AI pipelines.


What's next for AI Live Tutor

We see several exciting directions to expand AI Live Tutor.

Personalized Learning

Tracking student progress and adapting lessons based on strengths and weaknesses.

Expanded Diagram Capabilities

Automatically generating graphs, geometry figures, and science visualizations.

Collaborative Learning

Allowing multiple students or teachers to join a session.

Curriculum Integration

Aligning tutoring with school standards and learning objectives.

Mobile App Experience

Making the tutor accessible directly from a student's phone or tablet.

Our long-term vision is to build an AI tutor that feels as natural and effective as learning with a human teacher, helping students everywhere gain deeper understanding and confidence in their studies.
