EDUVA AI Private Tutor

EDUVA dashboard where students start AI tutoring sessions using voice, homework upload, whiteboard collaboration, or screen sharing.
Multiple study modes including collaborative whiteboard solving, homework analysis, and real-time screen sharing with the AI tutor.
Students customize their AI tutor with different voice personas, languages, and teaching styles for a personalized learning experience.
The AI tutor analyzes uploaded homework and PDFs, highlights answers, and explains solutions step-by-step.
Students can share their screen so the AI tutor can analyze problems and guide them through solutions in real time.
The AI tutor evaluates answers and provides instant feedback, encouragement, and guidance during the learning session.
AI and student collaborate on a digital whiteboard where concepts, diagrams, and problem-solving steps are explained visually.
The tutor generates structured notes and follow-up questions automatically during the explanation.
Multimodal architecture showing voice, vision, and Gemini processing to deliver real-time AI tutoring.
EDUVA AI Private Tutor deployed on Google Cloud Run for scalable real-time multimodal learning sessions.
Google Cloud Build pipeline confirming successful deployment of the AI Private Tutor service.

Inspiration

Students are often alone when they get stuck on a difficult problem. They search online, reread textbooks, or watch videos, but learning is far more effective when a teacher can see the student’s work and guide them step by step.

At the same time, many students around the world do not have access to high-quality tutoring. Parents also struggle to understand where their children are facing difficulties or how to support them.

I wanted to build a tutor that doesn’t just answer questions but actually collaborates with the student: seeing what they see, hearing their reasoning, and guiding them interactively.

By leveraging Gemini’s multimodal capabilities, I created an AI tutor that behaves more like a real teacher sitting next to the student.

What it does

EDUVA AI Private Tutor is a real-time multimodal learning assistant powered by Gemini.

Students can talk to the tutor using natural voice conversation while sharing their learning materials.
The AI can analyze:

textbooks
handwritten notes
PDFs
whiteboards
screen-shared applications

Instead of simply answering questions, the tutor actively collaborates with the student by:

highlighting mistakes
pointing to important concepts
drawing visual explanations
guiding the student step-by-step through the solution

Key Capabilities

🎤 Real-time voice conversation with the AI tutor
👁 Visual understanding of PDFs, whiteboards, and screen sharing
✏️ Live annotations and highlights directly on the student's workspace
💡 Context-aware suggestions for the next learning step
📓 Persistent notebook storing explanations, formulas, and summaries

This creates a shared learning workspace where the student and AI tutor solve problems together.

How I built it

The AI Private Tutor is powered by Gemini 2.5 Flash using the Gemini Live API to enable real-time multimodal interaction.

The system streams voice and visual context simultaneously to the AI, allowing the tutor to understand both what the student is saying and what they are looking at.
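As a minimal sketch of the voice half of that stream: the Web Audio API delivers microphone samples as 32-bit floats in the range -1 to 1, while the Gemini Live API documentation describes audio input as raw 16-bit little-endian PCM. The conversion between the two could look like the function below. The name floatTo16BitPCM is hypothetical and not taken from the EDUVA code, and resampling to the sample rate the API expects is not shown.

```typescript
// Hypothetical helper; illustrates the Float32 -> 16-bit PCM step only.
// Assumes the input is one block of mono samples in the range [-1, 1].
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    const value = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
    // The final `true` writes the sample in little-endian byte order.
    view.setInt16(i * 2, value, true);
  }
  return bytes;
}
```

Each converted block can then be encoded and sent over the session. Keeping blocks small (tens of milliseconds of audio) is one common way to hold latency down.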

Core Technologies

Gemini 2.5 Flash – multimodal reasoning
Gemini Live API – real-time voice & vision streaming
Google GenAI SDK – model integration
React + TypeScript – interactive frontend
Node.js – session orchestration
WebAudio API – low-latency voice streaming
pdfjs-dist – PDF analysis
KaTeX – mathematical formula rendering
Google Cloud Run – scalable cloud deployment

A custom context capture engine composites the student's visual workspace (PDFs, whiteboards, screen share) into optimized frames that are streamed to Gemini alongside the voice input.
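The geometry behind such a compositing step can be sketched as follows. This is an illustration under assumptions, not the EDUVA engine: the types Size and Rect and the functions fitWithin and layoutSideBySide are hypothetical names, and the side-by-side layout is only one possible arrangement. The code computes where each source would be drawn and how far a frame would be downscaled before streaming.

```typescript
// Hypothetical types and functions; none of these come from the EDUVA codebase.
interface Size { width: number; height: number; }
interface Rect extends Size { x: number; y: number; }

/** Scale a size down (never up) so its longest edge fits within maxEdge. */
function fitWithin(source: Size, maxEdge: number): Size {
  const longest = Math.max(source.width, source.height);
  const scale = longest > maxEdge ? maxEdge / longest : 1;
  return {
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}

/** Place sources left to right on one frame, each scaled to a common height. */
function layoutSideBySide(
  sources: Size[],
  frameHeight: number,
): { frame: Size; rects: Rect[] } {
  let x = 0;
  const rects: Rect[] = sources.map((s) => {
    // Preserve each source's aspect ratio at the shared height.
    const width = Math.max(1, Math.round((s.width / s.height) * frameHeight));
    const rect: Rect = { x, y: 0, width, height: frameHeight };
    x += width;
    return rect;
  });
  return { frame: { width: x, height: frameHeight }, rects };
}
```

In a browser, each Rect would map onto a canvas drawImage call with a destination position and size, and the finished canvas could be encoded as a compressed image before sending.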

Challenges we ran into

One of the biggest challenges was synchronizing voice explanations with precise visual annotations.

When the tutor says “Look at this equation,” the annotation must appear on that equation in the student's workspace, at the moment the words are spoken.

This required building:

a coordinate transformation engine
a multimodal synchronization pipeline
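A coordinate transformation of this kind can be sketched in two steps. The sketch assumes the tutor's annotation arrives as a point normalized to the captured frame (0 to 1 on each axis); that format, and the names Point, CapturedRegion, ScreenView, frameToWorkspace, and workspaceToScreen, are assumptions for illustration rather than details from the project.

```typescript
// Hypothetical types; the real engine may represent these differently.
interface Point { x: number; y: number; }

/** The part of the workspace that was captured into the frame. */
interface CapturedRegion {
  originX: number; // top-left corner, in workspace units
  originY: number;
  width: number;   // extent, in workspace units
  height: number;
}

/** How the student is currently viewing the workspace. */
interface ScreenView { scrollX: number; scrollY: number; zoom: number; }

/** Step 1: normalized frame coordinates (0..1) to workspace coordinates. */
function frameToWorkspace(p: Point, captured: CapturedRegion): Point {
  return {
    x: captured.originX + p.x * captured.width,
    y: captured.originY + p.y * captured.height,
  };
}

/** Step 2: workspace coordinates to on-screen pixels for the current view. */
function workspaceToScreen(p: Point, view: ScreenView): Point {
  return {
    x: (p.x - view.scrollX) * view.zoom,
    y: (p.y - view.scrollY) * view.zoom,
  };
}
```

Storing the annotation in workspace coordinates after step 1 means it stays attached to the content even if the student scrolls or zooms between the capture and the moment the annotation is drawn.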

Another challenge was enabling natural interruptions.
Students often interrupt teachers mid-explanation, so we implemented real-time barge-in logic allowing students to stop the tutor and ask follow-up questions naturally.
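One simple way to approximate barge-in on the client is an energy-based detector that stops queued tutor audio once the student has been speaking for a few consecutive frames. The sketch below takes that approach; it is not the project's actual logic, and BargeInDetector, TutorPlayback, onMicFrame, and the threshold values are all hypothetical.

```typescript
/** Root-mean-square level of one audio frame with samples in [-1, 1]. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

/** Reports speech once enough consecutive frames exceed the threshold. */
class BargeInDetector {
  private loudFrames = 0;
  private readonly threshold: number;
  private readonly framesRequired: number;

  // Default values are illustrative guesses, not tuned settings.
  constructor(threshold: number = 0.05, framesRequired: number = 3) {
    this.threshold = threshold;
    this.framesRequired = framesRequired;
  }

  push(frame: Float32Array): boolean {
    // A quiet frame resets the run, so short clicks do not trigger.
    this.loudFrames = rms(frame) >= this.threshold ? this.loudFrames + 1 : 0;
    return this.loudFrames >= this.framesRequired;
  }

  reset(): void { this.loudFrames = 0; }
}

/** Stand-in for the tutor's queue of audio chunks waiting to be played. */
class TutorPlayback {
  private queue: string[] = [];
  enqueue(chunkId: string): void { this.queue.push(chunkId); }
  get pending(): number { return this.queue.length; }
  /** Drop everything not yet played and report how many chunks were dropped. */
  interrupt(): number {
    const dropped = this.queue.length;
    this.queue = [];
    return dropped;
  }
}

/** Feed one microphone frame; returns true if the tutor was interrupted. */
function onMicFrame(
  frame: Float32Array,
  detector: BargeInDetector,
  playback: TutorPlayback,
): boolean {
  const speaking = detector.push(frame);
  if (speaking && playback.pending > 0) {
    playback.interrupt();
    detector.reset();
    return true;
  }
  return false;
}
```

Requiring several loud frames in a row trades a little reaction time for fewer false interruptions from background noise.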

We also optimized real-time streaming to maintain low latency while processing both audio and visual inputs.

Accomplishments that we're proud of

We successfully built a fully interactive multimodal tutoring system that goes far beyond a traditional chatbot.

Highlights include:

⚡ Low-latency real-time voice conversation
👁 Visual understanding of the student workspace
✏️ AI-generated annotations on learning materials
🌍 Tutor personas and cultural adaptation
📓 Automatic notebook creation with formulas and summaries

The result is an AI tutor that can see, hear, and collaborate with the student in real time.

What we learned

Building a multimodal AI tutor taught us that true learning assistance requires more than text interaction.

Voice alone is not enough; visual context is essential.

By combining voice and vision through Gemini, we created a far more natural tutoring experience.

We also learned that real-time AI systems require precise synchronization between:

audio streams
visual context
AI reasoning
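One way to picture that synchronization is a cue list keyed to the audio playback clock: each annotation is scheduled for the moment in the tutor's speech when it should appear, and the renderer asks on every tick which cues are due. The sketch below is an assumption about how this could be done, with hypothetical names Cue and CueScheduler, not a description of EDUVA's pipeline.

```typescript
// Hypothetical types; times are seconds on the audio playback clock.
interface Cue { atSeconds: number; annotationId: string; }

class CueScheduler {
  private cues: Cue[] = [];

  /** Add a cue and keep the list ordered by time. */
  add(cue: Cue): void {
    this.cues.push(cue);
    this.cues.sort((a, b) => a.atSeconds - b.atSeconds);
  }

  /** Remove and return every cue whose time has been reached. */
  due(playbackSeconds: number): Cue[] {
    const ready: Cue[] = [];
    while (this.cues.length > 0 && this.cues[0].atSeconds <= playbackSeconds) {
      ready.push(this.cues.shift() as Cue);
    }
    return ready;
  }
}
```

In a browser, the playback time could come from the currentTime of the AudioContext that plays the tutor's voice, so annotations follow the audio actually heard rather than a separate timer.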

What's next for EDUVA AI Private Tutor

Our next goal is to expand EDUVA into a complete AI learning ecosystem.

Future developments include:

🎓 personalized subject-specific AI tutors
📊 deeper learning progress tracking
📈 adaptive learning paths based on student performance
👨‍👩‍👧 parent insight dashboards
🌍 support for more languages and education systems

My vision is a world where every student has access to a personalized AI tutor available anytime they need help.

Built With

css
gemini-2.5-flash
gemini-live-api
google-cloud-run
google-genai-sdk
katex
node.js
pdfjs-dist
react
tailwind
typescript
webaudio-api

Updates

mohamed eisa started this project — Mar 15, 2026 03:41 PM EDT