Inspiration

Every student studies from documents: PDFs, lecture slides, research papers, documentation, textbooks.

But the experience is usually the same. You sit there reading page after page, trying to figure things out on your own. If something doesn’t make sense, you reread the same paragraph… or open another tab to search for explanations somewhere else.

That breaks the learning flow.

Even modern AI assistants don’t completely solve this problem. Most of them answer from general knowledge, not from the exact material you're studying. The explanations may be helpful, but they often don’t match the terminology, context, or examples in the document.

We created 🐢 Mentori to change that.

Mentori turns static study materials into an interactive learning experience. Instead of reading alone, students upload a document and interact with it conversationally.

Using Retrieval-Augmented Generation (RAG) powered by Google’s Gemini models, Mentori analyzes the document and retrieves the most relevant sections when answering questions. This keeps explanations grounded in the actual material being studied.

And with Gemini Live API, the experience becomes conversational — students can simply talk to the tutor and learn naturally.

The idea is simple: make studying feel less like reading a dense document and more like learning with a patient tutor who already understands the material in front of you.


What it does

🐢 Mentori turns any document into an interactive AI tutor.

Instead of passively reading PDFs or lecture notes, students can upload a document and start learning through conversation.

Mentori currently supports two main learning experiences.


Conversational Learning

After a document is uploaded, Mentori analyzes the content and creates a structured learning session.

Instead of expecting the student to read everything alone, Mentori walks through the material section by section and explains the concepts.

The explanations happen through real-time voice interaction powered by Gemini Live API. Students can interrupt naturally, ask follow-up questions, or ask the tutor to explain something differently.

If something is still unclear, the student can even ask Mentori to explain it in another language, and the tutor will switch languages while still staying grounded in the document.

This makes the experience feel much closer to learning with a real tutor, rather than interacting with a static chatbot.


Interview Mode

Mentori also includes an Interview Mode designed to test how well the student understood the material.

Once a document is processed, Mentori generates a curated set of key questions based on the uploaded document.

The interaction works like a real interview or oral exam.

Mentori asks a question and the student answers using voice or text.

If the answer is incomplete or incorrect, Mentori doesn’t immediately reveal the solution. Instead, it guides the student toward the correct reasoning, similar to how a human interviewer would probe understanding.

At the end of each question, Mentori provides:

  • the correct answer
  • feedback on the student’s response
  • suggestions on what could be improved

This turns reading into active learning that builds deeper understanding, rather than passive memorization.
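The per-question feedback described above can be sketched as a small data structure. The field names here are illustrative assumptions for the three pieces Mentori returns, not the exact schema the service uses:

```python
from dataclasses import dataclass

# Illustrative shape of the feedback Mentori gives at the end of each
# interview question. Field names are assumptions, not the real schema.

@dataclass
class QuestionFeedback:
    question: str
    correct_answer: str       # the grounded answer from the document
    feedback: str             # comments on the student's response
    improvements: list[str]   # concrete suggestions for next time

def summarize(fb: QuestionFeedback) -> str:
    """Render the feedback as the text shown to the student."""
    tips = "\n".join(f"  - {t}" for t in fb.improvements)
    return (
        f"Q: {fb.question}\n"
        f"Correct answer: {fb.correct_answer}\n"
        f"Feedback: {fb.feedback}\n"
        f"To improve:\n{tips}"
    )
```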


How we built it

Mentori is built as a real-time AI learning platform on Google Cloud, combining document retrieval, conversational AI, and voice interaction.

The system uses Retrieval-Augmented Generation (RAG) so responses remain grounded in the uploaded document instead of generic AI knowledge.


Frontend

The user interface is built using React and hosted on Firebase Hosting.

Students can:

  • upload documents
  • ask questions through text or voice
  • interact with the tutor conversationally
  • participate in interview sessions to test their understanding

The frontend communicates with backend services through REST APIs for document processing and WebSockets for real-time conversational interaction.


Document Ingestion (RAG Pipeline)

When a document is uploaded, it is processed through a backend service running on Cloud Run.

Cloud Run /upload

This service handles the document ingestion pipeline:

  • stores the document in Cloud Storage
  • stores document metadata in Firestore
  • splits the document into smaller semantic chunks
  • generates embeddings using Vertex AI Embeddings (text-embedding-004)
  • stores embeddings in Vertex AI Vector Search

This creates the knowledge base that Mentori uses to answer questions from the document.
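The chunking step in the pipeline above can be sketched roughly as follows. The chunk size, overlap, and whitespace-aware splitting are illustrative assumptions; in the real service each chunk is then sent to Vertex AI Embeddings (text-embedding-004) and indexed in Vertex AI Vector Search:

```python
# Sketch of the chunking stage of the ingestion pipeline.
# max_chars and overlap are illustrative defaults, not tuned values.

def chunk_document(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows, avoiding mid-word cuts."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        # Back up to the last space so we don't cut a word in half.
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        # Overlap consecutive chunks so context isn't lost at boundaries.
        start = max(end - overlap, start + 1)
    return chunks
```

Overlapping chunks help retrieval later: a concept that straddles a chunk boundary still appears whole in at least one chunk.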


Retrieval and AI Reasoning

When a student asks a question, the system retrieves the most relevant sections from the document.

Cloud Run /answer-question/{session-id}

This service:

  1. receives the user query
  2. retrieves relevant document chunks from Vertex AI Vector Search
  3. combines the query, retrieved context, and conversation history
  4. sends the grounded prompt to Gemini Flash 2.5
  5. returns a document-grounded response to the frontend

This ensures Mentori answers questions based on the uploaded document, not general knowledge.
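Step 3 above, combining the query, retrieved context, and history into a grounded prompt, might look like the sketch below. The prompt template and its wording are assumptions for illustration; the real service sends the assembled prompt to Gemini Flash 2.5:

```python
# Sketch of grounded prompt assembly. The instruction text and
# "[Excerpt N]" labeling are illustrative, not the production template.

def build_grounded_prompt(query: str,
                          chunks: list[str],
                          history: list[tuple[str, str]]) -> str:
    """Combine retrieved chunks, conversation history, and the query."""
    context = "\n\n".join(f"[Excerpt {i + 1}]\n{c}" for i, c in enumerate(chunks))
    dialogue = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Answer using ONLY the document excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Document excerpts:\n{context}\n\n"
        f"Conversation so far:\n{dialogue}\n\n"
        f"Student question: {query}"
    )
```

The explicit "use only the excerpts" instruction is what keeps the model from drifting into generic knowledge.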


Real-Time Conversational Interaction

Mentori’s conversational experience is powered by Gemini Live API.

Cloud Run /live/ws

This service manages real-time interaction between the user and the tutor.

  • the React frontend streams user audio through WebSockets
  • Gemini Live API processes conversational input and responses
  • the system streams responses back to the frontend in real time

This allows students to interrupt naturally, ask follow-up questions, or request explanations in different languages.
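One small piece of this flow, framing streamed audio chunks as WebSocket messages, can be sketched as below. The JSON envelope and field names are assumptions for illustration only; they are not the Gemini Live API wire format, which the relay service handles on its own side:

```python
import base64
import json

# Sketch of framing audio for the /live/ws socket. The message schema
# ("type", "session_id", "seq", "data") is an illustrative assumption.

def frame_audio_chunk(session_id: str, pcm_bytes: bytes, seq: int) -> str:
    """Wrap a raw PCM audio chunk in a JSON envelope for the socket."""
    return json.dumps({
        "type": "audio",
        "session_id": session_id,
        "seq": seq,  # ordering lets the server detect dropped chunks
        "data": base64.b64encode(pcm_bytes).decode("ascii"),
    })

def parse_frame(message: str) -> tuple[int, bytes]:
    """Recover the sequence number and raw audio from an envelope."""
    msg = json.loads(message)
    return msg["seq"], base64.b64decode(msg["data"])
```

Base64-encoding the audio keeps the envelope valid JSON at the cost of some overhead; a binary WebSocket frame would be the leaner alternative.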


Session Persistence

To avoid repeated document processing, Mentori introduces persistent learning sessions.

Cloud Run /ws/{session_id}

Session metadata is stored in Firestore, allowing users to return later and continue studying without uploading the document again.

This improves performance, reduces repeated processing, and keeps the learning experience smooth.


Infrastructure

Mentori runs entirely on Google Cloud Platform.

Key services include:

  • Cloud Run (Python FastAPI) for backend services
  • Vertex AI Embeddings for document vectorization
  • Vertex AI Vector Search for semantic retrieval
  • Gemini Flash 2.5 for reasoning and answer generation
  • Gemini Live API for real-time conversational interaction
  • Cloud Storage for document storage
  • Firestore for session persistence
  • Firebase Hosting for the React frontend

Deployment and infrastructure provisioning are managed using:

  • GitHub
  • GitHub Actions
  • Terraform

Challenges we ran into

Reliable document grounding

For Mentori to be useful, responses had to stay grounded in the uploaded document rather than drifting into generic AI answers. Designing the RAG pipeline required careful tuning of document chunking, embedding generation, retrieval quality, and prompting.

Avoiding repeated document processing

Reprocessing documents every time a user returned would waste compute and increase latency. We solved this by introducing persistent learning sessions so processed documents can be reused across sessions.

Real-time conversational interaction

Supporting natural voice interaction while keeping responses grounded in the document required coordinating several services at once. By building a WebSocket-based interaction layer around Gemini Live API, we were able to support low-latency conversations while still retrieving document context in the background.


What we learned

Building Mentori reinforced several lessons about AI-powered learning systems.

Context matters. AI responses become much more useful when grounded in the exact material a user is studying.

Voice interaction improves engagement. Conversational learning feels far more natural than reading or typing questions.

RAG improves reliability. Retrieval-based approaches keep AI answers aligned with source material.

Architecture matters. Combining real-time voice interaction with document retrieval requires careful system design, but the result unlocks much more engaging learning experiences.

Built With

  • cloud-storage
  • css
  • fastapi
  • firebase-hosting
  • firestore
  • github
  • github-actions
  • google-cloud-run
  • google-cloud-speech-to-text
  • google-cloud-text-to-speech
  • google-gemini-api-(gemini-flash-2.5)
  • html
  • javascript
  • python
  • react
  • retrieval-augmented-generation-(rag)
  • vertex-ai-embeddings-(text-embedding-004)
  • vertex-ai-vector-search
  • websockets