Inspiration
Every student studies from documents: PDFs, lecture slides, research papers, documentation, textbooks.
But the experience is usually the same. You sit there reading page after page, trying to figure things out on your own. If something doesn’t make sense, you reread the same paragraph again… or open another tab to search for explanations somewhere else.
That breaks the learning flow.
Even modern AI assistants don’t completely solve this problem. Most of them answer from general knowledge, not from the exact material you're studying. The explanations may be helpful, but they often don’t match the terminology, context, or examples in the document.
We created 🐢 Mentori to change that.
Mentori turns static study materials into an interactive learning experience. Instead of reading alone, students upload a document and interact with it conversationally.
Using Retrieval-Augmented Generation (RAG) powered by Google’s Gemini models, Mentori analyzes the document and retrieves the most relevant sections when answering questions. This keeps explanations grounded in the actual material being studied.
And with Gemini Live API, the experience becomes conversational — students can simply talk to the tutor and learn naturally.
The idea is simple: make studying feel less like reading a dense document and more like learning with a patient tutor who already understands the material in front of you.
What it does
🐢 Mentori turns any document into an interactive AI tutor.
Instead of passively reading PDFs or lecture notes, students can upload a document and start learning through conversation.
Mentori currently supports two main learning experiences.
Conversational Learning
After a document is uploaded, Mentori analyzes the content and creates a structured learning session.
Instead of expecting the student to read everything alone, Mentori walks through the material section by section and explains the concepts.
The explanations happen through real-time voice interaction powered by Gemini Live API. Students can interrupt naturally, ask follow-up questions, or ask the tutor to explain something differently.
If something is still unclear, the student can even ask Mentori to explain it in another language, and the tutor will switch languages while still staying grounded in the document.
This makes the experience feel much closer to learning with a real tutor, rather than interacting with a static chatbot.
Interview Mode
Mentori also includes an Interview Mode designed to test how well the student understood the material.
Once a document is processed, Mentori generates a curated set of key questions based on the uploaded document.
The interaction works like a real interview or oral exam.
Mentori asks a question and the student answers using voice or text.
If the answer is incomplete or incorrect, Mentori doesn’t immediately reveal the solution. Instead, it guides the student toward the correct reasoning, similar to how a human interviewer would probe understanding.
At the end of each question, Mentori provides:
- the correct answer
- feedback on the student’s response
- suggestions on what could be improved
This turns reading into active learning and deeper understanding, rather than passive memorization.
How we built it
Mentori is built as a real-time AI learning platform on Google Cloud, combining document retrieval, conversational AI, and voice interaction.
The system uses Retrieval-Augmented Generation (RAG) so responses remain grounded in the uploaded document instead of generic AI knowledge.
Frontend
The user interface is built using React and hosted on Firebase Hosting.
Students can:
- upload documents
- ask questions through text or voice
- interact with the tutor conversationally
- participate in interview sessions to test their understanding
The frontend communicates with backend services through REST APIs for document processing and WebSockets for real-time conversational interaction.
Document Ingestion (RAG Pipeline)
When a document is uploaded, it is processed through a backend service running on Cloud Run.
Cloud Run endpoint: `/upload`
This service handles the document ingestion pipeline:
- stores the document in Cloud Storage
- stores document metadata in Firestore
- splits the document into smaller semantic chunks
- generates embeddings using Vertex AI Embeddings (text-embedding-004)
- stores embeddings in Vertex AI Vector Search
This creates the knowledge base that Mentori uses to answer questions from the document.
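The chunking step above can be sketched as follows. This is an illustrative sketch, not Mentori's actual implementation: the chunk size and overlap values are assumptions, and the Vertex AI embedding call is shown commented out because it needs GCP credentials.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character windows.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Embedding the chunks with Vertex AI (shape only; requires GCP auth):
# from vertexai.language_models import TextEmbeddingModel
# model = TextEmbeddingModel.from_pretrained("text-embedding-004")
# vectors = [e.values for e in model.get_embeddings(chunks)]
```

In practice, splitting on paragraph or heading boundaries usually retrieves better than fixed character windows; the fixed window above just keeps the sketch short.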
Retrieval and AI Reasoning
When a student asks a question, the system retrieves the most relevant sections from the document.
Cloud Run endpoint: `/answer-question/{session-id}`
This service:
- receives the user query
- retrieves relevant document chunks from Vertex AI Vector Search
- combines the query, retrieved context, and conversation history
- sends the grounded prompt to Gemini 2.5 Flash
- returns a document-grounded response to the frontend
This ensures Mentori answers questions based on the uploaded document, not general knowledge.
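A minimal in-memory sketch of this retrieve-then-prompt pattern is below. Mentori uses Vertex AI Vector Search for the actual retrieval; the cosine-similarity ranking and the prompt wording here are assumptions made only to illustrate the structure.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, chunks, chunk_vecs, top_k=2):
    """Rank chunks by similarity to the query and keep the top_k."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [c for c, _ in scored[:top_k]]

def build_prompt(question: str, context_chunks: list[str], history: list[str]) -> str:
    """Combine retrieved context, conversation history, and the query."""
    context = "\n---\n".join(context_chunks)
    convo = "\n".join(history)
    return (
        "Answer ONLY from the document excerpts below.\n"
        f"Excerpts:\n{context}\n\n"
        f"Conversation so far:\n{convo}\n\n"
        f"Question: {question}"
    )
```

The instruction to answer only from the excerpts is what keeps the model from drifting back to general knowledge.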
Real-Time Conversational Interaction
Mentori’s conversational experience is powered by Gemini Live API.
Cloud Run endpoint: `/live/ws`
This service manages real-time interaction between the user and the tutor.
- the React frontend streams user audio through WebSockets
- Gemini Live API processes conversational input and responses
- the system streams responses back to the frontend in real time
This allows students to interrupt naturally, ask follow-up questions, or request explanations in different languages.
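The relay loop at the core of this service can be sketched with two concurrent pumps over queues. The Gemini Live API session is replaced with an echo stub here; only the streaming structure (forward client frames upstream while streaming model output back) reflects the description above, and all names are illustrative.

```python
import asyncio

async def pump_upstream(client_audio: asyncio.Queue, model_in: asyncio.Queue):
    """Forward audio frames from the client WebSocket to the model session."""
    while (frame := await client_audio.get()) is not None:
        await model_in.put(frame)
    await model_in.put(None)  # propagate end-of-stream marker

async def fake_model(model_in: asyncio.Queue, model_out: asyncio.Queue):
    """Stand-in for the Gemini Live session: emits a reply per frame."""
    while (frame := await model_in.get()) is not None:
        await model_out.put(b"reply:" + frame)
    await model_out.put(None)

async def relay(frames: list[bytes]) -> list[bytes]:
    """Run both pumps concurrently and collect what streams back."""
    client_audio, model_in, model_out = (asyncio.Queue() for _ in range(3))
    for f in frames:
        client_audio.put_nowait(f)
    client_audio.put_nowait(None)
    asyncio.create_task(pump_upstream(client_audio, model_in))
    asyncio.create_task(fake_model(model_in, model_out))
    replies = []
    while (r := await model_out.get()) is not None:
        replies.append(r)
    return replies
```

Because the two pumps run as independent tasks, the model can start responding before the client finishes speaking, which is what makes natural interruption possible.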
Session Persistence
To avoid repeated document processing, Mentori introduces persistent learning sessions.
Cloud Run endpoint: `/ws/{session_id}`
Session metadata is stored in Firestore, allowing users to return later and continue studying without uploading the document again.
This improves performance, reduces repeated processing, and keeps the learning experience smooth.
Infrastructure
Mentori runs entirely on Google Cloud Platform.
Key services include:
- Cloud Run (Python FastAPI) for backend services
- Vertex AI Embeddings for document vectorization
- Vertex AI Vector Search for semantic retrieval
- Gemini 2.5 Flash for reasoning and answer generation
- Gemini Live API for real-time conversational interaction
- Cloud Storage for document storage
- Firestore for session persistence
- Firebase Hosting for the React frontend
Deployment and infrastructure provisioning are managed using:
- GitHub
- GitHub Actions
- Terraform
Challenges we ran into
Reliable document grounding
For Mentori to be useful, responses had to stay grounded in the uploaded document rather than drifting into generic AI answers. Designing the RAG pipeline required careful tuning of document chunking, embedding generation, retrieval quality, and prompting.
Avoiding repeated document processing
Reprocessing documents every time a user returned would waste compute and increase latency. We solved this by introducing persistent learning sessions so processed documents can be reused across sessions.
Real-time conversational interaction
Supporting natural voice interaction while keeping responses grounded in the document required coordinating several services at once. By building a WebSocket-based interaction layer around Gemini Live API, we were able to support low-latency conversations while still retrieving document context in the background.
What we learned
Building Mentori reinforced several lessons about AI-powered learning systems.
Context matters. AI responses become much more useful when grounded in the exact material a user is studying.
Voice interaction improves engagement. Conversational learning feels far more natural than reading or typing questions.
RAG improves reliability. Retrieval-based approaches keep AI answers aligned with source material.
Architecture matters. Combining real-time voice interaction with document retrieval requires careful system design, but the result unlocks much more engaging learning experiences.
Built With
- cloud-storage
- css
- fastapi
- firebase-hosting
- firestore
- github
- github-actions
- google-cloud-run
- google-cloud-speech-to-text
- google-cloud-text-to-speech
- google-gemini-api-(gemini-flash-2.5)
- html
- javascript
- python
- react
- retrieval-augmented-generation-(rag)
- vertex-ai-embeddings-(text-embedding-004)
- vertex-ai-vector-search
- websockets
