The teacher you wished you had at 2am.
Screenshots:
- Your learning hub: start a new session or continue where you left off.
- Upload your document and start learning instantly.
- Live AI lecture in progress: interrupt anytime, ask anything.
- Workflow: the journey from uploading a document to getting a teacher.
- Architecture diagram.
💡 Inspiration
As students and lifelong learners, we've all faced the same problem: a mountain of documents to read. Whether it's a 40-slide lecture deck, a dense research paper, or a company onboarding manual, the process of reading alone is inherently passive and isolating.
You can't ask questions when you're confused.
You can't get a concept explained in a different way.
And you have no way of knowing if you actually understood the material.
The inspiration for Leksa came from that exact frustration.
What if every document could have its own teacher?
When we discovered the Gemini Live API with its real-time voice interaction and barge-in interruption capabilities, we realized we could build something powerful: an AI that doesn't just read documents — it teaches them.
Leksa transforms a lonely reading session into an interactive classroom experience.
🤖 What It Does
Leksa is your personal AI voice teacher that transforms documents into immersive real-time lectures.
Instead of silently reading a document, you can listen, interact, and ask questions, just like in a real classroom.
How it works
Upload Your Document
Upload any PDF or PowerPoint presentation.
Listen to Your Lecture
An AI teacher begins explaining the content out loud in a natural conversational voice.
Interrupt Anytime
If you're confused, simply speak. The AI pauses instantly, listens to your question, and answers before continuing.
Adaptive Learning
If a concept isn't clear, Leksa re-explains it in another way.
Comprehension Checks
After important segments, Leksa asks questions to ensure you understood the material.
Leksa turns passive reading into active learning.
🏗️ How We Built It
Leksa is a full-stack, cloud-native AI application designed for real-time voice interaction.
Our architecture combines Gemini AI models with Google Cloud infrastructure.
Backend — FastAPI on Google Cloud Run
We built the backend using FastAPI because it provides:
- High performance
- Native WebSocket support
- Efficient real-time communication
The backend is:
- Containerized using Docker
- Deployed on Google Cloud Run
- Automatically scalable for multiple users
Document Processing & Lecture Generation
When a user uploads a document:
- The file is stored in Google Cloud Storage.
- The backend extracts text using PyMuPDF for PDFs and python-pptx for PowerPoint files.
- The extracted text is sent to gemini-2.0-flash.
This model acts as our Lecture Planner, converting the document into:
- Structured lecture segments
- Natural explanations
- Comprehension questions
The generated lecture structure is stored in Firestore.
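To make the planner output concrete, here is a minimal sketch of how a lecture plan could be validated before it is stored. It is an illustration under assumptions, not Leksa's actual schema: the field names (`title`, `explanation`, `question`), the JSON shape, and the helper names `parse_lecture_plan` and `to_document` are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class LectureSegment:
    """One teachable unit of the lecture (hypothetical schema)."""
    title: str
    explanation: str
    question: Optional[str] = None  # optional comprehension check


def parse_lecture_plan(raw_json: str) -> List[LectureSegment]:
    """Turn a planner response into validated segments.

    Assumes the model was asked to reply with JSON shaped like
    {"segments": [{"title": ..., "explanation": ..., "question": ...}]}.
    """
    data = json.loads(raw_json)
    segments = []
    for item in data.get("segments", []):
        # Skip entries missing the fields a lecture cannot do without.
        if not item.get("title") or not item.get("explanation"):
            continue
        segments.append(
            LectureSegment(
                title=item["title"].strip(),
                explanation=item["explanation"].strip(),
                question=item.get("question") or None,
            )
        )
    return segments


def to_document(segments: List[LectureSegment]) -> dict:
    """Serialize segments into a plain dict suitable for a document store."""
    return {"segments": [asdict(s) for s in segments], "count": len(segments)}
```

Validating before storing means a malformed planner response is caught once, at upload time, rather than in the middle of a live lecture.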
Real-Time Voice Interaction
This is the core innovation of Leksa, powered by gemini-2.0-flash-live-001.
- The React frontend captures microphone input using the Web Audio API
- Audio is streamed to the backend via WebSockets
- The backend forwards the stream to the Gemini Live API
- Gemini processes speech and streams AI-generated voice responses back
Because Gemini Live supports barge-in detection, users can interrupt the AI mid-sentence.
This creates a natural conversational learning experience.
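The two-way flow described above can be sketched as two concurrent pumps, one per direction. This is a simplified illustration, not the production code: plain `asyncio.Queue` objects stand in for the browser WebSocket and the Gemini Live session, and the names `pump` and `bridge` are hypothetical.

```python
import asyncio


async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> int:
    """Forward chunks from source to sink until a None sentinel arrives."""
    forwarded = 0
    while True:
        chunk = await source.get()
        if chunk is None:
            await sink.put(None)  # propagate end-of-stream
            return forwarded
        await sink.put(chunk)
        forwarded += 1


async def bridge(mic_chunks, voice_chunks):
    """Run both directions at once, as a backend bridge might."""
    # Queues are stand-ins (assumption): browser <-> backend <-> model.
    from_browser, to_model = asyncio.Queue(), asyncio.Queue()
    from_model, to_browser = asyncio.Queue(), asyncio.Queue()
    for chunk in [*mic_chunks, None]:
        from_browser.put_nowait(chunk)
    for chunk in [*voice_chunks, None]:
        from_model.put_nowait(chunk)
    # Neither direction waits for the other, which is what lets the
    # user speak while the AI voice is still streaming back.
    upstream, downstream = await asyncio.gather(
        pump(from_browser, to_model),
        pump(from_model, to_browser),
    )
    return upstream, downstream
```

Calling `asyncio.run(bridge([b"a", b"b"], [b"x"]))` forwards two microphone chunks upstream and one voice chunk downstream. Running the directions as independent tasks keeps a slow response stream from blocking microphone input.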
Frontend — React + Web Audio
The user interface is built using React.
It handles:
- Document uploads
- WebSocket connections
- Real-time audio streaming
- Voice playback
The frontend captures microphone input through the browser's WebRTC media-capture and Web Audio APIs, and uses WebSockets for real-time communication with the backend.
⚡ Challenges We Ran Into
Bidirectional Audio Streaming
Managing stable two-way audio streaming over WebSockets was our biggest challenge. We had to carefully handle audio chunking, buffering, and synchronization to prevent lag.
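As an illustration of the chunking and buffering involved, the sketch below splits raw audio into fixed-duration chunks and holds playback until a few chunks are queued. The audio format (16-bit mono PCM at 16 kHz), the 100 ms chunk size, and the names `chunk_pcm` and `PlaybackBuffer` are assumptions for this example, not values taken from Leksa.

```python
from collections import deque
from typing import List, Optional

# Assumed format for this sketch: 16-bit mono PCM at 16 kHz.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> List[bytes]:
    """Split raw PCM into fixed-duration chunks; the last may be shorter."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


class PlaybackBuffer:
    """Holds incoming audio until enough is queued to play without gaps."""

    def __init__(self, min_chunks: int = 3):
        self._queue = deque()
        self._min_chunks = min_chunks
        self._primed = False

    def push(self, chunk: bytes) -> None:
        self._queue.append(chunk)
        if len(self._queue) >= self._min_chunks:
            self._primed = True

    def pop(self) -> Optional[bytes]:
        """Return the next chunk to play, or None while still buffering."""
        if not self._primed:
            return None
        if not self._queue:
            self._primed = False  # underrun: refill before playing again
            return None
        return self._queue.popleft()

    def clear(self) -> None:
        """Drop queued audio, for example when the user interrupts."""
        self._queue.clear()
        self._primed = False
```

Smaller chunks lower latency but increase per-message overhead, and a larger `min_chunks` smooths playback at the cost of a longer initial delay.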
Document Parsing
PDFs and PowerPoints often contain messy structures. We built a pre-processing pipeline to clean and structure the text before sending it to Gemini.
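A cleaning step of this kind might look like the sketch below. The specific rules (rejoining hyphenated line breaks, dropping page-number lines, collapsing whitespace) and the function name `clean_extracted_text` are assumptions chosen for illustration. The actual pipeline may apply different rules.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize text extracted from a PDF or slide deck."""
    # Rejoin words split across a line break: "inter-\nactive" -> "interactive".
    # Caveat: this also merges genuine hyphens that happen to end a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

    lines = []
    for line in text.splitlines():
        # Collapse internal runs of whitespace and trim the ends.
        line = re.sub(r"\s+", " ", line).strip()
        # Drop lines that are only a page number, e.g. "12" or "Page 12".
        if re.fullmatch(r"(page\s+)?\d+", line, flags=re.IGNORECASE):
            continue
        lines.append(line)

    # Collapse runs of blank lines into a single paragraph break.
    joined = "\n".join(lines)
    return re.sub(r"\n{3,}", "\n\n", joined).strip()
```

For example, `clean_extracted_text("Intro  to   inter-\nactive learning\n\n\n\nPage 3\n12\nNext   slide")` returns `"Intro to interactive learning\n\nNext slide"`.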
State Management for Interruptions
When users interrupt the AI, the system must pause, answer, and resume exactly where it stopped. We used Firestore session state tracking to manage this.
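The pause, answer, and resume cycle can be modeled as a small state machine. The sketch below is a hedged illustration: the class `LectureSession`, its fields, and the three modes are hypothetical, and the `to_dict` output only stands in for whatever session record is written to Firestore.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LECTURING = "lecturing"
    ANSWERING = "answering"
    FINISHED = "finished"


@dataclass
class LectureSession:
    """Tracks where the lecture is so it can resume after a question."""
    total_segments: int
    segment_index: int = 0
    mode: Mode = Mode.LECTURING

    def interrupt(self) -> None:
        """User barged in: pause the lecture, keeping the position."""
        if self.mode is Mode.LECTURING:
            self.mode = Mode.ANSWERING

    def resume(self) -> int:
        """Question answered: return the segment to continue from."""
        if self.mode is Mode.ANSWERING:
            self.mode = Mode.LECTURING
        return self.segment_index

    def complete_segment(self) -> None:
        """Advance only when a segment finished while lecturing."""
        if self.mode is not Mode.LECTURING:
            return
        self.segment_index += 1
        if self.segment_index >= self.total_segments:
            self.mode = Mode.FINISHED

    def to_dict(self) -> dict:
        """Plain dict form, e.g. for persisting in a document store."""
        return {
            "total_segments": self.total_segments,
            "segment_index": self.segment_index,
            "mode": self.mode.value,
        }
```

Because `complete_segment` is ignored while a question is being answered, an interrupted segment is never counted as delivered, so `resume` returns to the segment that was cut off.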
Cloud Deployment & Cold Starts
Cloud Run initially introduced cold start latency. We solved this by enabling minimum instances and optimizing our Docker image.
🏆 Accomplishments We're Proud Of
True Real-Time Barge-in
Users can interrupt the AI mid-sentence, creating a natural conversation instead of a static chatbot interaction.
Scalable Cloud Architecture
Leksa is a fully deployed Google Cloud application, capable of handling multiple users.
Adaptive AI Teaching
Leksa doesn't just read text aloud — it explains concepts, rephrases ideas, and quizzes users.
Voice-First Learning
The entire interaction is voice-driven, creating a hands-free learning experience.
Multi-Language Support
Leksa will support multiple languages, making it accessible worldwide.
📚 What We Learned
The Power of Gemini Live
The Gemini Live API enables extremely natural voice agents, with latency often under 500 ms.
Voice UX Design
Voice applications require clear listening indicators and feedback signals to build user trust.
Full-Stack Cloud Architecture
We learned how to integrate Cloud Run, Cloud Storage, Firestore, and Cloud Build into a production-ready pipeline.
State Management
Tracking lecture progress and conversation context with Firestore was essential for smooth real-time interaction.
🚀 What's Next for Leksa
Visual Understanding
Future versions will use Gemini multimodal capabilities to explain charts, diagrams, and images.
Learning Analytics
Users will get dashboards showing comprehension progress and weak areas.
Collaborative Learning
Multiple users will be able to join the same lecture session and learn together.
🌟 Our Vision
Our goal is simple:
Make every piece of written knowledge accessible through conversation.
Because every document deserves to be heard.
Built With
- fastapi
- gemini-2.0-flash
- gemini-2.0-flash-live-001
- gemini-2.5-flash-native-audio-preview-12-2025
- google-genai
- pymupdf
- python
- python-dotenv
- python-pptx
- react
- uvicorn
- webrtc
- websockets
