Inspiration
Explaining a concept out loud is one of the best ways to learn it (a principle known as the Feynman Technique). However, explaining things to yourself is passive, and professors don't have the bandwidth to orally examine 500 students one-on-one. We wanted to bridge this gap by building a tool that scales the "strict but fair" oral exam experience to everyone.
What it does
Studeo is an AI-powered oral exam proctor. It transforms static study materials into active, high-pressure interviews.
Ingestion: The user uploads course material (PDFs), and Studeo instantly "reads" it.
The Exam: It spawns a specialized AI Agent that conducts a real-time voice conversation. It doesn't just chat; it quizzes the user, detects gaps in knowledge, interrupts when necessary, and corrects misconceptions based strictly on the source text.
How we built it
We architected a real-time multimodal pipeline to ensure low latency:
Frontend: Built with Next.js (App Router) and Tailwind CSS for a clean, split-screen "Exam Room" UI.
Real-Time Comms: We utilized LiveKit to handle the WebRTC audio streaming and state synchronization (visualizer/transcript).
The Brain: We used Google Gemini as our reasoning engine, leveraging its massive context window to hold entire textbooks in memory for precise fact-checking.
Processing: We integrated The Token Company's bear-1 model for efficient information compression, making the most of Gemini's context capabilities and saving on LLM costs.
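The ingestion-to-prompt path in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `compress_text` is a hypothetical stand-in for the bear-1 compression step (here it only collapses whitespace), and `build_examiner_prompt` and its wording are guesses at how the course material might be handed to the reasoning model as grounding context.

```python
import re


def compress_text(raw: str) -> str:
    """Hypothetical stand-in for the compression step.

    A real pipeline would call a compression model here; this placeholder
    only collapses runs of whitespace so the example stays self-contained.
    """
    return re.sub(r"\s+", " ", raw).strip()


def build_examiner_prompt(course_name: str, source_text: str) -> str:
    """Assemble a system prompt that grounds the examiner in the source text."""
    compressed = compress_text(source_text)
    return (
        f"You are a strict but fair oral examiner for {course_name}.\n"
        "Ask one question at a time, probe for gaps in the student's answer, "
        "and correct misconceptions using ONLY the material below.\n"
        "--- COURSE MATERIAL ---\n"
        f"{compressed}\n"
        "--- END MATERIAL ---"
    )


if __name__ == "__main__":
    notes = "Utilitarianism   judges actions\n\nby their consequences."
    print(build_examiner_prompt("Ethics in Computing", notes))
```

Keeping compression behind a single function means the placeholder can be swapped for a real model call without touching the prompt-building code.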
Challenges we ran into
Orchestrating a real-time conversation loop is difficult. Our biggest challenge was state synchronization, i.e., ensuring the visualizer (listening/thinking/speaking), the live transcript, and the audio stream remained perfectly aligned without race conditions, all while minimizing latency. We also had to solve the "Zombie Room" issue, where disconnected users would leave ghost agents running on the server.
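One common way to approach both problems is sketched below. It is an illustration only, and every name in it (`StatePublisher`, `StateMirror`, `RoomReaper`, the 30-second grace period) is hypothetical rather than taken from the Studeo codebase. The server stamps each state change with an increasing sequence number so clients can drop events that arrive out of order, and a small reaper reports rooms that have stayed empty past a grace period so their agents can be shut down.

```python
from dataclasses import dataclass
from enum import Enum


class AgentState(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


@dataclass(frozen=True)
class StateEvent:
    seq: int            # monotonically increasing, assigned by the server
    state: AgentState


class StatePublisher:
    """Server side: the single source of truth for the agent's state."""

    def __init__(self) -> None:
        self._seq = 0
        self.state = AgentState.LISTENING

    def transition(self, new_state: AgentState) -> StateEvent:
        self._seq += 1
        self.state = new_state
        return StateEvent(self._seq, new_state)


class StateMirror:
    """Client side: applies events in order and ignores stale ones."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.state = AgentState.LISTENING

    def apply(self, event: StateEvent) -> bool:
        if event.seq <= self.last_seq:
            return False  # arrived late; a newer state is already shown
        self.last_seq = event.seq
        self.state = event.state
        return True


class RoomReaper:
    """Tracks empty rooms and reports those empty longer than a grace period."""

    def __init__(self, grace_seconds: float = 30.0) -> None:
        self.grace_seconds = grace_seconds
        self._empty_since: dict[str, float] = {}

    def on_last_user_left(self, room: str, now: float) -> None:
        self._empty_since.setdefault(room, now)

    def on_user_joined(self, room: str) -> None:
        self._empty_since.pop(room, None)  # a reconnect cancels the cleanup

    def rooms_to_close(self, now: float) -> list[str]:
        expired = [room for room, since in self._empty_since.items()
                   if now - since >= self.grace_seconds]
        for room in expired:
            del self._empty_since[room]
        return expired


if __name__ == "__main__":
    publisher, mirror = StatePublisher(), StateMirror()
    thinking = publisher.transition(AgentState.THINKING)
    speaking = publisher.transition(AgentState.SPEAKING)
    mirror.apply(speaking)  # the newer event happens to arrive first
    mirror.apply(thinking)  # the stale event is dropped
    print(mirror.state)
```

The grace period is a design choice: it lets a user with a brief network drop rejoin the same session, while still guaranteeing that an abandoned room is eventually reported for cleanup.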
Accomplishments that we're proud of
We are incredibly proud of the sub-second latency we achieved; the conversation feels natural and "face-to-face." We also engineered a dynamic Agent CRUD system that lets users spin up any number of distinct tutors (e.g., "History 101," "Ethics in Computing") and switch between them instantly without restarting the server.
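As a rough picture of what an Agent CRUD layer can look like, here is a minimal in-memory sketch. The names `TutorConfig` and `TutorRegistry` and the choice of fields are assumptions made for illustration; the real system presumably persists tutors somewhere and stores more per tutor.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TutorConfig:
    tutor_id: str
    name: str          # e.g. "History 101"
    source_text: str   # course material this tutor examines on


class TutorRegistry:
    """In-memory create/read/update/delete store for tutor definitions."""

    def __init__(self) -> None:
        self._tutors: dict[str, TutorConfig] = {}

    def create(self, name: str, source_text: str) -> TutorConfig:
        tutor = TutorConfig(uuid.uuid4().hex, name, source_text)
        self._tutors[tutor.tutor_id] = tutor
        return tutor

    def get(self, tutor_id: str) -> Optional[TutorConfig]:
        return self._tutors.get(tutor_id)

    def update(self, tutor_id: str, **changes: str) -> TutorConfig:
        # Raises KeyError if the tutor does not exist.
        updated = replace(self._tutors[tutor_id], **changes)
        self._tutors[tutor_id] = updated
        return updated

    def delete(self, tutor_id: str) -> bool:
        return self._tutors.pop(tutor_id, None) is not None

    def list_all(self) -> list[TutorConfig]:
        return list(self._tutors.values())


if __name__ == "__main__":
    registry = TutorRegistry()
    history = registry.create("History 101", "Course notes go here.")
    registry.create("Ethics in Computing", "Course notes go here.")
    registry.update(history.tutor_id, name="History 102")
    print([tutor.name for tutor in registry.list_all()])
```

Because each tutor is plain data looked up at session start, adding or switching tutors needs no server restart: a new room simply reads a different `TutorConfig`.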
What we learned
We gained deep experience in WebRTC event handling via LiveKit, specifically how to manage data channels for non-media payloads (like transcripts). We also learned how to chain multiple distinct AI models (Gemini + bear-1 + Trae) into a cohesive, user-friendly product.
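To make the data-channel point concrete: a data channel carries raw bytes, so a transcript update has to be serialized on the agent side and decoded on the client. The sketch below shows one possible shape for that payload. The `TranscriptSegment` fields and the JSON encoding are assumptions for illustration, and the call that actually publishes the bytes over LiveKit is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TranscriptSegment:
    segment_id: str   # stable across interim updates of the same utterance
    speaker: str      # e.g. "student" or "examiner"
    text: str
    final: bool       # False while speech recognition is still revising


def encode_segment(segment: TranscriptSegment) -> bytes:
    """Serialize a segment into bytes suitable for a data channel."""
    return json.dumps(asdict(segment)).encode("utf-8")


def decode_segment(payload: bytes) -> TranscriptSegment:
    """Rebuild a segment from bytes received on a data channel."""
    return TranscriptSegment(**json.loads(payload.decode("utf-8")))


class TranscriptView:
    """Client-side transcript that merges interim and final updates."""

    def __init__(self) -> None:
        self._segments: dict[str, TranscriptSegment] = {}

    def on_data(self, payload: bytes) -> None:
        incoming = decode_segment(payload)
        current = self._segments.get(incoming.segment_id)
        if current is not None and current.final and not incoming.final:
            return  # never let a late interim update overwrite a final line
        self._segments[incoming.segment_id] = incoming

    def lines(self) -> list[str]:
        # Dicts keep insertion order, so lines appear in first-seen order.
        return [f"{s.speaker}: {s.text}" for s in self._segments.values()]


if __name__ == "__main__":
    view = TranscriptView()
    view.on_data(encode_segment(
        TranscriptSegment("s1", "student", "The treaty was", False)))
    view.on_data(encode_segment(
        TranscriptSegment("s1", "student", "The treaty was signed in 1919.", True)))
    print(view.lines())
```

Keying updates by `segment_id` lets the UI replace an interim line in place rather than appending a new one each time the recognizer revises its guess.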
What's next for Studeo
Studeo will evolve to become fully multimodal. We plan to support ingestion of lecture videos, websites, and slide decks alongside PDFs. We also aim to implement "Group Study Mode," where multiple students can join a single room and be quizzed by the agent simultaneously, using LiveKit's Speaker Diarization capabilities.
Built With
- elevenlabs
- livekit
- python
- thetokencompany
- trae