Inspiration
We noticed a frustrating paradox: people complain about not having time to learn, yet the average American spends 54 minutes daily commuting—time usually wasted on passive scrolling or mindless podcasts. Traditional learning apps require visual attention and manual interaction, making them dangerous while driving and awkward while walking. We asked ourselves: What if your commute could become your classroom? What if you could learn organic chemistry while stuck in traffic, or master machine learning concepts during your morning walk—all hands-free, personalized to YOUR actual course materials? That's how Montessori was born: a voice-first AI academic coach that transforms dead time into productive learning sessions, teaching you directly from your uploaded textbooks and syllabi.
What it does
Montessori is an AI-powered auditory learning platform that lets you:
- Upload your course materials (PDFs, textbooks, lecture notes, syllabi) through a drag-and-drop interface
- Start a voice conversation hands-free—just hit play and start asking questions while commuting
- Get personalized explanations cited directly from YOUR uploaded documents using RAG (Retrieval-Augmented Generation)
- Access current information via real-time web search when topics require up-to-date research
- Set reminders to review challenging concepts using integrated agent tools
- Review past conversations with auto-generated summaries and continue where you left off
How we built it
Montessori is built entirely using Prompt-Driven Development (PDD)—where prompts are the source of truth and code is a regenerable artifact.
Frontend:
- Next.js 14 + TypeScript + TailwindCSS
- React for UI components (voice interface, file uploader, conversation history)
- WebSocket client for real-time bidirectional audio streaming
Backend:
- Next.js API routes for document processing and webhooks
- ElevenLabs Conversational AI as the core platform:
  - Knowledge Base API for RAG indexing of uploaded documents
  - Conversation History API for persistent session storage
Agent Tools:
- Toolhouse.ai for reminder and calendar integration
- rtrvr.ai for real-time web scraping when users ask about current research
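To make the pipeline concrete, here is a minimal sketch of the helper our upload route could use to forward a document to the Knowledge Base for RAG indexing. The endpoint path, header name, and payload shape shown are illustrative assumptions, not the exact ElevenLabs API:

```typescript
// Sketch: build the request a Next.js API route sends when forwarding an
// uploaded document for RAG indexing. The endpoint path, header, and payload
// shape here are illustrative assumptions, not the exact ElevenLabs API.
interface UploadedDoc {
  name: string;      // e.g. "organic-chem-ch4.pdf"
  mimeType: string;  // validated client-side before reaching this point
  sizeBytes: number;
}

interface IndexRequest {
  url: string;
  headers: Record<string, string>;
  body: { name: string; mime_type: string };
}

function buildIndexRequest(doc: UploadedDoc, apiKey: string): IndexRequest {
  return {
    url: "https://api.elevenlabs.io/v1/convai/knowledge-base", // assumed path
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: { name: doc.name, mime_type: doc.mimeType },
  };
}
```

Keeping request construction in a pure helper like this made the route handler itself trivial to regenerate from its prompt.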
PDD Methodology: When we need to change behavior (e.g., make the agent more Socratic), we modify the prompt and regenerate—no code patching. Tests accumulate in /tests to prevent regressions.
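As an illustration of how tests accumulate under PDD, a regression check along these lines fails if a regenerated system prompt drops a constraint an earlier iteration relied on. The prompt text and required phrases are hypothetical examples, not our real prompt:

```typescript
// Sketch of a PDD regression test: after regenerating the agent from its
// prompt, assert that invariants earlier iterations depended on still hold.
// The prompt text and required phrases below are hypothetical examples.
function promptKeepsInvariants(prompt: string, invariants: string[]): string[] {
  // Return the invariants the regenerated prompt no longer mentions.
  const lower = prompt.toLowerCase();
  return invariants.filter((phrase) => !lower.includes(phrase.toLowerCase()));
}

const regeneratedPrompt =
  "You are a Socratic tutor. Always cite the document name and page number. " +
  "Keep spoken answers to 2-3 sentences unless the user asks for depth.";

const missing = promptKeepsInvariants(regeneratedPrompt, [
  "cite the document name and page number",
  "2-3 sentences",
]);
// missing.length === 0 means the regeneration preserved prior behavior
```

Each behavior fix adds another invariant to the list, so the prompt can be regenerated freely without silently losing earlier fixes.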
Challenges we ran into
RAG Citation Accuracy
Problem: The agent would sometimes cite incorrect document sections or hallucinate sources entirely.
Solution: We enforced strict prompt engineering requiring exact document-and-page citations, attached provenance metadata to each chunk (document name, page number, section), and regenerated the retrieval prompt to prioritize precision over recall.
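The chunk-metadata fix can be sketched like this (the field names and citation format are our illustration, not an exact schema):

```typescript
// Sketch: attach provenance metadata to each chunk at indexing time so the
// agent can emit exact citations. Field names here are illustrative.
interface Chunk {
  text: string;
  document: string; // source file name
  page: number;     // page the chunk was extracted from
  section: string;  // nearest section heading
}

// Format a citation exactly as the prompt requires, so the model is asked
// to copy a precise string rather than improvise a source.
function citation(chunk: Chunk): string {
  return `[${chunk.document}, p. ${chunk.page}, "${chunk.section}"]`;
}
```

Giving the model a fixed citation string to reproduce, instead of asking it to describe its source in free text, is what moved us from plausible-sounding citations to verifiable ones.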
Interrupt Handling
Problem: Users couldn't naturally interrupt the agent mid-explanation, which is crucial for hands-free UX.
Solution: ElevenLabs Conversational AI has built-in interrupt detection with 2-3 second pause thresholds; we tuned the sensitivity and added visual feedback (the pulse animation stops when the user speaks).
Result: Natural conversation flow with seamless interruptions.
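The visual-feedback side can be sketched as a tiny state reducer (the event names are ours): the pulse animation runs while the agent speaks and stops the moment voice activity is detected from the user:

```typescript
// Sketch: drive the pulse animation from voice-activity events. The pulse
// stops as soon as the user starts speaking (an interrupt) and resumes only
// when the agent speaks again. Event names are illustrative.
type VoiceEvent = "agent_speaking" | "user_speaking" | "silence";

function pulseActive(events: VoiceEvent[]): boolean {
  let active = false;
  for (const e of events) {
    if (e === "agent_speaking") active = true;
    else if (e === "user_speaking") active = false; // interrupt: stop pulse
    // "silence" leaves the current state unchanged
  }
  return active;
}
```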
Document Processing Edge Cases
Problem: Scanned PDFs, images containing text, and malformed files caused upload failures.
Solution: We added client-side validation, implemented OCR preprocessing for images with Tesseract.js, and surfaced graceful error messages to the user.
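The client-side validation step can be sketched as follows; the accepted MIME types and the 25 MB cap are example values, not the product's real limits:

```typescript
// Sketch: validate an upload before any server round-trip. The accepted MIME
// types and 25 MB cap are example values, not the product's real limits.
const ACCEPTED = new Set([
  "application/pdf",
  "image/png",
  "image/jpeg",
  "text/plain",
]);
const MAX_BYTES = 25 * 1024 * 1024;

// Returns null when the file is acceptable, or a user-facing error message.
function validateUpload(mimeType: string, sizeBytes: number): string | null {
  if (!ACCEPTED.has(mimeType)) return `Unsupported file type: ${mimeType}`;
  if (sizeBytes === 0) return "File is empty";
  if (sizeBytes > MAX_BYTES) return "File exceeds the 25 MB limit";
  return null; // images later go through Tesseract.js OCR server-side
}
```

Rejecting malformed files in the browser kept the OCR pipeline from ever seeing inputs it couldn't handle.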
Accomplishments that we're proud of
- Built a fully functional voice-first learning platform in 10 hours with sub-300ms latency
- Implemented true Prompt-Driven Development—every major component is regenerable from prompts in /prompts directory
- Achieved accurate RAG citations from user-uploaded documents, solving the "AI tutor that knows YOUR textbooks" problem
- Integrated sponsor technologies (ElevenLabs, Toolhouse, rtrvr, PDD) seamlessly into a cohesive product
- Designed hands-free voice interaction that makes learning possible while driving or walking
- Created a scalable business model with clear path to revenue (freemium SaaS for students/professionals)
- Demonstrated test accumulation through multiple prompt regenerations, proving PDD's durability
What we learned
- PDD is transformative for AI apps: Treating prompts as source code and regenerating components dramatically reduced debugging time. Instead of hunting through code for bugs, we fixed behavior by clarifying prompts.
- RAG requires careful prompt engineering: Generic "retrieve relevant docs" prompts led to hallucinations. We learned to specify exact citation formats, metadata requirements, and retrieval strategies in prompts.
- Voice UX is fundamentally different: Designing for ears-only taught us to avoid visual metaphors ("as you can see"), use natural pauses (ellipses), and keep responses concise (2-3 sentences unless depth requested).
What's next for Montessori
- Mobile App (React Native): 80% of commuters use phones—we need native iOS/Android apps with background audio
- Spaced Repetition Quizzes: After learning sessions, generate quiz questions from conversation history to reinforce retention
- Multi-language Support: Expand beyond English to Spanish, Mandarin, Hindi for global student market
- Offline Mode: Download voice models and document embeddings for subway commutes without internet