Inspiration
We noticed a frustrating paradox: people complain about not having time to learn, yet the average American spends 54 minutes daily commuting—time usually wasted on passive scrolling or mindless podcasts. Traditional learning apps require visual attention and manual interaction, making them dangerous while driving and awkward while walking. We asked ourselves: What if your commute could become your classroom? What if you could learn organic chemistry while stuck in traffic, or master machine learning concepts during your morning walk—all hands-free, personalized to YOUR actual course materials? That's how Montessori was born: a voice-first AI academic coach that transforms dead time into productive learning sessions, teaching you directly from your uploaded textbooks and syllabi.
What it does
Montessori is an AI-powered auditory learning platform that lets you:
- Upload your course materials (PDFs, textbooks, lecture notes, syllabi) through a drag-and-drop interface
- Start a voice conversation hands-free—just hit play and start asking questions while commuting
- Get personalized explanations cited directly from YOUR uploaded documents using RAG (Retrieval-Augmented Generation)
- Access current information via real-time web search when topics require up-to-date research
- Set reminders to review challenging concepts using integrated agent tools
- Review past conversations with auto-generated summaries and continue where you left off
How we built it
Montessori is built entirely using Prompt-Driven Development (PDD)—where prompts are the source of truth and code is a regenerable artifact.
Frontend:
- Next.js 14 + TypeScript + TailwindCSS
- React for UI components (voice interface, file uploader, conversation history)
- WebSocket client for real-time bidirectional audio streaming
Backend:
- Next.js API routes for document processing and webhooks
- ElevenLabs Conversational AI as the core platform:
  - Knowledge Base API for RAG indexing of uploaded documents
  - Conversation History API for persistent session storage
Agent Tools:
- Toolhouse.ai for reminder and calendar integration
- rtrvr.ai for real-time web scraping when users ask about current research
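To make the pipeline concrete, here is a minimal sketch of the helper our upload route could use to forward a document to the Knowledge Base for RAG indexing. The endpoint path, header name, and payload shape shown are illustrative assumptions, not the exact ElevenLabs API:

```typescript
// Sketch: build the request a Next.js API route sends when forwarding an
// uploaded document for RAG indexing. The endpoint path, header, and payload
// shape here are illustrative assumptions, not the exact ElevenLabs API.
interface UploadedDoc {
  name: string;      // e.g. "organic-chem-ch4.pdf"
  mimeType: string;  // validated client-side before reaching this point
  sizeBytes: number;
}

interface IndexRequest {
  url: string;
  headers: Record<string, string>;
  body: { name: string; mime_type: string };
}

function buildIndexRequest(doc: UploadedDoc, apiKey: string): IndexRequest {
  return {
    url: "https://api.elevenlabs.io/v1/convai/knowledge-base", // assumed path
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: { name: doc.name, mime_type: doc.mimeType },
  };
}
```

Keeping request construction in a pure helper like this made the route handler itself trivial to regenerate from its prompt.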
PDD Methodology: When we need to change behavior (e.g., make the agent more Socratic), we modify the prompt and regenerate—no code patching. Tests accumulate in /tests to prevent regressions.
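As an illustration of how tests accumulate under PDD, a regression check along these lines fails if a regenerated system prompt drops a constraint an earlier iteration relied on. The prompt text and required phrases are hypothetical examples, not our real prompt:

```typescript
// Sketch of a PDD regression test: after regenerating the agent from its
// prompt, assert that invariants earlier iterations depended on still hold.
// The prompt text and required phrases below are hypothetical examples.
function promptKeepsInvariants(prompt: string, invariants: string[]): string[] {
  // Return the invariants the regenerated prompt no longer mentions.
  const lower = prompt.toLowerCase();
  return invariants.filter((phrase) => !lower.includes(phrase.toLowerCase()));
}

const regeneratedPrompt =
  "You are a Socratic tutor. Always cite the document name and page number. " +
  "Keep spoken answers to 2-3 sentences unless the user asks for depth.";

const missing = promptKeepsInvariants(regeneratedPrompt, [
  "cite the document name and page number",
  "2-3 sentences",
]);
// missing.length === 0 means the regeneration preserved prior behavior
```

Each behavior fix adds another invariant to the list, so the prompt can be regenerated freely without silently losing earlier fixes.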
Challenges we ran into
RAG Citation Accuracy
Problem: The agent would sometimes cite incorrect document sections or hallucinate sources entirely.
Solution: We enforced strict prompt engineering requiring exact document-and-page citations, attached provenance metadata to each chunk (document name, page number, section), and regenerated the retrieval prompt to prioritize precision over recall.
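The chunk-metadata fix can be sketched like this (the field names and citation format are our illustration, not an exact schema):

```typescript
// Sketch: attach provenance metadata to each chunk at indexing time so the
// agent can emit exact citations. Field names here are illustrative.
interface Chunk {
  text: string;
  document: string; // source file name
  page: number;     // page the chunk was extracted from
  section: string;  // nearest section heading
}

// Format a citation exactly as the prompt requires, so the model is asked
// to copy a precise string rather than improvise a source.
function citation(chunk: Chunk): string {
  return `[${chunk.document}, p. ${chunk.page}, "${chunk.section}"]`;
}
```

Giving the model a fixed citation string to reproduce, instead of asking it to describe its source in free text, is what moved us from plausible-sounding citations to verifiable ones.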
Interrupt Handling
Problem: Users couldn't naturally interrupt the agent mid-explanation, which is crucial for hands-free UX.
Solution: ElevenLabs Conversational AI has built-in interrupt detection with 2-3 second pause thresholds; we tuned the sensitivity and added visual feedback (the pulse animation stops when the user speaks).
Result: Natural conversation flow with seamless interruptions.
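The visual-feedback side can be sketched as a tiny state reducer (the event names are ours): the pulse animation runs while the agent speaks and stops the moment voice activity is detected from the user:

```typescript
// Sketch: drive the pulse animation from voice-activity events. The pulse
// stops as soon as the user starts speaking (an interrupt) and resumes only
// when the agent speaks again. Event names are illustrative.
type VoiceEvent = "agent_speaking" | "user_speaking" | "silence";

function pulseActive(events: VoiceEvent[]): boolean {
  let active = false;
  for (const e of events) {
    if (e === "agent_speaking") active = true;
    else if (e === "user_speaking") active = false; // interrupt: stop pulse
    // "silence" leaves the current state unchanged
  }
  return active;
}
```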
Document Processing Edge Cases
Problem: Scanned PDFs, images containing text, and malformed files caused upload failures.
Solution: We added client-side validation, implemented OCR preprocessing for images with Tesseract.js, and surfaced graceful error messages to the user.
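The client-side validation step can be sketched as follows; the accepted MIME types and the 25 MB cap are example values, not the product's real limits:

```typescript
// Sketch: validate an upload before any server round-trip. The accepted MIME
// types and 25 MB cap are example values, not the product's real limits.
const ACCEPTED = new Set([
  "application/pdf",
  "image/png",
  "image/jpeg",
  "text/plain",
]);
const MAX_BYTES = 25 * 1024 * 1024;

// Returns null when the file is acceptable, or a user-facing error message.
function validateUpload(mimeType: string, sizeBytes: number): string | null {
  if (!ACCEPTED.has(mimeType)) return `Unsupported file type: ${mimeType}`;
  if (sizeBytes === 0) return "File is empty";
  if (sizeBytes > MAX_BYTES) return "File exceeds the 25 MB limit";
  return null; // images later go through Tesseract.js OCR server-side
}
```

Rejecting malformed files in the browser kept the OCR pipeline from ever seeing inputs it couldn't handle.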
Accomplishments that we're proud of
- Built a fully functional voice-first learning platform in 10 hours with sub-300ms latency
- Implemented true Prompt-Driven Development—every major component is regenerable from prompts in /prompts directory
- Achieved accurate RAG citations from user-uploaded documents, solving the "AI tutor that knows YOUR textbooks" problem
- Integrated sponsor technologies (ElevenLabs, Toolhouse, rtrvr, PDD) seamlessly into a cohesive product
- Designed hands-free voice interaction that makes learning possible while driving or walking
- Created a scalable business model with clear path to revenue (freemium SaaS for students/professionals)
- Demonstrated test accumulation through multiple prompt regenerations, proving PDD's durability
What we learned
- PDD is transformative for AI apps: Treating prompts as source code and regenerating components dramatically reduced debugging time. Instead of hunting through code for bugs, we fixed behavior by clarifying prompts.
- RAG requires careful prompt engineering: Generic "retrieve relevant docs" prompts led to hallucinations. We learned to specify exact citation formats, metadata requirements, and retrieval strategies in prompts.
- Voice UX is fundamentally different: Designing for ears-only taught us to avoid visual metaphors ("as you can see"), use natural pauses (ellipses), and keep responses concise (2-3 sentences unless depth requested).
What's next for Montessori
- Mobile App (React Native): 80% of commuters use phones—we need native iOS/Android apps with background audio
- Spaced Repetition Quizzes: After learning sessions, generate quiz questions from conversation history to reinforce retention
- Multi-language Support: Expand beyond English to Spanish, Mandarin, Hindi for global student market
- Offline Mode: Download voice models and document embeddings for subway commutes without internet