Project Story

What Inspired Us

We were frustrated with how study tools feel either too passive (read-only note apps) or too mechanical (flashcard drills). We wanted something that actually teaches you: like having a patient tutor sitting next to you, but one who can see exactly what you're studying.

NotebookLM showed us the dream of AI-grounded study tools, but we wanted to go further: not just summarized audio, but an actual voice tutor you can talk to—one that stays strictly grounded in your own uploaded material.

What We Learned

Voice AI is more than just speech-to-text. Getting the ElevenLabs agent to reliably use our notebook content required careful prompt engineering. We learned that dynamic variables (passing {{summary}} directly into the agent's context) work far better than asking the agent to "remember" uploaded files.
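The dynamic-variable approach can be sketched as a small helper that assembles the payload the agent prompt references via {{notebook_title}} and {{summary}}. The function and field names below are illustrative (not our exact code), and the commented session-start call follows the shape of the ElevenLabs SDK's dynamic-variables feature; check the SDK docs for the exact option name.

```typescript
// Hypothetical helper: build the dynamic-variable payload injected into the
// ElevenLabs agent at session start. The agent prompt contains
// {{notebook_title}} and {{summary}} placeholders.
interface Notebook {
  title: string;
  studyGuide: string; // compact study guide generated from uploaded sources
}

const MAX_SUMMARY_CHARS = 4000; // keep the injected context compact

function buildDynamicVariables(notebook: Notebook) {
  return {
    notebook_title: notebook.title,
    // Truncate so the prompt stays within the agent's context budget.
    summary: notebook.studyGuide.slice(0, MAX_SUMMARY_CHARS),
  };
}

// At session start (sketch only; option name per the ElevenLabs SDK docs):
// await conversation.startSession({
//   agentId: AGENT_ID,
//   dynamicVariables: buildDynamicVariables(activeNotebook),
// });
```

Keeping the summary compact matters: the variables are substituted into the prompt on every session, so an oversized study guide eats the agent's context window.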

Image generation can be educational, not just decorative. The generateStudyImage tool we built lets the voice tutor generate diagrams on-demand—concept maps, labeled illustrations, visual explanations. During a lesson, instead of saying "let me draw this for you," it actually shows you.
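A client tool like generateStudyImage boils down to mapping the agent's tool arguments onto an image-generation prompt. The sketch below is illustrative, not our exact implementation: the argument shape, helper names, and the commented wiring are assumptions, and (as we note under Challenges) the tool must also be configured in the ElevenLabs agent dashboard for the handler to fire at all.

```typescript
// Hypothetical argument shape for a "generateStudyImage" client tool.
interface StudyImageArgs {
  concept: string; // what to visualize, e.g. "mitosis stages"
  style?: "diagram" | "concept-map" | "illustration";
}

// Pure helper: turn tool arguments into an image-generation prompt.
function buildImagePrompt(args: StudyImageArgs): string {
  const style = args.style ?? "diagram";
  return `Educational ${style} of ${args.concept}, clean labels, white background`;
}

// Wiring sketch (handler shape depends on the ElevenLabs client SDK):
// clientTools: {
//   generateStudyImage: async (args: StudyImageArgs) => {
//     const prompt = buildImagePrompt(args);
//     const url = await requestNanoGptImage(prompt); // hypothetical backend call
//     showInLessonPanel(url);                        // hypothetical UI hook
//     return "Image displayed to the student.";
//   },
// }
```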

Local-first doesn't mean single-user. By adding session-based auth with httpOnly cookies, we built a multi-user system that still runs entirely locally. Perfect for a hackathon demo that doesn't need cloud infrastructure.
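The session cookie setup can be sketched in a few lines. This is a minimal illustration of the pattern (names are hypothetical, not our exact routes): an unguessable random session id stored server-side, sent to the browser as an httpOnly cookie so client-side scripts can never read it.

```typescript
import crypto from "node:crypto";

// Unguessable session token; the id is the key into a server-side session table.
function newSessionId(): string {
  return crypto.randomBytes(32).toString("hex");
}

// Cookie settings for a local-only demo (matches Express's res.cookie options).
function sessionCookieOptions() {
  return {
    httpOnly: true,              // not readable from client-side JS
    sameSite: "lax" as const,    // basic CSRF hygiene
    secure: false,               // local HTTP demo; set true behind HTTPS
    maxAge: 1000 * 60 * 60 * 24, // 1 day, in milliseconds
  };
}

// In a login route, after bcrypt.compare succeeds (sketch):
// res.cookie("session", newSessionId(), sessionCookieOptions());
```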

How We Built It

We built Quantum Teacher as a notebook-first workspace:

- Notebooks as study workspaces – each notebook holds uploaded files (PDF, TXT, MD, DOCX), normalized source text, generated study guides, and lesson history
- Pseudo-RAG pipeline – upload → extract → normalize → generate compact study guide → inject into the ElevenLabs prompt via dynamic variables
- Voice-first interaction – an ElevenLabs hosted agent with a custom prompt that prioritizes notebook content over the default "quantum physics" wording
- On-demand visuals – NanoGPT image generation via qwen-image-2.0, triggered by the voice agent during lessons
- Text models – Featherless for summarizations and recaps
- Lesson memory – one-time AI recap generation per session (what was covered, what the student learned, what they're struggling with)

The stack: React frontend, Express backend, SQLite for storage, bcrypt for auth, ElevenLabs for voice, NanoGPT for images.
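The pseudo-RAG pipeline steps above can be sketched as two pure functions: one that normalizes extracted file text, one that compacts it before injection. In the real pipeline the compaction step calls a text model to write the study guide; here a simple truncation stands in to show the data flow, and the function names are illustrative.

```typescript
// Normalize raw text extracted from uploaded files (PDF/TXT/MD/DOCX).
function normalizeSource(raw: string): string {
  return raw
    .replace(/\r\n/g, "\n")     // unify line endings
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces/tabs
    .replace(/\n{3,}/g, "\n\n") // collapse blank-line runs
    .trim();
}

// Stand-in for the model-written study guide: cap the text that will be
// injected into the agent prompt as {{summary}}.
function compactStudyGuide(normalized: string, maxChars = 4000): string {
  return normalized.length <= maxChars
    ? normalized
    : normalized.slice(0, maxChars) + "\n[...truncated]";
}
```

Normalizing before compaction matters: PDF extraction tends to produce stray whitespace that would otherwise waste the character budget.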

Challenges We Faced

- Getting the voice agent to use our content – the ElevenLabs default prompt kept drifting toward quantum physics examples. We solved this by structuring the prompt to explicitly prioritize the {{notebook_title}} and {{summary}} dynamic variables.
- Image generation model naming – the API expected qwen-image-2.0, not qwen-image-2. Small typo, big blocker.
- Light mode surprises – dark mode had many hardcoded colors that didn't translate to light mode. We had to systematically replace them with CSS theme variables.
- Client tools not firing – we learned the hard way that ElevenLabs client tools only work when explicitly configured in the agent dashboard, not just in the prompt.
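The prompt-structure fix for the first challenge looks roughly like this: put the notebook variables at the top and scope the instructions to them. The wording below is illustrative, not our exact prompt; the {{...}} placeholders are substituted by the platform at session start, and the local renderPrompt helper just stands in for that substitution so the template can be tested.

```typescript
// Illustrative agent prompt: notebook content first, explicit grounding rules.
const AGENT_PROMPT = `
You are a voice tutor for the notebook "{{notebook_title}}".
Ground every explanation in this study guide:
{{summary}}
If the student asks about something not covered above, say so and relate it
back to the notebook material. Do not introduce unrelated example domains.
`.trim();

// Local stand-in for the platform's dynamic-variable substitution.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, k) => vars[k] ?? "");
}
```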


Updates

We initially attempted to use Featherless, but the free plan did not include any frontier models (models were capped at 15 billion parameters), and Featherless used a subscription-based credit system rather than per-token pricing, which would have limited the project's scalability. Because of these problems, we pivoted to NanoGPT (https://nano-gpt.com/), a model router with per-token pricing (cheaper in the long run). Its SDK is the same as the standardized OpenAI SDK, which made the pivot easy.
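Because NanoGPT exposes an OpenAI-compatible API, the pivot is essentially a base-URL change plus unchanged request bodies. The sketch below illustrates that: the base URL shown is an assumption (verify against NanoGPT's docs), the client creation is commented out to keep the snippet self-contained, and the request-builder follows the standard OpenAI chat-completions shape.

```typescript
// Swapping providers with an OpenAI-compatible SDK (sketch):
// import OpenAI from "openai";
// const client = new OpenAI({
//   apiKey: process.env.NANOGPT_API_KEY,
//   baseURL: "https://nano-gpt.com/api/v1", // assumed endpoint; check NanoGPT docs
// });

// Pure helper: build the lesson-recap request body (OpenAI chat API shape).
function buildRecapRequest(model: string, transcript: string) {
  return {
    model,
    messages: [
      {
        role: "system",
        content:
          "Summarize this tutoring session: what was covered, what the student learned, and where they struggled.",
      },
      { role: "user", content: transcript },
    ],
  };
}

// Usage sketch: await client.chat.completions.create(buildRecapRequest(MODEL, transcript));
```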
