🗣️📚 Study Buddy — Voice-to-Voice Learning, Anywhere

🌟 Inspiration

Reading is powerful, but it’s lonely. I kept catching myself saying “I wish I could just ask this book a question.” Existing AI readers still force you to type or break focus.
Study Buddy was born from that frustration: a hands-free companion that lets you talk to any document and have it talk back—perfect for commuters, multitaskers, neuro-divergent learners, and anyone who learns best by conversation.


🛠️ How We Built It

  1. Frontend Framework – React + Vite for instant reloads.
  2. Document Pipeline
    • PDF/text upload → embedded in a custom EnhancedDocumentReader
    • Full text, viewport text, and highlighted text streamed into context.
  3. Voice Engine
    • ElevenLabs API for real-time transcription and TTS.
  4. AI Core
    • Google Gemini 2.5 Flash for speed.
    • Prompt layering to combine the full document, current screen, selection, and chat history while staying under the model's 1M-token context window.
  5. UX Polish
    • SPACEBAR hotkey for instant voice input.
    • Translucent right-sidebar chat (EnhancedAIPanel) plus top-center VoiceStatusOverlay with clear Listening / Thinking / Speaking states.
  6. Rapid Iteration – Every major feature began as a single Bolt prompt and was then hand-tuned with TypeScript hooks and Tailwind CSS utility classes.
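
The prompt layering from step 4 could be sketched roughly like this. It is a minimal illustration, not the project's actual code: `PromptLayers`, `buildPrompt`, the section labels, the priority order, and the 4-characters-per-token estimate are all assumptions (a real build would use the model's own token counter).

```typescript
// Hypothetical shape of the context sources named in steps 2 and 4.
interface PromptLayers {
  fullDocument: string;
  viewportText: string;
  selectedText: string;
  chatHistory: string[];
  question: string;
}

// Assumption: roughly 4 characters per token. This is a crude heuristic.
const CHARS_PER_TOKEN = 4;
const SEPARATOR = "\n\n";

const estimateTokens = (text: string): number =>
  Math.ceil(text.length / CHARS_PER_TOKEN);

function buildPrompt(layers: PromptLayers, maxTokens = 1_000_000): string {
  // Most specific layers go first so they survive when the budget is tight;
  // the full document comes last and is the first thing to be clipped.
  const sections: Array<[string, string]> = [
    ["QUESTION", layers.question],
    ["SELECTED TEXT", layers.selectedText],
    ["VISIBLE ON SCREEN", layers.viewportText],
    ["CHAT HISTORY", layers.chatHistory.join("\n")],
    ["FULL DOCUMENT", layers.fullDocument],
  ];

  let remaining = maxTokens;
  const parts: string[] = [];
  for (const [label, body] of sections) {
    if (!body || remaining <= 0) continue;
    const header = `## ${label}\n`;
    const overhead = header.length + SEPARATOR.length;
    const budgetChars = remaining * CHARS_PER_TOKEN - overhead;
    if (budgetChars <= 0) break;
    const clipped = body.slice(0, budgetChars);
    parts.push(header + clipped);
    remaining -= Math.ceil((overhead + clipped.length) / CHARS_PER_TOKEN);
  }
  return parts.join(SEPARATOR);
}
```

Putting the question and selection ahead of the full document is a design choice in this sketch: when a textbook is too large, the material the reader is pointing at is kept and the bulk text is truncated.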

🧠 What We Learned

  • Prompt engineering for multi-source context without overwhelming the model.
  • Designing minimalist, “eyes-busy” interfaces where voice is primary.
  • Hotkey ergonomics: mapping actions so they never conflict with native browser shortcuts.
  • Balancing speed vs. accuracy by switching between streaming and single-shot AI calls depending on query length.
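
One way to keep a spacebar hotkey from fighting native behavior is to decide, per key press, whether the app should handle it at all. The sketch below is an assumption about how such a guard might look; `KeyLike` and `shouldStartVoiceInput` are hypothetical names, and the list of excluded elements is illustrative.

```typescript
// Structural subset of a DOM KeyboardEvent, so the guard can be tested without a browser.
interface KeyLike {
  code: string;
  repeat: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  target: { tagName?: string; isContentEditable?: boolean } | null;
}

function shouldStartVoiceInput(e: KeyLike): boolean {
  // Only a fresh, unmodified Space press counts.
  if (e.code !== "Space" || e.repeat) return false;
  if (e.ctrlKey || e.metaKey || e.altKey) return false;

  // Leave Space alone where the browser already gives it a meaning:
  // typing in fields and activating focused buttons.
  const tag = e.target?.tagName?.toUpperCase();
  if (tag === "INPUT" || tag === "TEXTAREA" || tag === "SELECT" || tag === "BUTTON") {
    return false;
  }
  if (e.target?.isContentEditable) return false;

  return true;
}
```

In a `keydown` listener, the handler would call `preventDefault()` only when this returns true, so the page does not scroll while voice input starts.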

⚔️ Challenges We Faced

  1. Token Limits – Chunking large textbooks while still giving the AI holistic awareness.
  2. Voice Overlap – Early builds triggered multiple speech instances; solved with a global audio mutex.
  3. Latency Perception – Even 2 s feels long when you’re speaking; added micro-animations and auditory cues to keep users engaged.
  4. Accessibility – Ensuring keyboard-free workflows still comply with ARIA standards.
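
The global audio mutex mentioned in challenge 2 might look something like the sketch below. The write-up does not describe the actual implementation, so `AudioMutex`, its callback-based `acquire`, and the first-in-first-out queueing are assumptions; an alternative design would cancel the current speech instead of queueing behind it.

```typescript
type Release = () => void;

// A tiny callback-based lock: only one holder may play audio at a time.
class AudioMutex {
  private held = false;
  private waiting: Array<(release: Release) => void> = [];

  // Calls onAcquire immediately if the lock is free, otherwise queues it.
  acquire(onAcquire: (release: Release) => void): void {
    if (this.held) {
      this.waiting.push(onAcquire);
      return;
    }
    this.held = true;
    onAcquire(this.makeRelease());
  }

  private makeRelease(): Release {
    let released = false;
    return () => {
      if (released) return; // a second release from the same holder is ignored
      released = true;
      const next = this.waiting.shift();
      if (next) {
        next(this.makeRelease()); // hand the lock straight to the next waiter
      } else {
        this.held = false;
      }
    };
  }
}

// One shared instance for the whole app (hypothetical name).
const speechLock = new AudioMutex();
```

In the browser, each speech request would start playback inside `acquire` and call `release` from the audio element's `ended` and `error` handlers, so a failed clip cannot leave the lock held.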

🏆 Accomplishments We’re Proud Of

  • A true voice-only loop: speak → transcript appears → AI thinks → AI speaks back—no clicks required.
  • Contextual precision: answers change when you scroll or highlight, mimicking an attentive human tutor.
  • Finished MVP in under a week thanks to Bolt-driven scaffolding.
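
The voice-only loop can be thought of as a small state machine driving the Listening / Thinking / Speaking overlay. The sketch below is illustrative only: the `idle` state, the event names, and the rule that pressing the hotkey while speaking interrupts playback are assumptions, not details from the project.

```typescript
type VoiceState = "idle" | "listening" | "thinking" | "speaking";
type VoiceEvent = "hotkeyPressed" | "transcriptReady" | "answerReady" | "playbackEnded";

// Allowed transitions; any event not listed for a state is ignored.
const voiceTransitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { hotkeyPressed: "listening" },
  listening: { transcriptReady: "thinking" },
  thinking: { answerReady: "speaking" },
  // Assumption: the hotkey lets the user interrupt the answer and ask again.
  speaking: { playbackEnded: "idle", hotkeyPressed: "listening" },
};

function nextVoiceState(state: VoiceState, input: VoiceEvent): VoiceState {
  return voiceTransitions[state][input] ?? state;
}
```

Keeping the transitions in one table means the overlay only ever renders a state the loop can actually be in, and stray events (for example a late transcript while the AI is already speaking) are dropped rather than causing overlapping states.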

🚀 What’s Next

  • Multi-language conversations (LLM + i18n TTS).
  • Voice commands like “Read this section” or “Flip to chapter 3.”
  • Collaborative mode so study groups can co-annotate and talk to the same document in real time.

Study Buddy turns every page into a dialogue—because learning should sound as natural as curiosity itself.
