🗣️📚 Study Buddy — Voice-to-Voice Learning, Anywhere
🌟 Inspiration
Reading is powerful, but it’s lonely. I kept catching myself saying “I wish I could just ask this book a question.” Existing AI readers still force you to type or break focus.
Study Buddy was born from that frustration: a hands-free companion that lets you talk to any document and have it talk back—perfect for commuters, multitaskers, neuro-divergent learners, and anyone who learns best by conversation.
🛠️ How We Built It
- Frontend Framework – React + Vite for instant reloads.
- Document Pipeline
• PDF/text upload → embedded in a custom EnhancedDocumentReader
• Full text, viewport text, and highlighted text streamed into context.
- Voice Engine
• ElevenLabs API for real-time transcription and TTS.
- AI Core
• Google Gemini 2.5 Flash for speed.
• Prompt layering to combine full document, current screen, selection, and chat history, kept under the 1M-token window.
- UX Polish
• SPACEBAR hotkey for instant voice input.
• Translucent right-sidebar chat (EnhancedAIPanel) plus top-center VoiceStatusOverlay with clear Listening / Thinking / Speaking states.
- Rapid Iteration – Every major feature began with a single Bolt prompt, then hand-tuned with TypeScript hooks and Tailwind CSS utility classes.
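The prompt layering mentioned under AI Core could look roughly like the sketch below. It is an illustration, not the project's actual code: `ReadingContext`, `buildPrompt`, the section labels, and the 4-characters-per-token estimate are all assumptions. The idea is that the most specific layers (question, highlight, current screen) claim budget first, and the full document fills whatever remains.

```typescript
// All names here are hypothetical; the write-up does not show the real prompt code.
interface ReadingContext {
  fullText: string;      // entire document
  viewportText: string;  // text currently on screen
  selectionText: string; // text the reader highlighted
  history: string[];     // prior chat turns, oldest first
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Trim text to fit a token budget using the same heuristic.
const fitToBudget = (text: string, budget: number): string =>
  budget <= 0 ? "" : text.slice(0, budget * 4);

function buildPrompt(ctx: ReadingContext, question: string, maxTokens: number): string {
  // Most specific layers come first so they claim budget before the full document.
  const layers: Array<[string, string]> = [
    ["QUESTION", question],
    ["HIGHLIGHTED TEXT", ctx.selectionText],
    ["CURRENT SCREEN", ctx.viewportText],
    ["CHAT HISTORY", ctx.history.join("\n")],
    ["FULL DOCUMENT", ctx.fullText],
  ];

  let remaining = maxTokens;
  const sections: string[] = [];
  for (const [label, text] of layers) {
    const kept = fitToBudget(text, remaining);
    if (kept.length === 0) continue; // skip empty or unaffordable layers
    remaining -= estimateTokens(kept);
    sections.push(`## ${label}\n${kept}`);
  }
  return sections.join("\n\n");
}
```

The section labels themselves are not counted against the budget here, so a real implementation would want some headroom and, ideally, the model's own token counter instead of a character heuristic.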
🧠 What We Learned
- Prompt engineering for multi-source context without overwhelming the model.
- Designing minimalist, “eyes-busy” interfaces where voice is primary.
- Hotkey ergonomics: mapping actions so they never conflict with native browser shortcuts.
- Balancing speed vs. accuracy by switching between streaming and single-shot AI calls depending on query length.
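As an illustration of the hotkey-ergonomics point, here is a minimal sketch of a guard that decides whether a spacebar press should start voice input. The `KeyLike` shape and `shouldStartListening` are hypothetical names; they mirror a few standard `KeyboardEvent` fields so the logic can be tested without a browser. The assumption is that the spacebar should keep its native behavior while typing, on focused buttons, and in modifier combinations.

```typescript
// Hypothetical shape: only the fields of a KeyboardEvent that the guard needs.
interface KeyLike {
  code: string;              // KeyboardEvent.code, "Space" for the spacebar
  repeat: boolean;           // true while the key is held down
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  targetTag: string;         // e.g. (event.target as HTMLElement).tagName
  targetIsEditable: boolean; // e.g. (event.target as HTMLElement).isContentEditable
}

// Elements where the spacebar already has a native meaning.
const NATIVE_SPACE_TAGS = new Set(["INPUT", "TEXTAREA", "SELECT", "BUTTON"]);

function shouldStartListening(e: KeyLike): boolean {
  if (e.code !== "Space") return false;
  if (e.repeat) return false;                            // ignore auto-repeat from a held key
  if (e.ctrlKey || e.metaKey || e.altKey) return false;  // leave shortcut combos alone
  if (e.targetIsEditable) return false;                  // user is typing in rich text
  return !NATIVE_SPACE_TAGS.has(e.targetTag.toUpperCase());
}
```

In a browser, a `keydown` listener on `window` could build a `KeyLike` from the event and call `preventDefault()` only when the guard returns true, which also stops the page from scrolling on spacebar.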
⚔️ Challenges We Faced
- Token Limits – Chunking large textbooks while still giving the AI holistic awareness.
- Voice Overlap – Early builds triggered multiple speech instances; solved with a global audio mutex.
- Latency Perception – Even 2 s feels long when you’re speaking; added micro-animations and auditory cues to keep users engaged.
- Accessibility – Ensuring keyboard-free workflows still comply with ARIA standards.
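One way to build the kind of global audio mutex described under Voice Overlap is a promise chain that serializes playback calls. This is a sketch under assumptions, not the project's implementation: `AudioMutex`, `speechLock`, and `fakeSpeak` are hypothetical names, and `fakeSpeak` stands in for a real TTS playback call.

```typescript
// Hypothetical helper; the project's real mutex is not shown in the write-up.
class AudioMutex {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` only after every previously queued task has settled.
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the internal chain so one failed
    // utterance does not block the ones queued after it.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// One shared instance guards every playback call in the app.
const speechLock = new AudioMutex();

// Stand-in for a real TTS playback call (assumption): records start and end.
async function fakeSpeak(label: string, log: string[]): Promise<void> {
  log.push(`start ${label}`);
  await Promise.resolve(); // simulate asynchronous playback
  log.push(`end ${label}`);
}
```

Two calls such as `speechLock.runExclusive(() => fakeSpeak("a", log))` and `speechLock.runExclusive(() => fakeSpeak("b", log))` issued back to back will play in order rather than overlapping. A queue is only one policy; an app could instead cancel the current utterance when a new one arrives.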
🏆 Accomplishments We’re Proud Of
- A true voice-only loop: speak → transcript appears → AI thinks → AI speaks back—no clicks required.
- Contextual precision: answers change when you scroll or highlight, mimicking an attentive human tutor.
- Finished MVP in under a week thanks to Bolt-driven scaffolding.
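The speak → transcript → think → speak loop can be modeled as a small state machine. The sketch below is illustrative only: the Listening / Thinking / Speaking states come from the write-up, while the `idle` state, the event names, and the rule that talking during playback interrupts it are assumptions.

```typescript
// "idle" and all event names are assumptions added for this sketch.
type VoiceState = "idle" | "listening" | "thinking" | "speaking";
type VoiceEvent = "startTalking" | "transcriptReady" | "replyReady" | "speechEnded";

// Allowed transitions; any event not listed for a state is ignored.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { startTalking: "listening" },
  listening: { transcriptReady: "thinking" },
  thinking: { replyReady: "speaking" },
  // Assumption: talking while the AI speaks interrupts it and listens again.
  speaking: { speechEnded: "idle", startTalking: "listening" },
};

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in one table makes it easy for a status overlay to render exactly one state at a time and to ignore stray events, such as a late `replyReady` arriving after the user has already started a new question.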
🚀 What’s Next
- Multi-language conversations (LLM + i18n TTS).
- Voice commands like “Read this section” or “Flip to chapter 3.”
- Collaborative mode so study groups can co-annotate and talk to the same document in real time.
Study Buddy turns every page into a dialogue—because learning should sound as natural as curiosity itself.
Built With
- elevenlabs
- gemini
- javascript
- netlify
- react
- tailwind
- typescript
- vite