🗣️📚 Study Buddy — Voice-to-Voice Learning, Anywhere

🌟 Inspiration

Reading is powerful, but it’s lonely. I kept catching myself saying “I wish I could just ask this book a question.” Existing AI readers still force you to type or break focus.
Study Buddy was born from that frustration: a hands-free companion that lets you talk to any document and have it talk back. It is built for commuters, multitaskers, neurodivergent learners, and anyone who learns best by conversation.

🛠️ How We Built It

Frontend Framework – React + Vite for instant reloads.
Document Pipeline
• PDF/text upload → embedded in a custom EnhancedDocumentReader.
• Full text, viewport text, and highlighted text streamed into context.
Voice Engine
• ElevenLabs API for real-time transcription and TTS.
AI Core
• Google Gemini 2.5 Flash for speed.
• Prompt layering to combine the full document, current screen, selection, and chat history, all kept within the 1M-token context window.
UX Polish
• SPACEBAR hotkey for instant voice input.
• Translucent right-sidebar chat (EnhancedAIPanel) plus top-center VoiceStatusOverlay with clear Listening / Thinking / Speaking states.
Rapid Iteration – Every major feature began as a single Bolt prompt and was then hand-tuned with TypeScript hooks and Tailwind CSS utility classes.
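
To make the prompt-layering idea concrete, here is a minimal sketch of how the four context sources could be stacked into one prompt while staying inside a token budget. It is not our production code: `PromptSources`, `estimateTokens`, and `buildLayeredPrompt` are hypothetical names, the bracketed section labels are invented for illustration, and the 4-characters-per-token estimate is only a rough assumption.

```typescript
// Hypothetical shape of the context sources described above.
interface PromptSources {
  fullDocument: string;
  viewportText: string;
  selectedText: string;
  chatHistory: string[];
}

// Assumption: roughly 4 characters per token. A real tokenizer will differ.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Stack the layers and trim only the full document when space runs out,
// so the most specific context (selection, screen, question) always survives.
// The budget is approximate: labels and separators add a few extra tokens.
function buildLayeredPrompt(
  sources: PromptSources,
  question: string,
  maxTokens: number = 1000000,
): string {
  const layers: string[] = [];
  if (sources.selectedText) {
    layers.push(`[Highlighted text]\n${sources.selectedText}`);
  }
  if (sources.viewportText) {
    layers.push(`[Current screen]\n${sources.viewportText}`);
  }
  if (sources.chatHistory.length > 0) {
    layers.push(`[Chat history]\n${sources.chatHistory.join("\n")}`);
  }
  layers.push(`[Question]\n${question}`);

  // Whatever budget is left goes to the (possibly truncated) full document.
  const usedTokens = estimateTokens(layers.join("\n\n"));
  const remainingChars = Math.max(0, (maxTokens - usedTokens) * 4);
  const documentText = sources.fullDocument.slice(0, remainingChars);
  if (documentText) {
    layers.unshift(`[Full document]\n${documentText}`);
  }
  return layers.join("\n\n");
}
```

The design choice worth noting is the trimming order: the full document is the only layer that gets cut, because losing the user's selection or question would hurt answer quality far more than losing the tail of a long book.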

🧠 What We Learned

Prompt engineering for multi-source context without overwhelming the model.
Designing minimalist, “eyes-busy” interfaces where voice is primary.
Hotkey ergonomics: mapping actions so they never conflict with native browser shortcuts.
Balancing speed vs. accuracy by switching between streaming and single-shot AI calls depending on query length.
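
As an illustration of the hotkey lesson, the sketch below shows one way to decide whether a spacebar press should start voice input. The `KeyPress` interface and `shouldStartVoiceInput` are hypothetical names; `code`, `repeat`, and the modifier flags mirror real `KeyboardEvent` properties, while `targetTag` and `targetIsEditable` are assumed to be filled in from the event target by the caller.

```typescript
// Minimal stand-in for the parts of a KeyboardEvent this check needs,
// so the logic can run and be tested outside a browser.
interface KeyPress {
  code: string;              // KeyboardEvent.code, e.g. "Space"
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
  repeat: boolean;           // true while the key is being held down
  targetTag: string;         // assumed: tagName of the focused element
  targetIsEditable: boolean; // assumed: isContentEditable of that element
}

// Elements where Space already has a native meaning (typing, activation).
const NATIVE_SPACE_TAGS = new Set(["INPUT", "TEXTAREA", "SELECT", "BUTTON"]);

function shouldStartVoiceInput(press: KeyPress): boolean {
  if (press.code !== "Space") return false;
  // Leave modifier combinations to the browser and the OS.
  if (press.ctrlKey || press.metaKey || press.altKey || press.shiftKey) return false;
  // Ignore auto-repeat so holding the key does not retrigger recording.
  if (press.repeat) return false;
  // Never steal Space from a text field or a focused control.
  if (press.targetIsEditable) return false;
  if (NATIVE_SPACE_TAGS.has(press.targetTag.toUpperCase())) return false;
  return true;
}
```

In a browser, a `keydown` listener would build a `KeyPress` from the event and, when the check passes, call `event.preventDefault()` so the page does not scroll before starting the microphone.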

⚔️ Challenges We Faced

Token Limits – Chunking large textbooks while still giving the AI holistic awareness.
Voice Overlap – Early builds triggered multiple speech instances; solved with a global audio mutex.
Latency Perception – Even 2 s feels long when you’re speaking; added micro-animations and auditory cues to keep users engaged.
Accessibility – Ensuring keyboard-free workflows still comply with ARIA standards.
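
For the voice-overlap fix, a global audio mutex can be sketched as a promise chain: each playback request waits until the previous one has finished. This is a simplified illustration rather than our exact implementation, and `AudioMutex` and `runExclusive` are hypothetical names.

```typescript
// Serializes async tasks so only one runs at a time.
class AudioMutex {
  // Settles when the most recently queued task has finished.
  private tail: Promise<void> = Promise.resolve();

  // Run `task` only after every previously queued task has settled.
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow failures on the internal chain so one failed playback
    // does not block the queue; callers still see the rejection.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// One shared instance makes the lock global to the app.
const audioMutex = new AudioMutex();
```

Every text-to-speech call would then be wrapped as `audioMutex.runExclusive(() => playAudio(clip))`, where `playAudio` stands in for whatever function resolves once the clip has finished playing. A variant that cancels the current clip instead of queueing behind it may feel more responsive in a conversation; this sketch only shows the queueing form.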

🏆 Accomplishments We’re Proud Of

A true voice-only loop: speak → transcript appears → AI thinks → AI speaks back—no clicks required.
Contextual precision: answers change when you scroll or highlight, mimicking an attentive human tutor.
Finished MVP in under a week thanks to Bolt-driven scaffolding.

🚀 What’s Next

Multi-language conversations (LLM + i18n TTS).
Voice commands like “Read this section” or “Flip to chapter 3.”
Collaborative mode so study groups can co-annotate and talk to the same document in real time.

Study Buddy turns every page into a dialogue—because learning should sound as natural as curiosity itself.

Built With

elevenlabs
gemini
javascript
netlify
react
tailwind
typescript
vite

Updates

umang Bansal started this project — Jun 30, 2025 12:51 PM EDT
