Inspiration
Personal experience with a cochlear implant inspired Echolearn, along with the realisation that traditional auditory rehabilitation is often repetitive, clinical, and disconnected from real life. What if hearing training felt like the real world instead of a lab? That idea led us to build a system that lets users train with content they already love, transforming everyday media into meaningful rehabilitation.
What it does
- Empowers cochlear implant users to select audiobooks by genre, speaker gender, and emotional range
- Converts YouTube videos into audio
- Identifies difficult words and sound patterns in real time
- Pauses content to present listening challenges (e.g., minimal pairs like “splash” vs “smash”) built from AssemblyAI transcriptions
- Dynamically adjusts background noise and difficulty
- Tracks a user’s unique “hearing profile” to target their specific weaknesses
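To make the challenge mechanic concrete, here is a minimal sketch of how a minimal-pair question could be represented and generated. The pair table, field names, and `make_challenge` helper are all illustrative assumptions, not the app's actual code:

```python
from dataclasses import dataclass
import random

# A few hand-picked minimal pairs (consonant contrasts that cochlear
# implant users often find difficult). Illustrative data only.
MINIMAL_PAIRS = {
    "splash": ["smash", "slash"],
    "ship": ["chip", "sip"],
    "bat": ["pat", "mat"],
}

@dataclass
class Challenge:
    timestamp: float      # seconds into the media where playback pauses
    target: str           # the word the user actually heard
    options: list         # shuffled target + distractors

def make_challenge(word, timestamp, rng):
    """Build a listening challenge for a transcript word, if it has known confusables."""
    distractors = MINIMAL_PAIRS.get(word)
    if not distractors:
        return None  # no known confusable words for this token
    options = [word, *distractors]
    rng.shuffle(options)  # so the correct answer isn't always first
    return Challenge(timestamp=timestamp, target=word, options=options)
```

A caller would pass each transcript word through `make_challenge` and keep only the non-`None` results as pause points.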
How we built it
We built Echolearn as a full-stack web app with a React + TypeScript + Tailwind frontend and a FastAPI backend. We are using librosa and custom analysis to classify the words. The backend uses yt-dlp to pull audio from YouTube or uploaded media, AssemblyAI to generate transcripts, and the OpenAI Responses API to turn those transcripts into interactive quiz questions. On the frontend, we synced those generated timestamps with video/audio playback so the media pauses at the right moment and asks the user a question in context. We also added local progress tracking, past attempt history, and a video search/classification feature powered by audio analysis.
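The timestamp-sync step above can be sketched as a small pure function. AssemblyAI reports word-level `start`/`end` times in milliseconds while players work in seconds; the function name and dict shape here are illustrative assumptions about that glue code:

```python
def pause_points(words, targets):
    """Return (seconds, word) pairs where playback should pause for a question.

    `words` is a list of dicts like {"text": "splash,", "start": 12500, "end": 12900},
    the rough shape of AssemblyAI word objects (times in milliseconds);
    `targets` is the set of words chosen for listening challenges.
    """
    points = []
    for w in words:
        # strip trailing punctuation so "splash," matches the target "splash"
        token = w["text"].strip(".,!?").lower()
        if token in targets:
            # pause just after the word finishes, so the user hears it in full
            points.append((w["end"] / 1000.0, token))
    return points
```

The frontend would then watch the player's current time and fire the next question once it passes the corresponding pause point.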
Challenges we ran into
One major challenge was timing and synchronisation. It’s not enough to generate good questions; they also have to appear at exactly the right point in a YouTube video, uploaded video, or audio file, and each media type behaves differently. Another challenge was handling long-running AI workflows smoothly, since downloading media, transcribing audio, and generating questions all take time. We also had to make the LLM output reliable and structured, and keep video classification fast enough to feel responsive. Lovable kept getting stuck on the more technical parts, so we had to switch to another LLM for those.
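The "reliable and structured LLM output" problem usually comes down to validating the model's JSON before trusting it, and falling back to skipping a question rather than crashing playback. A hedged sketch of that idea, with illustrative field names rather than the app's real schema:

```python
import json

# Required fields and their expected types for one quiz question.
# These names are assumptions for illustration, not the real schema.
REQUIRED = {"timestamp": (int, float), "question": str, "options": list, "answer": str}

def parse_question(raw):
    """Validate one LLM-generated question; return None (skip it) on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # fallback: malformed output is dropped, not rendered
    for field, expected_type in REQUIRED.items():
        if field not in data or not isinstance(data[field], expected_type):
            return None
    if data["answer"] not in data["options"]:
        return None  # the correct answer must be one of the choices shown
    return data
```

Dropping a bad question degrades gracefully: the media simply keeps playing instead of pausing on a broken prompt.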
Accomplishments that we're proud of
We’re proud that the project goes beyond a simple quiz generator and creates a genuinely interactive listening experience. Users can paste a YouTube link or upload their own media, and the app turns it into an adaptive quiz that pauses automatically during playback. On top of that, we added progress tracking, phonetic error logging, past attempts, and a searchable video discovery experience based on the speaker's tone and emotion (from which we derive an overall quality score: the wider the range of volume shifts, the wider the perceived emotional range, and hence the higher the quality), speech rate, speaker type, and gender.
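The dynamics-based quality heuristic described above can be sketched in a few lines: a wider spread between the loudest and quietest short-term frames is treated as a wider perceived emotional range. This is a plain-Python illustration under assumed frame sizes, not the real librosa-based pipeline:

```python
import math

def emotion_range_score(samples, frame_len=2048, hop=512):
    """Spread between the loudest and quietest RMS frames, normalised to [0, 1].

    `samples` is a mono audio signal as a list of floats. Frame length
    and hop size are illustrative defaults.
    """
    rms = []
    for i in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[i:i + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    peak = max(rms)
    if peak == 0:
        return 0.0  # silence: no dynamics at all
    # normalise the loud/quiet spread by the peak loudness
    return (peak - min(rms)) / peak
```

A monotone reading scores near 0; speech that swings between loud and quiet passages scores near 1.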
What we learned
We learned that good prompting alone is not enough; you need strict schemas for the LLM's output, fallbacks, and solid error handling to make the experience reliable. We also learned that latency and timing matter just as much as model quality in an interactive product. Most importantly, we learned how to combine transcription, LLMs, audio processing, and frontend playback control into one product that feels practical and user-focused rather than just experimental.
What's next for Echolearn
- Expand into a mobile-first platform
- Integrate with streaming services and real-world audio sources
- Collaborate with audiologists and healthcare providers
- Potential later applications: apply this concept in other fields such as education, and use a similar model and concept to create gamified learning
Our vision is to make hearing rehabilitation engaging, personalised, and accessible to everyone.
Built With
- librosa
- lovable
- openai
- typescript