Inspiration
Crowded spaces aren’t hard because of background noise — they’re hard because of other people talking. We wanted to build something that feels like “focus mode” for hearing: an intelligent system that helps you stay locked into the conversation that matters, especially in cafés, parties, transit, or group settings.
What it does
ClearTalk is a real-time conversation focuser. It listens to a noisy room, detects who you’re most likely speaking with, and boosts that voice while lowering other speakers. The result is clearer, less mentally exhausting conversations — without muting the world entirely.
How we built it
We built ClearTalk as a web application with a live audio pipeline.
- The browser captures microphone audio in short chunks.
- The backend performs speaker segmentation and speech analysis.
- ElevenLabs transcribes speech to help detect conversational turns.
- Gemini analyzes speaker timing and transcript context to determine the most likely conversational partner.
- We apply dynamic gain control, boosting the target speaker and ducking (attenuating) the others.
- Backboard maintains session state to keep focus stable across chunks.
The UI provides a live A/B comparison between raw and focused audio, along with speaker activity visualization.
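As a rough illustration of the gain step in the pipeline, here is a minimal sketch in plain Python. It is not ClearTalk's actual code: the `Segment` type, the `duck` function, and the `boost`, `cut`, and `ramp` values are all hypothetical. It assumes the segmentation step yields non-overlapping per-speaker sample ranges, so a simple time-varying gain is enough. Overlapping speech would need real source separation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str  # hypothetical speaker label from the segmentation step
    start: int    # first sample index (inclusive)
    end: int      # last sample index (exclusive)


def duck(samples: List[float], segments: List[Segment], target: str,
         boost: float = 1.5, cut: float = 0.3, ramp: int = 4) -> List[float]:
    """Boost the target speaker's segments and attenuate everyone else's."""
    # Desired gain per sample; unlabelled audio is left at unity gain.
    gains = [1.0] * len(samples)
    for seg in segments:
        gain = boost if seg.speaker == target else cut
        for i in range(max(seg.start, 0), min(seg.end, len(samples))):
            gains[i] = gain

    # Move the applied gain toward the desired gain gradually so that
    # speaker changes do not produce audible clicks.
    alpha = 1.0 / max(ramp, 1)
    current = 1.0
    out = []
    for x, g in zip(samples, gains):
        current += alpha * (g - current)
        # Clamp to the [-1, 1] range used for float audio samples.
        out.append(max(-1.0, min(1.0, x * current)))
    return out
```

A larger `ramp` gives smoother transitions at the cost of a slower response when the speaker changes. Setting `ramp=1` applies the new gain immediately.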
Challenges we ran into
- Real-time audio processing with low latency.
- Reliable speaker separation within a short build window.
- Preventing rapid focus switching between speakers.
- Handling API failures gracefully while keeping the demo stable.
- Ensuring the “before vs after” difference was dramatic enough for a live demo.
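One common way to prevent rapid focus switching is hysteresis: a challenger must beat the current speaker by a margin for several consecutive chunks before focus moves. The sketch below illustrates that idea only. The `FocusTracker` class, its `min_hold` and `margin` parameters, and the per-chunk score dictionary are assumptions, not a description of how ClearTalk or Backboard stores state.

```python
from typing import Dict, Optional


class FocusTracker:
    """Keeps focus on one speaker until a challenger wins consistently."""

    def __init__(self, min_hold: int = 3, margin: float = 0.15):
        self.min_hold = min_hold  # consecutive winning chunks required to switch
        self.margin = margin      # how much the challenger must lead by
        self.current: Optional[str] = None
        self.challenger: Optional[str] = None
        self.streak = 0

    def update(self, scores: Dict[str, float]) -> Optional[str]:
        """scores maps speaker -> partner likelihood for the latest chunk."""
        if not scores:
            return self.current  # silent chunk: keep the existing focus
        best = max(scores, key=scores.get)
        if self.current is None:
            self.current = best
            return self.current

        current_score = scores.get(self.current, 0.0)
        if best != self.current and scores[best] >= current_score + self.margin:
            # Count how many chunks in a row the same challenger has won.
            if best == self.challenger:
                self.streak += 1
            else:
                self.challenger, self.streak = best, 1
            if self.streak >= self.min_hold:
                self.current, self.challenger, self.streak = best, None, 0
        else:
            # The current speaker held their ground; reset the challenge.
            self.challenger, self.streak = None, 0
        return self.current
```

With `min_hold=2`, a single chunk in which another speaker scores higher does not move the focus; two consecutive chunks do.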
Accomplishments that we're proud of
- Successfully building a live conversation-focusing demo in just a few hours.
- Integrating multiple AI services into one coherent pipeline.
- Creating a clear, audible contrast between chaotic and focused audio.
- Designing a UX that makes complex audio processing feel intuitive.
What we learned
- Speaker diarization and turn-taking detection are harder than they seem.
- Clear demo storytelling is just as important as technical depth.
- Pairing AI reasoning with structured metadata, such as speaker timing and transcript context, makes for a powerful decision system.
- Stability and fallback logic are critical in live demos.
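A fallback chain for the partner decision might look like the sketch below. Everything in it is hypothetical: `choose_partner`, the `primary` callable standing in for a remote model call, and the fallback order (last known choice first, then whoever spoke the most) are assumptions chosen for illustration.

```python
from typing import Callable, Dict, Optional


def choose_partner(primary: Callable[[], str],
                   speaking_seconds: Dict[str, float],
                   last_choice: Optional[str]) -> Optional[str]:
    """Pick the conversational partner, degrading gracefully on failure.

    primary          -- callable wrapping the remote model call (may raise)
    speaking_seconds -- speaker -> seconds spoken in the recent window
    last_choice      -- the previously focused speaker, if any
    """
    try:
        choice = primary()
        # Only trust the answer if it names a speaker we actually heard.
        if choice in speaking_seconds:
            return choice
    except Exception:
        # Broad catch is deliberate for a live demo: any failure
        # (timeout, bad response, network error) drops to the fallbacks.
        pass

    # Fallback 1: keep the previous focus if that speaker is still present.
    if last_choice in speaking_seconds:
        return last_choice
    # Fallback 2: simple heuristic, the speaker who talked the most.
    if speaking_seconds:
        return max(speaking_seconds, key=speaking_seconds.get)
    return None
```

Keeping the previous choice ahead of the heuristic favors a stable focus over a possibly better guess, which seems like the safer trade-off during a live demo.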
What's next for ClearTalk
We want to:
- Reduce latency further for seamless real-time use.
- Improve speaker tracking and personalization.
- Explore integration into headphones or hearing devices.
- Expand accessibility use cases for people with hearing challenges or auditory processing disorders.
- Add mobile deployment and real-world testing in busy environments.