ClearTalk

Inspiration

Crowded spaces aren’t hard because of background noise — they’re hard because of other people talking. We wanted to build something that feels like “focus mode” for hearing: an intelligent system that helps you stay locked into the conversation that matters, especially in cafés, parties, transit, or group settings.

What it does

ClearTalk is a real-time conversation focuser. It listens to a noisy room, detects who you’re most likely speaking with, and boosts that voice while lowering other speakers. The result is clearer, less mentally exhausting conversations — without muting the world entirely.

How we built it

We built ClearTalk as a web application with a live audio pipeline.

The browser captures microphone audio in short chunks.
The backend performs speaker segmentation and speech analysis.
ElevenLabs transcribes speech to help detect conversational turns.
Gemini analyzes speaker timing and transcript context to determine the most likely conversational partner.
We apply dynamic gain control, boosting the target speaker and ducking the others.
Backboard maintains session state to keep focus stable across chunks.
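
The gain step in the pipeline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes the backend already has speaker segments as (start_sample, end_sample, speaker_id) tuples, and the names duck_gains, smooth_gains, apply_gains, boost, duck, and max_step are all hypothetical. Gains are slew-limited so that a change of speaker does not produce an audible click.

```python
from typing import List, Tuple

# (start_sample, end_sample, speaker_id); end is exclusive. Hypothetical format.
Segment = Tuple[int, int, str]


def duck_gains(n_samples: int, segments: List[Segment], target: str,
               boost: float = 1.5, duck: float = 0.3) -> List[float]:
    """Per-sample gain: boost where the target speaks, duck where others do."""
    gains = [1.0] * n_samples  # audio with no speaker label is left untouched
    for start, end, speaker in segments:
        gain = boost if speaker == target else duck
        for i in range(max(0, start), min(n_samples, end)):
            gains[i] = gain
    return gains


def smooth_gains(gains: List[float], max_step: float = 0.05) -> List[float]:
    """Limit how far the gain may move per sample to avoid clicks."""
    out: List[float] = []
    current = gains[0] if gains else 1.0
    for gain in gains:
        delta = max(-max_step, min(max_step, gain - current))
        current += delta
        out.append(current)
    return out


def apply_gains(samples: List[float], gains: List[float]) -> List[float]:
    """Scale samples in [-1.0, 1.0] by their gain and clip the result."""
    return [max(-1.0, min(1.0, s * g)) for s, g in zip(samples, gains)]


if __name__ == "__main__":
    segments = [(0, 4, "A"), (4, 8, "B")]
    gains = smooth_gains(duck_gains(8, segments, target="A"), max_step=0.4)
    print(apply_gains([0.5] * 8, gains))
```

Gain changes like this only help when speakers take turns. Where two people talk at the same time, a single gain value scales both voices together, so overlapping speech would need real source separation.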

The UI provides a live A/B comparison between raw and focused audio, along with speaker activity visualization.

Challenges we ran into

Real-time audio processing with low latency.
Reliable speaker separation within a short build window.
Preventing rapid focus switching between speakers.
Handling API failures gracefully while keeping the demo stable.
Ensuring the “before vs after” difference was dramatic enough for a live demo.
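
One common way to prevent rapid focus switching is hysteresis: the focus only moves when a challenger beats the current speaker by a margin for several chunks in a row. The sketch below illustrates that idea and is not necessarily how ClearTalk does it. It assumes each chunk yields a score per speaker (for example, from timing and transcript analysis), and FocusTracker, margin, and patience are hypothetical names.

```python
from typing import Dict, Optional


class FocusTracker:
    """Keeps the focused speaker stable across chunks using hysteresis."""

    def __init__(self, margin: float = 0.2, patience: int = 3) -> None:
        self.margin = margin        # how much a challenger must lead by
        self.patience = patience    # consecutive chunks the lead must hold
        self.focus: Optional[str] = None
        self._challenger: Optional[str] = None
        self._streak = 0

    def update(self, scores: Dict[str, float]) -> Optional[str]:
        if not scores:
            return self.focus  # nothing detected this chunk: keep the focus
        best = max(scores, key=scores.get)
        if self.focus is None:
            self.focus = best  # first chunk: take the top-scoring speaker
            return self.focus
        current_score = scores.get(self.focus, 0.0)
        if best != self.focus and scores[best] >= current_score + self.margin:
            # Count consecutive chunks won by the same challenger.
            self._streak = self._streak + 1 if best == self._challenger else 1
            self._challenger = best
            if self._streak >= self.patience:
                self.focus = best
                self._challenger, self._streak = None, 0
        else:
            # The lead did not hold, so the challenger starts over.
            self._challenger, self._streak = None, 0
        return self.focus


if __name__ == "__main__":
    tracker = FocusTracker(margin=0.2, patience=2)
    for chunk in [{"A": 0.9, "B": 0.1}, {"A": 0.4, "B": 0.8}, {"A": 0.7, "B": 0.3}]:
        print(tracker.update(chunk))
```

A larger patience value makes the focus steadier but slower to follow a genuine change of conversational partner, so the two parameters trade stability against responsiveness.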

Accomplishments that we're proud of

Successfully building a live conversation-focusing demo in just a few hours.
Integrating multiple AI services into one coherent pipeline.
Creating a clear, audible contrast between chaotic and focused audio.
Designing a UX that makes complex audio processing feel intuitive.

What we learned

Speaker diarization and turn-taking detection are harder than they seem.
Clear demo storytelling is just as important as technical depth.
Pairing AI reasoning with structured metadata, such as speaker timing and transcripts, makes for an effective decision system.
Stability and fallback logic are critical in live demos.
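
The fallback point can be illustrated with a small sketch: try the AI-based choice first, and if the call fails or returns a speaker that is not in the chunk, fall back to a local heuristic. This is an assumption about how such logic might look, not the project's code. The remote call is passed in as a plain callable rather than a real Gemini or ElevenLabs client, and longest_talker, pick_with_fallback, and talk_time are hypothetical names.

```python
from typing import Callable, Dict, Optional


def longest_talker(talk_time: Dict[str, float]) -> Optional[str]:
    """Local heuristic: the speaker with the most speaking time in the chunk."""
    return max(talk_time, key=talk_time.get) if talk_time else None


def pick_with_fallback(talk_time: Dict[str, float],
                       remote_pick: Callable[[Dict[str, float]], Optional[str]],
                       previous: Optional[str] = None) -> Optional[str]:
    """Prefer the remote choice; degrade to the heuristic, then to the last focus."""
    try:
        choice = remote_pick(talk_time)
    except Exception:
        choice = None  # any failure (timeout, quota, bad response) triggers fallback
    if choice in talk_time:
        return choice  # only accept speakers actually present in this chunk
    return longest_talker(talk_time) or previous


if __name__ == "__main__":
    def failing_service(_: Dict[str, float]) -> Optional[str]:
        raise TimeoutError("simulated API outage")

    print(pick_with_fallback({"A": 2.0, "B": 1.0}, failing_service))
```

Checking the remote answer against the speakers in the chunk also guards against a service that responds successfully but names a speaker the pipeline never detected.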

What's next for ClearTalk

We want to:

Reduce latency further for seamless real-time use.
Improve speaker tracking and personalization.
Explore integration into headphones or hearing devices.
Expand accessibility use cases for people with hearing challenges or auditory processing disorders.
Add mobile deployment and real-world testing in busy environments.

Built With

backboard
elevenlabs
gemini
python

Updates

Ishani Munasinghe started this project — Feb 28, 2026 04:58 PM EST
