Inspiration

Crowded spaces aren’t hard because of background noise — they’re hard because of other people talking. We wanted to build something that feels like “focus mode” for hearing: an intelligent system that helps you stay locked into the conversation that matters, especially in cafés, parties, transit, or group settings.


What it does

ClearTalk is a real-time conversation focuser. It listens to a noisy room, detects who you’re most likely speaking with, and boosts that voice while lowering other speakers. The result is clearer, less mentally exhausting conversations — without muting the world entirely.


How we built it

We built ClearTalk as a web application with a live audio pipeline.

  • The browser captures microphone audio in short chunks.
  • The backend performs speaker segmentation and speech analysis.
  • ElevenLabs transcribes speech to help detect conversational turns.
  • Gemini analyzes speaker timing and transcript context to determine the most likely conversational partner.
  • We apply dynamic audio ducking to amplify the target speaker and attenuate others.
  • Backboard maintains session state to keep focus stable across chunks.
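The ducking step can be sketched as a per-frame gain curve. This is a minimal illustration, not ClearTalk's actual code. It rests on three assumptions:

- The audio is mono float PCM in [-1, 1].
- An upstream diarization step labels each fixed-size frame with a speaker ID, or None for silence.
- The gains are hypothetical: +6 dB for the target speaker and -12 dB for everyone else.

A one-pole smoother moves the gain gradually so level changes don't click.

```python
from typing import Optional, Sequence

# Hypothetical gain settings, not ClearTalk's real values.
TARGET_GAIN_DB = 6.0     # boost for the focused speaker
OTHER_GAIN_DB = -12.0    # attenuation for every other speaker
NEUTRAL_GAIN_DB = 0.0    # unlabeled frames (silence/unknown) pass through


def db_to_linear(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def duck_chunk(
    samples: Sequence[float],
    frame_labels: Sequence[Optional[str]],
    target: str,
    frame_size: int,
    smoothing: float = 0.01,
) -> list[float]:
    """Boost `target`, attenuate other speakers, frame by frame.

    frame_labels[i] is the speaker active in frame i (None = no speech).
    Samples past the last labeled frame are treated as unlabeled.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    out: list[float] = []
    gain = 1.0  # start at unity gain
    for n, s in enumerate(samples):
        frame = n // frame_size
        label = frame_labels[frame] if frame < len(frame_labels) else None
        if label is None:
            desired_db = NEUTRAL_GAIN_DB
        elif label == target:
            desired_db = TARGET_GAIN_DB
        else:
            desired_db = OTHER_GAIN_DB
        # One-pole smoothing: move a fraction of the way toward the desired gain.
        gain += smoothing * (db_to_linear(desired_db) - gain)
        # Clip so the boost cannot push samples outside the float PCM range.
        out.append(max(-1.0, min(1.0, s * gain)))
    return out
```

With per-sample smoothing, the gain's time constant is roughly `1 / smoothing` samples. At the default that is about 6 ms at 16 kHz. This is short enough to follow speaker turns but long enough to avoid audible zipper noise.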

The UI provides a live A/B comparison between raw and focused audio, along with speaker activity visualization.


Challenges we ran into

  • Real-time audio processing with low latency.
  • Reliable speaker separation within a short build window.
  • Preventing rapid focus switching between speakers.
  • Handling API failures gracefully while keeping the demo stable.
  • Ensuring the “before vs after” difference was dramatic enough for a live demo.
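One common way to stop focus from flickering between speakers is hysteresis. A new speaker takes focus only after clearly outscoring the current one for several consecutive chunks. The sketch below assumes that some upstream step, such as the Gemini analysis, produces a per-chunk score for each speaker. `FocusTracker`, `margin`, and `hold_chunks` are hypothetical names and values. An empty score dict, for example after a failed API call, leaves the current focus in place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FocusTracker:
    """Hysteresis around the focused speaker (hypothetical parameters).

    A challenger must beat the current focus by more than `margin` for
    `hold_chunks` consecutive chunks before focus switches.
    """
    margin: float = 0.15
    hold_chunks: int = 3
    current: Optional[str] = None
    _challenger: Optional[str] = None
    _streak: int = 0

    def update(self, scores: dict[str, float]) -> Optional[str]:
        # No scores this chunk (silence or a failed API call): keep focus.
        if not scores:
            return self.current
        best = max(scores, key=lambda spk: scores[spk])
        if self.current is None:
            self.current = best
            return self.current
        current_score = scores.get(self.current, 0.0)
        if best != self.current and scores[best] - current_score > self.margin:
            # Same challenger as last chunk extends the streak; a new one restarts it.
            if best == self._challenger:
                self._streak += 1
            else:
                self._challenger, self._streak = best, 1
            if self._streak >= self.hold_chunks:
                self.current = best
                self._challenger, self._streak = None, 0
        else:
            # Current focus held its ground: reset any pending switch.
            self._challenger, self._streak = None, 0
        return self.current
```

The trade-off is responsiveness. With `hold_chunks=3`, a real change of conversation partner takes three chunks to register, in exchange for ignoring one-off spikes.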

Accomplishments that we're proud of

  • Building a working, live conversation-focusing demo in just a few hours.
  • Integrating multiple AI services into one coherent pipeline.
  • Creating a clear, audible contrast between chaotic and focused audio.
  • Designing a UX that makes complex audio processing feel intuitive.

What we learned

  • Speaker diarization and turn-taking detection are harder than they seem.
  • Clear demo storytelling is just as important as technical depth.
  • Pairing LLM reasoning with structured metadata (speaker timing, transcript context) is a powerful way to make decisions like choosing whom to focus on.
  • Stability and fallback logic are critical in live demos.
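When the LLM call is unavailable, a timing-only heuristic could serve as a fallback for picking the conversational partner. It counts how often each speaker's turn directly follows or precedes the user's within a short gap. This is a hypothetical sketch, not the project's actual logic. It assumes diarized segments with start and end times in seconds, and that the device wearer's segments are labeled `"user"`. `Segment`, `guess_partner`, and the 1.5 s gap are illustrative choices.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def guess_partner(
    segments: list[Segment],
    user: str = "user",
    max_gap: float = 1.5,
) -> Optional[str]:
    """Return the speaker who most often trades turns with `user`.

    Two consecutive segments count as a turn exchange when they belong to
    different speakers, one of them is the user, and the second starts no
    more than `max_gap` seconds after the first ends. Overlapping speech
    (a negative gap) also counts. Returns None if no exchange is found.
    """
    ordered = sorted(segments, key=lambda seg: seg.start)
    counts: Counter[str] = Counter()
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker == nxt.speaker or nxt.start - prev.end > max_gap:
            continue
        if prev.speaker == user:
            counts[nxt.speaker] += 1  # someone replied to the user
        elif nxt.speaker == user:
            counts[prev.speaker] += 1  # the user replied to someone
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A heuristic like this ignores what was said, which is exactly what the transcript-aware LLM step adds. That makes it a safety net rather than a replacement.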

What's next for ClearTalk

We want to:

  • Reduce latency further for seamless real-time use.
  • Improve speaker tracking and personalization.
  • Explore integration into headphones or hearing devices.
  • Expand accessibility use cases for people with hearing challenges or auditory processing disorders.
  • Add mobile deployment and real-world testing in busy environments.
