Inspiration

We have seen many people who are deaf or hard of hearing feel left out of group conversations. To address this and help them stay involved, we built Deafine.

What it does

Deafine is a live transcription tool for deaf and hard-of-hearing users: it captures microphone audio, transcribes it with automatic speaker identification, and displays live captions in the UI.

How we built it

We used ElevenLabs for speech-to-text and designed the backend to skip redundant API calls when there is silence. We added multi-speaker support, so the tool can tell when several people are talking. Finally, if the user's name is mentioned, the app triggers haptic feedback and a quick notification. Minimal sketches of the silence gating and the name alert follow.
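Here is a minimal sketch of the silence gating, assuming 16 kHz 16-bit mono PCM and the webrtcvad package (the post does not name the exact VAD module, so that choice is an assumption):

```python
import webrtcvad  # assumed VAD library; the post doesn't name the module

SAMPLE_RATE = 16000                    # 16 kHz, 16-bit mono PCM
FRAME_MS = 30                          # webrtcvad accepts 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2

vad = webrtcvad.Vad(2)                 # aggressiveness 0 (lenient) to 3 (strict)

def has_speech(chunk: bytes) -> bool:
    """Return True if any full 30 ms frame in the buffered chunk contains speech."""
    for i in range(0, len(chunk) - FRAME_BYTES + 1, FRAME_BYTES):
        if vad.is_speech(chunk[i:i + FRAME_BYTES], SAMPLE_RATE):
            return True
    return False

def maybe_transcribe(chunk: bytes, send_to_api) -> None:
    """Only spend an API call when the buffered audio actually contains speech."""
    if has_speech(chunk):
        send_to_api(chunk)
```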
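And a sketch of the name-mention alert. `USER_NAME` and the `on_name_mention` hook are hypothetical placeholders for the real app's user profile and platform notification/haptic APIs:

```python
import re

USER_NAME = "Alex"  # hypothetical; the real app would read this from the user's profile

def on_name_mention(text: str) -> None:
    # Stand-in for the real hooks: a platform notification plus haptic feedback.
    print(f'Notification: your name came up - "{text}"')

def check_name_mention(caption_text: str) -> None:
    """Scan a finished caption segment for the user's name (whole word, any case)."""
    if re.search(rf"\b{re.escape(USER_NAME)}\b", caption_text, re.IGNORECASE):
        on_name_mention(caption_text)

check_name_mention("Alex, can you take a look at this?")  # fires the alert
```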

Challenges we ran into

The three main challenges we ran into were:

  1. Backend design - converting ElevenLabs speaker UUIDs to simple S1, S2 labels, grouping words by speaker, and designing the APIs (see the first sketch after this list).
  2. Real-time sync - coordinating audio callbacks, API timing, and UI updates.
  3. Overlap detection - tracking multiple simultaneous speakers within 2-second windows (see the second sketch after this list).
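A minimal sketch of the UUID-to-label mapping and speaker grouping. The word dicts with `speaker_id`/`text` keys are an assumed shape for the transcription response, not ElevenLabs' exact schema:

```python
from itertools import count

class SpeakerLabeler:
    """Map opaque speaker UUIDs to stable short labels (S1, S2, ...)."""

    def __init__(self):
        self._labels = {}
        self._counter = count(1)

    def label(self, speaker_id: str) -> str:
        if speaker_id not in self._labels:
            self._labels[speaker_id] = f"S{next(self._counter)}"
        return self._labels[speaker_id]

def group_words_by_speaker(words, labeler):
    """Collapse consecutive words from one speaker into caption segments."""
    segments = []
    for w in words:  # assumed shape: {"speaker_id": ..., "text": ...}
        label = labeler.label(w["speaker_id"])
        if segments and segments[-1]["speaker"] == label:
            segments[-1]["text"] += " " + w["text"]
        else:
            segments.append({"speaker": label, "text": w["text"]})
    return segments
```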
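And a sketch of the overlap check, bucketing word start times into 2-second windows and flagging any window where more than one speaker talks (the word-dict shape is again an assumption):

```python
def overlapping_windows(words, window_s: float = 2.0):
    """Return {window_index: speaker_ids} for windows with more than one speaker.

    Each word is assumed to carry a start time in seconds and a speaker_id.
    """
    windows = {}
    for w in words:
        bucket = int(w["start"] // window_s)
        windows.setdefault(bucket, set()).add(w["speaker_id"])
    return {b: s for b, s in windows.items() if len(s) > 1}
```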

Accomplishments that we're proud of

Our main accomplishment is tackling a real-world accessibility problem. On the technical side, we did not train any ML models, which helped us build the MVP very quickly. We are also proud of the notification that fires when the user's name is called out.

What we learned

We learned how real-time audio processing with WebSockets works, along with buffering strategies for batched API calls. We also learned optional-dependency design with graceful fallbacks, thanks to a cross-platform packaging challenge with the Voice Activity Detection module.
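A minimal sketch of that fallback pattern, again assuming webrtcvad as the VAD module: if the native package fails to install on some platform, captioning keeps working and every chunk is simply sent to the API, trading cost for availability.

```python
# Optional dependency with a graceful fallback: if the native VAD package
# is missing, captioning still works without silence gating.
try:
    import webrtcvad  # assumed module; the post doesn't name the library
    _vad = webrtcvad.Vad(2)

    def is_speech(frame: bytes, sample_rate: int = 16000) -> bool:
        return _vad.is_speech(frame, sample_rate)
except ImportError:
    def is_speech(frame: bytes, sample_rate: int = 16000) -> bool:
        return True  # no VAD available: treat everything as speech
```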

What's next for Deafine

In the short term, the next steps are streaming for lower latency than the current 5-second batches, which would also help keep speaker identities persistent, and experimenting with audio isolation. In the longer term, we would like to add AI-powered navigation support for blind users.
