As of 2013, roughly 38 million Americans had experienced severe hearing loss, yet only 7.6 million of them could actually benefit from wearing a hearing aid. We wanted to create a world in which they can see what the rest of us hear, through near real-time speech-to-text transcription. This is one of the truly practical uses of AR: it augments the user's world by helping them understand others with ease.
What it does
The user wears an AR device while interacting with others; it captures their speech and immediately renders it as text, placed right next to the person currently speaking, eliminating the need for sign language. This technology lets the user travel anywhere in the world and interact with anyone without additional external help.
Additionally, the device gives haptic feedback once the sound level crosses a threshold, alerting the user to oncoming vehicles and other threatening situations.
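The loudness check behind that haptic alert can be sketched as a simple RMS threshold over each audio chunk. This is a minimal illustration, not our production code: the threshold value is a placeholder that would be tuned empirically, and the actual app would call the device's haptics API instead of returning a flag.

```python
import math

# Placeholder threshold in linear amplitude (assumption; the real app
# would tune this empirically against real traffic noise).
LOUDNESS_THRESHOLD = 0.3

def rms(samples):
    """Root-mean-square level of a chunk of audio samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_alert(samples, threshold=LOUDNESS_THRESHOLD):
    """Return True when the chunk is loud enough to warrant haptic feedback."""
    return rms(samples) > threshold
```

A loud chunk (e.g. constant 0.8 amplitude) trips the alert, while near-silence does not.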
How we built it
It's powered by Unity and Vuforia.
We built a dedicated application for the Vuzix AR device, letting users experience the world hands-free!
We also built an AR application compatible with both iOS and Android devices. It leverages the depth sensing available on newer cameras to render 3D text next to the speaker, using real-time image processing to locate anchor points for the text.
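One way to turn a detected speaker into a 3D text anchor is to back-project the center of the face's 2D bounding box using the depth reading and the camera's pinhole intrinsics. The sketch below is a hedged illustration of that geometry, not our Unity implementation; the function name and parameters are ours, and the bounding box and depth are assumed to come from a face detector and depth camera.

```python
def anchor_from_detection(bbox, depth_m, fx, fy, cx, cy):
    """Back-project the center of a 2D face bounding box into camera space.

    bbox: (x_min, y_min, x_max, y_max) in pixels from a face detector.
    depth_m: depth at the bbox center, in meters, from the depth camera.
    fx, fy, cx, cy: pinhole camera intrinsics (focal lengths, principal point).
    Returns an (X, Y, Z) anchor point in camera coordinates; the caption
    would then be rendered slightly offset from this point.
    """
    u = (bbox[0] + bbox[2]) / 2.0  # pixel center, horizontal
    v = (bbox[1] + bbox[3]) / 2.0  # pixel center, vertical
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A face box centered on the principal point at 2 m depth maps to the anchor (0, 0, 2) straight ahead of the camera.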
Real-time speech-to-text is powered by IBM Watson services, while the device's microphone feeds spectral analysis of the incoming audio.
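The spectral-analysis side can be illustrated with a naive DFT that picks out the dominant frequency in a buffer of microphone samples. This is purely a sketch of the idea: a real-time pipeline would use an FFT, and the same buffers would be streamed to the Watson speech-to-text service in parallel (the Watson call itself is omitted here).

```python
import math

def dominant_frequency(samples, sample_rate):
    """Naive DFT: return the frequency (Hz) of the strongest bin.

    Illustrative only; production code would use an FFT library on the
    live microphone stream rather than this O(n^2) loop.
    """
    n = len(samples)
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2):  # skip DC, stop at Nyquist
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate / n
```

Feeding it a pure 500 Hz sine sampled at 6400 Hz recovers 500 Hz exactly, since that tone falls on a DFT bin.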
Challenges we ran into
Detecting audio levels and performing speech-to-text transcription simultaneously, without introducing additional latency, was one of the major issues we had to overcome.
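One common way to avoid that latency is to fan each captured audio chunk out to independent consumers, so level detection and transcription run in parallel rather than back-to-back. The sketch below shows the pattern with Python threads and queues; the handlers (`max` for level, `len` standing in for a transcriber) are stand-ins, not our actual pipeline.

```python
import queue
import threading

def fan_out(chunks, consumers):
    """Push each captured audio chunk to every consumer queue, then signal EOF."""
    for chunk in chunks:
        for q in consumers:
            q.put(chunk)
    for q in consumers:
        q.put(None)  # sentinel: end of stream

def consume(q, handler, results):
    """Drain one queue, applying a per-chunk handler."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        results.append(handler(chunk))

# Demo: two consumers over the same stream of fake audio chunks.
levels_q, stt_q = queue.Queue(), queue.Queue()
levels, transcripts = [], []
t1 = threading.Thread(target=consume, args=(levels_q, max, levels))
t2 = threading.Thread(target=consume, args=(stt_q, len, transcripts))
t1.start(); t2.start()
fan_out([[0.1, 0.5], [0.2, 0.9]], [levels_q, stt_q])
t1.join(); t2.join()
```

Each consumer sees every chunk in order, and neither blocks the other.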
Finding the anchor point for the text without any local SDKs available was a challenge.
To keep the application accessible to as many consumers as possible, we built two separate versions, catering to both wearable and handheld devices.
Accomplishments that we're proud of
Two of the team members had never worked with Unity before, and learning something new while developing an application this cool was an amazing experience.
The application relies on very few third-party APIs, avoiding the latency they tend to introduce.
What's next for Voice-over
Using directional microphones to indicate the direction of an incoming vehicle, creating more situational awareness.
Identifying multiple speakers in the same frame and attaching captions to the respective speaker.
Identifying different sources of sound using machine learning to help detect major threats.
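The directional-microphone idea above typically rests on time difference of arrival (TDOA): a sound reaching one mic slightly before the other implies an angle. A minimal sketch of that geometry, under the assumption of a simple two-microphone pair and far-field sound:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def arrival_angle(delay_s, mic_spacing_m):
    """Estimate sound direction from the time difference of arrival (TDOA)
    between two microphones.

    delay_s: signal delay between the two mics, in seconds.
    mic_spacing_m: distance between the microphones, in meters.
    Returns the angle in degrees from broadside of the mic pair
    (0 deg = directly ahead, +/-90 deg = along the mic axis).
    """
    # Path difference is delay * c; its ratio to the spacing is sin(angle).
    ratio = max(-1.0, min(1.0, delay_s * SPEED_OF_SOUND / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

Zero delay means the source is straight ahead; a delay equal to spacing/c means it lies along the microphone axis (90 degrees). In practice the delay would be estimated by cross-correlating the two mic signals.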