Inspiration

I have often seen complaints online from people with hearing impairment about movie theaters not offering closed captioning. I thought it would be amazing for those people to have personal closed captioning that still let them engage the same way as if captioning were available. The idea then expanded: could we create engaging closed captioning for interacting with other people?

What it does

Our AR app superimposes text generated from a continuous audio stream onto the phone's camera feed. The user can hold the phone up to someone speaking with them and simultaneously read what they are saying, read their lips, and still make eye contact and engage with them. The alternatives would be looking down at a speech-to-text app or trying to follow the conversation without text at all; our app lets the user combine the two.

How I built it

We used Unity and ARCore to create the AR platform and render the text on screen. The speech-to-text element was integrated into the Unity script using C# and the IBM Watson Speech to Text API.
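As a rough illustration of the Unity side, a minimal MonoBehaviour can keep a caption anchored in front of the AR camera and accept transcript updates from the speech-to-text layer. This is a sketch under assumptions, not our exact script: the Watson wiring is omitted, and the `OnTranscript` callback and field names are hypothetical.

```csharp
// Sketch only: the overlay half of the app. The Watson Speech to Text
// streaming setup (credentials, audio capture, recognize callbacks)
// would follow IBM's Unity SDK examples and is not shown here.
using UnityEngine;

public class CaptionOverlay : MonoBehaviour
{
    [SerializeField] private TextMesh captionText;   // world-space text drawn over the camera feed
    [SerializeField] private float distance = 1.5f;  // meters in front of the camera

    // Hypothetical hook: called by the speech-to-text layer whenever
    // a new interim or final transcript arrives.
    public void OnTranscript(string transcript)
    {
        captionText.text = transcript;
    }

    private void LateUpdate()
    {
        // Re-anchor the caption in front of the AR camera each frame,
        // so it stays in view while the user frames the speaker.
        Transform cam = Camera.main.transform;
        captionText.transform.position = cam.position + cam.forward * distance;
        captionText.transform.rotation = Quaternion.LookRotation(cam.forward, cam.up);
    }
}
```

Anchoring in `LateUpdate` (after the AR camera pose is updated for the frame) keeps the text from lagging a frame behind the camera feed.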

Challenges I ran into

None of us had used any of the design and programming platforms involved, so setting up the project was the most intensive part. The Speech to Text API is also rarely used inside Unity, so resources on applying it to AR were hard to find.

Accomplishments and What I learned

Starting from introductory Python and Java skills, we learned how to use both Unity and the IBM Speech to Text API. Familiarizing ourselves with these new tools was exciting, and it opens a gateway to new projects in the future.

What's next for AR Speech to Text to Assist with Hearing Impairment

We want to add several more features once we acquire more advanced Unity skills. For example, we would like to add face recognition so that the text appearing in the app can be anchored to the person who is speaking. We would also like to add a speech-to-sign-language feature if an API existed to convert text into sign language images.

Built With

  • arcore
  • c#
  • speech-to-text
  • unity
  • visual-studio-code