HEAR - Generating Subtitles for Life
HEAR is an iOS application that provides a visual aid during spoken conversations for individuals with hearing impairments. It combines speech-to-text technology with augmented reality and facial recognition to display subtitles underneath each speaker throughout a conversation.
A simple scenario showcasing our app in a classroom environment.
A simple scenario showcasing our app in a conversational environment.
Inspired by a recent conference talk on accessibility, as well as by family members who are hearing impaired, we wanted to create a hack that targets the pain points that individuals who are hard of hearing deal with every day.
ARKit 2 was used to capture objects in a 3D scene and attach subtitle nodes to them, allowing the subtitles to follow speakers. Subtitle text size is determined by the speaker's distance from the device, which would not be possible without ARKit.
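As a rough illustration of distance-based sizing, the sketch below shows one plausible policy, not necessarily the one HEAR uses. Under perspective projection an object's on-screen size shrinks in proportion to its distance, so scaling a subtitle node linearly with distance keeps its apparent size roughly constant. The function name `subtitleScale`, the reference distance, and the clamping bounds are all assumptions made for this example.

```swift
import Foundation

/// Hypothetical helper (not from the HEAR code base): returns a scale factor
/// for a subtitle node so that its apparent on-screen size stays roughly
/// constant as the speaker moves closer to or farther from the camera.
/// - Parameters:
///   - distance: estimated camera-to-speaker distance in meters.
///   - referenceDistance: distance at which the subtitle uses scale 1.0 (assumed).
///   - minScale: lower clamp so nearby subtitles stay readable (assumed).
///   - maxScale: upper clamp so far-away subtitles do not dominate the scene (assumed).
func subtitleScale(distance: Double,
                   referenceDistance: Double = 1.0,
                   minScale: Double = 0.5,
                   maxScale: Double = 3.0) -> Double {
    // Guard against zero or negative values coming from noisy depth estimates.
    guard distance > 0, referenceDistance > 0 else { return minScale }
    // Apparent size falls off as 1/distance, so compensate linearly.
    let raw = distance / referenceDistance
    // Clamp to keep the text within a legible range.
    return min(max(raw, minScale), maxScale)
}
```

In an ARKit scene, the returned value could be applied uniformly to the scale of the node that carries the subtitle each time the speaker's estimated distance changes.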
Core ML 2 was used mainly for its computer vision capabilities. HEAR uses facial recognition to detect potential speakers and to place subtitles correctly beneath them. This is done efficiently through the Vision API.
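To make the positioning step concrete, here is a minimal sketch of how a detected face rectangle could be turned into a subtitle anchor point. Vision reports face bounding boxes in normalized coordinates (0 to 1) with the origin at the lower-left corner, while UIKit views use points with the origin at the top-left, so the y axis has to be flipped. The types `NormalizedRect` and `Point`, the function `subtitleAnchor`, and the padding value are hypothetical names introduced for illustration. The sketch also assumes the camera image fills the view without cropping or rotation, which a real app would need to account for.

```swift
import Foundation

/// Face bounding box in Vision-style normalized coordinates:
/// values in 0...1, origin at the lower-left corner of the image.
struct NormalizedRect {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

/// A point in view coordinates (origin at the top-left, y grows downward).
struct Point: Equatable {
    var x: Double
    var y: Double
}

/// Hypothetical helper: returns the point, in view coordinates, where a
/// subtitle could be centered just below the detected face.
func subtitleAnchor(below face: NormalizedRect,
                    viewWidth: Double,
                    viewHeight: Double,
                    padding: Double = 8.0) -> Point {
    // Horizontal center of the face, scaled to view width.
    let centerX = (face.x + face.width / 2.0) * viewWidth
    // The bottom edge of the face is face.y in lower-left-origin coordinates;
    // flip it to get the equivalent y in top-left-origin view coordinates.
    let faceBottomY = (1.0 - face.y) * viewHeight
    // Offset slightly below the face, but never past the bottom of the view.
    let anchorY = min(faceBottomY + padding, viewHeight)
    return Point(x: centerX, y: anchorY)
}
```

For example, a face occupying the normalized rectangle (0.25, 0.5, 0.5, 0.25) in a 400 by 800 point view yields an anchor at (200, 408), horizontally centered on the face and 8 points below its lower edge.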
Speech-to-text is the most important feature of HEAR, and for that reason SiriKit was chosen to transcribe speakers' speech into subtitles. Having Siri perform some of the computation and natural language processing locally helps speed up transcription, which leads to a better user experience.
HEAR uses SpriteKit to overlay the subtitles in a 3D environment. SpriteKit also allows text customization to make the text clearer and more legible on varying backgrounds.
As none of the team members were familiar with Swift or iOS development, creating the HEAR iOS application and using advanced technologies was both challenging and exciting. While the learning curve was steep, pair programming helped us work through problems efficiently and eased the transition into a completely new environment. On the technical side, we ran into trouble using SpriteKit and ARKit, as most tutorials and documentation required the user to touch the screen where they wanted to place an object. We wanted our objects placed based on the decisions made by the facial recognition software we created. Calculating the true depth and changing the subtitle attributes accordingly required more math than we were ready for.
The HEAR team has many ambitious plans for the application, some of which include:
- Simultaneous speakers and subtitles
- More accurate speaker tracking
- Higher accuracy in noisy environments
- Syncing of conversations to cloud for later review
- Integration into augmented reality lenses
- Real time translation of subtitles
Benjamin Barault, Francesco Valela, Jacob Gagné, Tobi Décary-Larocque