It's really quite simple: we wanted to create an AR communication tool that lets people see what others are saying. We believe that everyone, regardless of hearing ability, should be able to communicate with ease.
The HoloLens app transcribes the words of a person talking into the user's field of view. The final app will include several features. To aid meaning-making, it adds contextual information by conveying changes in volume and emotion. When multiple people are speaking, the application detects who is speaking and links the words to the individual speaker, helping the user follow the conversation.

We built the app by combining Unity's Dictation Recognizer feature for Windows with Microsoft Cognitive Services' facial tracking and Emotion API in a single application.

We had difficulty integrating the Cognitive Services code and connecting to its libraries, and it took quite some time to get code deploying to the HoloLens, as many of us were brand new to working with it. We also struggled to parse the audio when there were multiple speakers: finding a way to reverse-spatialize audio (that is, to recover the direction a sound is coming from) proved difficult. Stopping the transcription while the user themselves is speaking remains an open issue; for now we address it with a manual stop.

There are a number of features we hope to include in the future.
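
To make the "reverse spatialization" problem concrete, here is a minimal Python sketch of one common way to estimate where a sound comes from: comparing its arrival time at two microphones. This is not what our app does. It assumes access to two synchronized raw microphone channels, which we did not have working on the HoloLens, and every name in it (`best_lag`, `direction_degrees`, the spacing and sample-rate values) is hypothetical and chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value at room temperature


def best_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    Uses a brute-force cross-correlation, which is fine for short buffers.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_degrees(left, right, sample_rate, mic_spacing):
    """Estimate the bearing of a sound source from two microphone channels.

    0 degrees is straight ahead; negative angles point toward the left
    microphone and positive angles toward the right. Assumes a distant
    source and a known spacing (in metres) between the microphones.
    """
    # Largest physically possible delay, in samples, for this spacing.
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate))
    delay = best_lag(left, right, max_lag) / sample_rate
    # Clamp before asin so rounding cannot push the ratio outside [-1, 1].
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    # If the right channel hears the sound later, the source is on the left.
    return -math.degrees(math.asin(ratio))


# Hypothetical example: a short pulse that reaches the right microphone
# three samples after the left one.
pulse = [0.0] * 10 + [1.0, 0.5, 0.2] + [0.0] * 51
delayed = [0.0] * 3 + pulse[:-3]
angle = direction_degrees(pulse, delayed, sample_rate=48000, mic_spacing=0.2)
```

With only two microphones this gives a left/right bearing, not a full 3D position, and overlapping speakers would still need to be separated before each voice could be assigned its own direction.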