SpatialTranscribe

Inspiration

Nearly 50 million Americans, around 1 in 7 people, are hard of hearing. Several members of our team have family members who have struggled with hearing loss. Being hard of hearing can limit communication, lead to isolation, and raise the risk of mental health issues such as depression and anxiety. We wanted to create an app using spatial programming to help people who are hard of hearing communicate with the people around them.

What it does

Our app takes audio input from the microphones on the Apple Vision Pro goggles and transcribes it into a spatial text bubble for the user to see. It estimates the general direction of the audio and places the text bubble in that direction so that the user can locate who is speaking to them. This gives the user a quick and simple way to follow the conversations around them.
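To illustrate the placement step, here is a minimal sketch of how a direction could be turned into a position for the text bubble. It is not our actual WebSpatial code: the `bubblePosition` function, the `Vec3` type, the default distance, and the coordinate convention (user at the origin, +x to the right, +y up, -z straight ahead) are all assumptions made for this example.

```typescript
// Hypothetical 3D point type used only for this sketch.
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

// Place a bubble on a circle around the user.
// Assumed convention: azimuth 0 is straight ahead, positive azimuth is to the right.
// Assumed axes: +x right, +y up, -z forward, with the user at the origin.
function bubblePosition(
  azimuthDegrees: number,
  distanceMeters: number = 1.5, // assumed comfortable reading distance
  heightMeters: number = 0 // assumed: bubble at eye level
): Vec3 {
  const theta = (azimuthDegrees * Math.PI) / 180;
  return {
    x: distanceMeters * Math.sin(theta),
    y: heightMeters,
    z: -distanceMeters * Math.cos(theta),
  };
}
```

For example, `bubblePosition(0)` puts the bubble directly in front of the user, while `bubblePosition(90)` puts it directly to their right at the same distance.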

How we built it

We used the SpeechRecognition API in JavaScript to transcribe live speech, so that there would be as little delay for the user as possible. We implemented this with a Flask WebSocket API. We built the text bubble components in React and estimated the direction the speech was coming from by comparing the audio in the left and right channels, making for a more immersive experience.
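The left/right comparison can be sketched roughly as follows. This is a simplified illustration rather than our exact implementation: it assumes the direction is estimated from the loudness difference between the two channels, and the names `rms`, `estimatePan`, and `panToAzimuthDegrees`, the silence threshold, and the 90-degree arc are all hypothetical choices for the example.

```typescript
// Hypothetical helper: root-mean-square level of one channel's samples.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Estimate a pan value in [-1, 1] from the level difference between channels.
// -1 means fully left, 0 means centered, +1 means fully right.
function estimatePan(left: Float32Array, right: Float32Array): number {
  const l = rms(left);
  const r = rms(right);
  const total = l + r;
  // Assumption: near-silent input carries no direction, so treat it as centered.
  if (total < 1e-6) return 0;
  return (r - l) / total;
}

// Map the pan value to an approximate azimuth in degrees.
// Assumption: sources lie in a frontal arc from -90 (left) to +90 (right).
function panToAzimuthDegrees(pan: number): number {
  const clamped = Math.max(-1, Math.min(1, pan));
  return clamped * 90;
}
```

A level difference alone only separates left from right; it cannot tell front from back, which is one reason this gives a general direction rather than an exact location.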

Challenges we ran into

The Vision Pro is locked down in terms of the data that we can get out of it. For example, we were unable to collect raw camera data from the Vision Pro to determine where to place objects in 3D space. These limitations greatly impacted our initial plans and reduced the functionality of the product. The limitations of the simulation and of the Apple Vision Pro itself made it much harder to get working code.

Accomplishments that we're proud of

Creating a product is difficult on its own. Creating a substantial product that can be widely used to help many people in a 22-hour time span is nearly impossible. Most of our challenges stemmed from that time limit. Working with React and WebSpatial, two tools we are relatively new to, became our first hurdle. Yet we persisted and developed a code foundation capable of accomplishing our goal in a short amount of time, while also pushing the limits of WebSpatial and audio configuration in 3D.

What we learned

We improved our ability to work with new software and faced many challenges that pushed us to consider an issue from multiple perspectives. Our React and WebSpatial skills improved massively through coding and debugging, and using the available resources to define the boundaries of our product underlined how painstaking app development is. We also familiarized ourselves with APIs and AI models, and conceived an efficient service applicable to many.

What's next for Spatial Translate

Spatial Translate aims not only to recognize and accurately pinpoint one person speaking, but also to improve the technology so that it can be used in group settings. The end goal is to build the software into eyewear, so that people who are hard of hearing can go out on a day-to-day basis and fully understand what people say to them. This involves understanding who is speaking, what direction the sound is coming from, and the contents of the conversation.