I was at a Thanksgiving gathering last year. One of the guests was vision-impaired and sat alone except when someone came over to chat.
What it does
HearSee uses a camera and processor to recognize objects and images around the user, place them spatially, track their location as the user moves, and play an audio snippet that represents each item. Each snippet is positioned in the audio field according to the item's position relative to the user, with farther items quieter and closer ones louder. With a Bose AR headset, the audio is further adjusted to follow the user's head orientation.
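To make the distance and head-orientation behavior concrete, here is a minimal sketch in Python. It is not the app's code (the app is written in Swift and mixes with AVFoundation). The function name `spatial_mix`, the inverse-distance falloff, the constant-power pan, and the coordinate convention are all assumptions chosen for illustration.

```python
import math


def spatial_mix(item_x, item_z, head_yaw_deg, ref_distance=1.0):
    """Return (gain, left, right) for one item. Hypothetical helper.

    item_x: metres to the user's right, item_z: metres in front of the user.
    head_yaw_deg: head rotation in degrees, positive = turned to the right.
    """
    distance = math.hypot(item_x, item_z)
    # Assumed inverse-distance falloff, clamped so nearby items never exceed full volume.
    gain = min(1.0, ref_distance / max(distance, 1e-6))

    # Bearing of the item relative to the body: 0 = straight ahead, positive = right.
    bearing = math.degrees(math.atan2(item_x, item_z))
    # Subtract head yaw and wrap to [-180, 180) to get the bearing relative to the head.
    relative = (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0

    # Constant-power stereo pan. Items behind the user are folded to the nearest side.
    clamped = max(-90.0, min(90.0, relative))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2)
    left = gain * math.cos(angle)
    right = gain * math.sin(angle)
    return gain, left, right
```

With these assumptions, an item two metres straight ahead plays at half gain and equally in both ears. If the user turns their head 90 degrees to the right, the same item moves entirely to the left ear.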
How I built it
I built an iOS app that handles image and object recognition and tracking, connects to a Bose AR headset and reads its properties, and processes the audio for playback through the headset. The app is written in Swift and uses ARKit and the Vision framework for recognition and tracking. It uses the Bose SDK to connect to the headset and read its orientation, and AVFoundation for audio processing and mixing. Audio snippets were pulled from the UMG catalog (plus some other sources for demonstration purposes) and processed with some Dolby tools.
Challenges I ran into
Tracking all of the object information and mixing the audio so that it reflects each object's position relative to the user is a very complex task. I first had to get all of the components (image and object recognition, processed audio files, and the app infrastructure) in place before I could work on tracking and mixing, which had to be done overnight.
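The bookkeeping behind that tracking and mixing step can be sketched roughly as follows, again in Python rather than the app's Swift. Everything here is an assumption for illustration: the `ItemTracker` class, the five-second expiry, the inverse-distance gain, and the normalization that keeps the summed gain at or below 1.0 to avoid clipping.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedItem:
    label: str
    x: float          # world position in metres (hypothetical convention)
    z: float
    last_seen: float  # timestamp of the most recent detection, in seconds


class ItemTracker:
    """Remembers where recognized items are and derives a mix level for each."""

    def __init__(self, max_age=5.0):
        self.max_age = max_age
        self.items = {}

    def observe(self, label, x, z, now):
        # A new detection replaces the stored position for that label.
        self.items[label] = TrackedItem(label, x, z, now)

    def prune(self, now):
        # Drop items that have not been detected within max_age seconds.
        self.items = {
            label: item
            for label, item in self.items.items()
            if now - item.last_seen <= self.max_age
        }

    def mix_levels(self, user_x, user_z, ref_distance=1.0):
        # Per-item gain from the user's current position (assumed inverse-distance falloff).
        raw = {}
        for label, item in self.items.items():
            distance = math.hypot(item.x - user_x, item.z - user_z)
            raw[label] = min(1.0, ref_distance / max(distance, 1e-6))
        # Scale down only when the combined gain would exceed 1.0.
        total = sum(raw.values())
        scale = 1.0 / total if total > 1.0 else 1.0
        return {label: gain * scale for label, gain in raw.items()}
```

In this sketch, `observe` would be called for each recognition result and `mix_levels` once per frame with the user's latest position, so the levels change as the user walks toward or away from an item.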
Accomplishments that I'm proud of
The basic operation of the app works very well. With refinement, it can be a very useful tool for the vision-impaired community. As I built the app, I realized it also has utility for sighted users, in terms of music discovery, promotion, and exploration.
What I learned
I learned a great deal about audio mixing on iOS, and gained some insight into the breadth and depth of tools available to locate, identify, process, and play music.
What's next for HearSee
The camera function can be moved from the iPhone to a dedicated device like the Intel RealSense camera. The app should be more configurable and adaptable to each user. I would also like to explore the music use cases (discovery, promotion, and exploration) for sighted people.