This project grew out of a discussion at Chipotle about how certain contemporary composers are trying to create music using data collected from arbitrary sources, such as Jupiter's atmosphere. When people listen to that music, they don't actually experience the data that the piece was supposed to convey. Naturally, we wondered what data could actually be conveyed successfully through audio. What if we could use depth data to create 3D audio? Could such an application make navigation and sensing easier for visually impaired people?

What it does

SoundSight uses computer vision to collect spatial data in the user's immediate vicinity and converts it into three-dimensional sound, allowing the user to perceive the environment without relying on sight. The perception is further enhanced by an object detection algorithm running in the cloud (Microsoft Azure) that tells the user which object they are approaching. For example, unlike traditional sonar-based solutions, SoundSight can distinguish between objects like closed doors and walls and will notify the user appropriately.

How we built it

SoundSight is built using C# and Unity3D along with a Microsoft Kinect, which produces a spatial mapping of the surroundings. We applied convolutions to the depth data to filter out noise, built a point cloud from the filtered data, and used that point cloud to create a three-dimensional audio profile. Additionally, we obtained bounding boxes for all detectable objects in the environment using the Microsoft Azure Computer Vision API, so that we could notify the user about the highest-confidence objects.
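The depth-to-audio steps described above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not our C#/Unity code. Everything in it is an assumption made for illustration: the 3x3 averaging kernel, the pinhole camera parameters fx, fy, cx and cy, the 4 m maximum range, the linear distance-to-gain mapping, and the function names themselves.

```python
import math


def smooth_depth(depth, invalid=0.0):
    """Apply a 3x3 box filter, skipping invalid (zero) readings.

    Assumption: a simple averaging kernel stands in for whatever
    convolution the real pipeline uses.
    """
    rows, cols = len(depth), len(depth[0])
    out = [[invalid] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [
                depth[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if depth[rr][cc] != invalid
            ]
            if vals:
                out[r][c] = sum(vals) / len(vals)
    return out


def to_point_cloud(depth, fx, fy, cx, cy, invalid=0.0):
    """Back-project each valid pixel to (x, y, z) in metres.

    Assumption: a pinhole camera model with hypothetical intrinsics
    fx, fy (focal lengths) and cx, cy (principal point), in pixels.
    x points right, y points up, z points away from the camera.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z != invalid:
                points.append(((u - cx) * z / fx, (cy - v) * z / fy, z))
    return points


def audio_cue(point, max_range=4.0):
    """Map a 3D point to (azimuth in degrees, gain in [0, 1]).

    Assumption: gain falls off linearly with distance and reaches
    zero at max_range; negative azimuth means left of the user.
    """
    x, y, z = point
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(x, z))
    gain = max(0.0, 1.0 - distance / max_range)
    return azimuth, gain
```

In this sketch, a caller would smooth a depth frame, convert it with to_point_cloud, and pass selected points (for example, the nearest one) to audio_cue to decide where a sound source should be placed and how loud it should be. In a Unity setup, the spatial audio engine would handle the actual rendering.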

Challenges we ran into

When we came up with the idea on Saturday morning, we wanted to use a depth/stereo camera and a Python-based interface to implement the hack. We had no depth camera, but we worked around that hardware limitation once we realized that a Microsoft Kinect would be an ideal substitute. Still, we had to venture into an unfamiliar tech stack (Unity) that worked best with the Kinect and could also produce high-quality 3D sound. We ran into issues integrating C# dependencies into the Unity environment (primarily for image processing). Testing such a sensitive device and hack in the cluttered, noisy hacking space also proved to be a challenge.

Accomplishments that we're proud of

Successfully integrating object detection and depth data to create an augmented 3D profile of the surroundings. Filtering the depth data to remove the significant amount of noise that was initially very pesky. Using spatial audio (a still-emerging feature in software) as the primary means of expressing our data.
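To make the idea of combining detections with depth concrete, here is a small Python sketch (again, not our C#/Unity code). It assumes each detection is a dictionary with hypothetical keys 'label', 'confidence' and 'box' (left, top, width, height in pixels); this shape is invented for the example and is not the Azure response schema. The 0.5 confidence threshold and the use of the median depth inside each box are also assumptions.

```python
from statistics import median


def nearest_confident_object(detections, depth, min_confidence=0.5, invalid=0.0):
    """Return (label, distance) for the closest confident detection.

    detections: list of dicts with hypothetical keys 'label',
    'confidence' and 'box' = (left, top, width, height) in pixels.
    depth: 2D list of distances in metres, aligned with the image.
    Returns None when no detection passes the threshold or has
    valid depth samples.
    """
    best = None
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        left, top, width, height = det["box"]
        # Collect valid depth readings inside the box, clipped to the frame.
        samples = [
            depth[r][c]
            for r in range(top, min(top + height, len(depth)))
            for c in range(left, min(left + width, len(depth[0])))
            if depth[r][c] != invalid
        ]
        if not samples:
            continue
        # The median is less sensitive to stray background pixels than the mean.
        distance = median(samples)
        if best is None or distance < best[1]:
            best = (det["label"], distance)
    return best
```

A result such as a label paired with a distance could then drive a spoken notification, for instance announcing a door only when it is within a few metres.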

What we learned

Humans are better at recognizing sound than we think they are. Using products supported by the same company makes development smoother and faster. How to hack a Kinect.

What's next for SoundSight

Collaborating with airports, shopping malls, parks and other public places to extend the technology and make it more robust. Positioning SoundSight as a cheaper alternative to service dogs, which could help build a more inclusive society. Extending object detection to recognize the faces of people the user might know.
