Our inspiration comes from the 217 million people in the world who have moderate to severe vision impairment, and 36 million people who are fully blind. In our modern society, with all its comforts, it is easy to forget that there are so many people who do not have the same luxuries as us. It is unthinkably difficult for these visually impaired individuals to navigate everyday life and activities. We believe that the new technology of this era presents a potential solution to this issue.
What it does
InsightAI detects the location and size of common objects in real time. This data feeds our novel 3D audio spatialization algorithm, which in turn powers our augmented reality (AR) audio system. The system communicates the location of each object to the user, who can then build a mental heatmap of the surroundings. All of this runs on just a conventional smartphone and headphones. The user can stop the process at any time through our haptic interface. The app also supports multiple languages so that the project can scale to other countries and cultures.
How we built it
We used TensorFlow.js for the real-time object detection. The detector is a Single Shot MultiBox Detector (SSD) model trained on the COCO dataset, with 90 object classes and 330,000 images. We then convert the label of each detected object into an audio signal via a text-to-speech algorithm with natural language synthesis that supports multiple languages. Next, a custom algorithm delivers the AR audio to the user’s audio device in such a way that the user can understand the location of the indicated object. To serve visually impaired users well, we followed minimalistic, intuitive, audio-first design principles. Finally, we hosted the entire web app on Zeit so that it is accessible to everyone.
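The step between detection and speech can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the `Detection` shape is modeled on what the `@tensorflow-models/coco-ssd` package returns from `detect()`, and the `LABELS` table, the `labelsToSpeak` function, and its thresholds are hypothetical names chosen for this example.

```typescript
// Shape modeled on the output of @tensorflow-models/coco-ssd detect();
// treating it as this project's format is an assumption.
interface Detection {
  bbox: [number, number, number, number]; // [x, y, width, height] in pixels
  class: string; // English class label, e.g. "person"
  score: number; // confidence in [0, 1]
}

// Hypothetical translation table; a real app would cover every class.
const LABELS: Record<string, Record<string, string>> = {
  es: { person: "persona", chair: "silla", cup: "taza" },
};

// Keep confident detections, order them largest first, and return the
// words to hand to a text-to-speech engine in the requested language.
function labelsToSpeak(
  detections: Detection[],
  lang = "en",
  minScore = 0.5,
  maxLabels = 3
): string[] {
  return detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.bbox[2] * b.bbox[3] - a.bbox[2] * a.bbox[3])
    .slice(0, maxLabels)
    // Fall back to the English label when no translation is available.
    .map((d) => LABELS[lang]?.[d.class] ?? d.class);
}
```

Capping the number of labels per frame is one way to keep the audio from overwhelming the listener. The cap of three used here is arbitrary.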
How does the augmented reality (AR) sound system work?
The sound is output binaurally through the Web Audio API. This means that each headphone or earbud plays a different signal, based on the location of the object. Our algorithm determines the difference between the two signals. You can think of it as a program that creates a mental audio heatmap of the world around the user. Because the system is immersive, the user can locate objects intuitively.
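The custom algorithm itself is not described here, so the sketch below substitutes a common technique, equal-power stereo panning, to show how an object's position could become a left/right difference. The `spatialize` function, the `StereoCue` type, and the loudness constants are all assumptions made for illustration.

```typescript
interface StereoCue {
  pan: number; // -1 = fully left, 0 = center, 1 = fully right
  left: number; // left-channel gain
  right: number; // right-channel gain
}

// Map a bounding box to stereo gains. Horizontal position sets the pan;
// the fraction of the frame the box covers sets the loudness (assumed
// here as a rough stand-in for distance).
function spatialize(
  bbox: [number, number, number, number], // [x, y, width, height] in pixels
  frameWidth: number,
  frameHeight: number
): StereoCue {
  const [x, , w, h] = bbox;
  const centerX = x + w / 2;
  const pan = Math.max(-1, Math.min(1, (centerX / frameWidth) * 2 - 1));

  // Objects filling more of the frame are treated as closer, hence louder.
  const areaFraction = Math.min(1, (w * h) / (frameWidth * frameHeight));
  const loudness = 0.2 + 0.8 * Math.sqrt(areaFraction);

  // Equal-power law: left^2 + right^2 stays constant as the pan moves,
  // so perceived loudness does not dip when a sound is centered.
  const angle = ((pan + 1) * Math.PI) / 4;
  return {
    pan,
    left: loudness * Math.cos(angle),
    right: loudness * Math.sin(angle),
  };
}
```

In a browser, the `pan` value could be assigned to the `pan` parameter of a `StereoPannerNode` created with `AudioContext.createStereoPanner()`, with `loudness` applied through a `GainNode`. The explicit gains are computed here only to make the left/right difference visible.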
Challenges we ran into
We ran into a multitude of bugs, which we eventually solved through discussion and collaboration. One such bug was that the audio was quite slow and did not keep up with the rate of object detection, because we were downloading the audio snippet from an external source for every frame. We solved this by storing the audio files locally and playing the files that correspond to the detected objects. Additionally, we ran into many issues getting the TensorFlow.js model to work on mobile rather than desktop.
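One way to avoid re-downloading a clip on every frame is to cache each clip by its label. The sketch below is a generic pattern and not necessarily how the fix was implemented: `ClipCache` and the injected `load` function are hypothetical names for this example.

```typescript
// Caches one in-flight or finished load per label, so each clip is
// loaded at most once no matter how many frames request it.
class ClipCache<T> {
  private clips = new Map<string, Promise<T>>();

  // `load` is supplied by the caller; how it obtains the clip is up to it.
  constructor(private load: (label: string) => Promise<T>) {}

  get(label: string): Promise<T> {
    let clip = this.clips.get(label);
    if (clip === undefined) {
      clip = this.load(label);
      this.clips.set(label, clip);
      // Evict failed loads so a later frame can retry.
      clip.catch(() => this.clips.delete(label));
    }
    return clip;
  }
}
```

In a browser, `load` might fetch a file bundled with the app and decode it with `AudioContext.decodeAudioData()`. Storing the promise rather than the decoded clip means that two frames requesting the same label while the first load is still in flight share a single request.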
Accomplishments that we're proud of and what we learned
We learned how to use TensorFlow.js to recognize many objects in real time. This was one of our first projects that used live ML, and we are very proud of how it turned out. We also learned how to use the Web Audio API and created a left and right channel surround sound system for headphones. Further, this was one of our first projects to integrate AR.
What's next for InsightAI
We will definitely be updating our project in the future to support more functionality. For example, optical character recognition (OCR) and facial recognition could greatly improve the everyday lives of the visually impaired. Imagine if the blind could immediately recognize people they knew through such a system. An integrated OCR system could also reduce the need for braille and allow for much easier navigation of everyday life. Our app is also well suited to scaling up to more languages.