Inspiration

An estimated 2.2 billion people worldwide are visually impaired, and roughly 50 million of them are medically classified as blind. In 2013, medical expenses and lost productivity related to vision loss cost the U.S. over $149 billion.

Many people who are classified as "legally blind" use a white cane. Canes can sweep the ground to detect ground-level obstacles and, when tapped, produce an audible sound that alerts others to the user's presence. Additionally, many blind people use mobility canes, a type of white cane, to sense the texture of the ground around them. However, these devices have significant drawbacks when navigating a complex, crowded environment: they provide no information about obstacles above the ground (overhangs and branches), signs, distant objects, or overall context.

Through our app, we use recent advances in LiDAR depth-perception technology and natural-language reasoning to help blind people safely navigate complex, crowded environments.

What it does

SixthSense is a LiDAR and generative-AI assisted navigation tool for the visually impaired. To address the limitations of walking canes discussed above, the application offers three major features. First, it uses the iPhone's LiDAR sensor to map the environment in real time as a 3D point cloud; an auditory echo-navigation system then continuously emits pings tied to the object closest to the user, mimicking echolocation. Second, the user can capture images that a generative-AI model analyzes to identify key objects in the scanned area. Third, generative AI constructs a general summary of the surrounding environment for use when entering an unfamiliar area. Together, these features give visually impaired users a richer picture of their surroundings, both in depth and at heights a cane cannot reach.

How we built it

To build the spatial audio, we simulated an 8D audio effect over headphones. Different volume amplitudes and pan values in each ear simulate pings arriving from different directions, mathematically derived from the closest point in the LiDAR scan. To eliminate noise in the LiDAR scan, we used the DBSCAN point-clustering method to remove outliers from our data.
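The outlier-removal step can be sketched as follows. This is a minimal, brute-force DBSCAN in Python for illustration only (the app runs in Swift on the live point cloud, and a KD-tree would accelerate the neighbor queries); the sample points and parameters are made up.

```python
import math

def dbscan(points, eps=0.5, min_pts=4):
    """Label each point with a cluster id, or -1 for noise (an outlier)."""
    labels = [None] * len(points)
    cluster = 0

    def neighbors(i):
        # Brute-force range query; a KD-tree makes this much faster.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point rescued from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            js = neighbors(j)
            if len(js) >= min_pts:       # core point: expand the cluster
                queue.extend(js)
        cluster += 1
    return labels

# A dense cluster near the camera plus one stray LiDAR return far away.
cloud = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.1),
         (0.1, 0.1, 1.0), (0.05, 0.05, 1.05), (5.0, 5.0, 5.0)]
labels = dbscan(cloud, eps=0.3, min_pts=3)
denoised = [p for p, l in zip(cloud, labels) if l != -1]
```

Points labeled -1 are discarded before the closest-point search, so a single spurious return cannot trigger a false ping.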

To build the generative-AI auditory context response, we used Google's Gemini API, which accepts images of the environment and returns either a brief summary of the closest object in the picture or the general context of everything in it, with key objects and their relative positions mentioned.

Challenges we ran into

A major roadblock we ran into was handling the LiDAR point cloud generated by Swift's ARKit API. The point cloud was dense and, through ARKit's mapping, included many unneeded points that slowed down our clustering code. To mitigate this, we employed a variety of filtering methods, including ground-point filtering, distance filtering, and a custom angle filter that discarded points outside the camera frame. These filters were tuned to remove points that no longer presented obstacles to the user while keeping the important ones.
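The three filters can be sketched in one pass over the cloud. This is an illustrative Python version under an assumed coordinate frame (camera at the origin, +z forward, +x right, +y up); the thresholds, frame convention, and sample points are ours, not the app's actual values.

```python
import math

def filter_points(points, max_dist=4.0, ground_y=-1.2, fov_deg=60.0):
    """Keep only points that can still present an obstacle to the user."""
    half_fov = math.radians(fov_deg / 2.0)
    kept = []
    for x, y, z in points:
        if z <= 0:                            # behind the camera
            continue
        d = math.sqrt(x * x + y * y + z * z)
        if d > max_dist:                      # distance filter: too far away
            continue
        if y < ground_y + 0.05:               # ground filter: floor returns
            continue
        if abs(math.atan2(x, z)) > half_fov:  # angle filter: out of frame
            continue
        kept.append((x, y, z))
    return kept

pts = [(0.0, 0.0, 1.0),    # obstacle straight ahead -> kept
       (0.0, 0.0, 9.0),    # too far to matter      -> dropped
       (0.0, -1.2, 1.0),   # ground return          -> dropped
       (3.0, 0.0, 1.0)]    # ~72 degrees off-axis   -> dropped
print(filter_points(pts))  # [(0.0, 0.0, 1.0)]
```

Shrinking the cloud this way before clustering is what keeps DBSCAN fast enough for real-time use.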

Our goal was to use spatial audio to convey where objects were located. However, documentation on how to achieve this with iOS's libraries and modules was very limited, and the implementations we tried did not work despite several hours of debugging and experimentation. We devised a workaround: by altering the pan of the audio output, we could control the ratio of sound coming from the left and right headphone channels. On its own, this gives only a flat representation of spatial information, so we paired it with volume changes based on distance to create the illusion of both direction and depth.
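The workaround amounts to mapping the closest point's direction to a pan value and its distance to a gain. A minimal Python sketch of that mapping, under an assumed frame (listener at the origin, +z forward, +x right) and an assumed linear volume falloff; the real app feeds values like these into the audio engine's pan and volume controls:

```python
import math

def pan_and_gain(point, max_dist=4.0):
    """Map the closest point to a stereo pan in [-1, 1] and a gain in [0, 1]."""
    x, y, z = point
    azimuth = math.atan2(x, z)                    # radians left/right of forward
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    dist = math.sqrt(x * x + y * y + z * z)
    gain = max(0.0, 1.0 - dist / max_dist)        # nearer obstacles sound louder
    return pan, gain

# Obstacle slightly to the right, about 1 m away: ping leans right, fairly loud.
pan, gain = pan_and_gain((0.5, 0.0, 1.0))
```

Pan alone places the ping on a flat left-right axis; coupling gain to distance is what restores the sense of depth.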

Accomplishments that we're proud of

Building SixthSense required us to integrate various technologies effectively. We learned how to harness LiDAR depth-perception technology and generative AI to create a comprehensive navigation solution, which involved understanding the capabilities of these technologies and devising innovative ways to apply them. We also used data structures like KD-trees to optimize our clustering algorithm, since the existing Swift implementations of DBSCAN we found were extremely computationally expensive, and as shown in our images, the results turned out great!

Developing spatial audio navigation presented significant challenges. We encountered hurdles in handling LiDAR point cloud data efficiently and struggled with limited documentation for spatial audio implementation on iOS. Overcoming these challenges demanded persistence, leading us to devise novel solutions like custom filtering methods and adapting audio panning techniques.

Simply put, we had many different parts all working together to give a holistic view through the application. Many of these parts were intrinsically unrelated, but we tied them all together. We are proud that we were able to integrate so many sources of data and methods of presentation to give the user as much information as possible.

What we learned

We learned a lot about integrating LiDAR scanning on iOS and eliminating noise to provide useful, efficient information about the surroundings. There was also a learning curve in integrating a generative-AI API for image analysis, prompt generation, and response handling. Combining these ideas led to interesting revelations and a deeper understanding of the capabilities of both. Recreating a spatial audio system also gave us further insight into the widely used 8D audio technique.

What's next for SixthSense

SixthSense was meant to augment a walking cane, not replace one. Future directions therefore include mounting the camera directly on a cane, possibly with a gimbal so it always scans upright and can stream continuously for the most efficient data processing and response. The system could also be extended with haptic feedback: the cane or camera could vibrate to convey the same information currently delivered auditorily, restoring some of the lost sense of vision without occupying the user's hearing.

References

https://www.cnib.ca/en/about-white-cane?region=on#:~:text=Mobility%20Canes&text=They%20identify%20what%20kind%20of,to%20the%20user%20and%20others

https://medium.com/@piram.singh/programming-spatial-audio-for-vr-a1540fe3a0df


https://developer.apple.com/documentation/avfoundation/

https://developer.apple.com/documentation/avfaudio/avaudiostereomixing/1388580-pan

https://developer.apple.com/documentation/avfoundation/additional_data_capture/capturing_depth_using_the_lidar_camera

https://ai.google.dev/tutorials/swift_quickstart
