The inspiration behind cAIne
If you have someone next to you, look at them. Are they wearing glasses, or are you? Chances are one of you is. Approximately sixty-five percent of the world has some form of vision correction; half of those individuals have very low vision, and half of that population has no vision at all. XR offers a unique opportunity for an evolved form of vision correction, mixing Context-Aware AI and haptic feedback to create a new cane - a cAIne. Using the Quest's passthrough and camera access, as well as the Meta RayBans, we can scan an environment in real time, calculating distances and providing instant responses to the end user. cAIne offers an intuitive gesture-based UI for hand tracking, or vibrational feedback for controller use, letting each user decide which feels more comfortable.
Old and New
cAIne has received many updates from its inception just three weeks ago to its current state, and it slots nicely into the Horizon Start Competition's guidelines. We're submitting it as an updated work, however, due to our participation in the SensAI Hackathon on November 16-18. At the SensAI Barcelona Hackathon, our team accomplished an impressive amount of work.
At the hackathon, we created strong distance-rendering and image-recognition features using the Meta Quest passthrough camera and YOLO. cAIne provided navigational and sensory feedback through our 'tap-to-navigate' traditional cane mode, which uses object-designated sound effects. Objects and room features received specific audio cues to tell the user what each item was; for example, a door made a wooden knocking noise while a window sounded like tapped glass. Our UI was simple, intuitive, and functional: we successfully implemented a controller-based menu and began work on a hand-tracking version as well.
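As a rough illustration of the tap-to-navigate idea (a minimal sketch, not our actual Quest implementation), the snippet below pairs YOLO class labels with audio cues; the class names, clip paths, and pretrained model are placeholder assumptions.

```python
# Minimal sketch: map YOLO detections in a passthrough frame to audio cues.
# Assumes the ultralytics package; the class-to-sound table is hypothetical.
from ultralytics import YOLO

SOUND_FOR_CLASS = {
    "door": "sounds/wood_knock.wav",     # doors knock like wood
    "window": "sounds/glass_tap.wav",    # windows sound like tapped glass
    "chair": "sounds/soft_thud.wav",
}

model = YOLO("yolov8n.pt")  # small pretrained model as a stand-in

def cues_for_frame(frame):
    """Return (label, sound_path) pairs for recognized objects in one camera frame."""
    results = model(frame)[0]
    cues = []
    for box in results.boxes:
        label = results.names[int(box.cls)]
        if label in SOUND_FOR_CLASS:
            cues.append((label, SOUND_FOR_CLASS[label]))
    return cues
```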
Now, cAIne has received major upgrades. cAIne's audio-graphical mapping has been updated to provide a more precise read of a room, adjusting pitch and volume based on object distance. The improved system also allowed us to add an on-demand sonar that scans the entire room and renders an audio-graphic map through pings. The two navigational modes can be used simultaneously, which improves both the traditional and sonar methods considerably.
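As a rough sketch of how distance can drive the audio map (the actual ranges and curves in cAIne may differ), closer objects could ping higher and louder:

```python
# Minimal sketch of distance-driven audio pings: nearer objects are
# higher-pitched and louder. The range and curve values are assumptions,
# not cAIne's actual tuning.
def ping_parameters(distance_m, max_range_m=5.0):
    """Map an object's distance in meters to (pitch multiplier, volume in [0, 1])."""
    d = min(max(distance_m, 0.0), max_range_m)
    closeness = 1.0 - d / max_range_m        # 1.0 at the user, 0.0 at max range
    pitch = 0.5 + 1.5 * closeness            # 0.5x (far) up to 2.0x (near)
    volume = 0.2 + 0.8 * closeness           # never fully silent inside range
    return pitch, volume
```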
Haptics have also been integrated into cAIne's navigation, creating dual audio-vibration feedback: the controller vibrates with differing intensities to convey the distance and location of room features and objects, such as walls and floors. Meta's Haptics SDK implements cAIne's vibrations.
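An analogous mapping can drive the vibration side of the feedback; the sketch below is an assumption about how intensity might scale with distance, not a Meta Haptics SDK call:

```python
# Minimal sketch of distance-to-vibration mapping for controller haptics.
# Amplitude and duration values are placeholders; the real app plays clips
# authored for Meta's Haptics SDK.
def haptic_pulse(distance_m, max_range_m=3.0):
    """Map distance to an (amplitude in [0, 1], duration in ms) pulse."""
    d = min(max(distance_m, 0.0), max_range_m)
    closeness = 1.0 - d / max_range_m
    amplitude = 0.1 + 0.9 * closeness        # stronger buzz when the object is near
    duration_ms = 40 + int(60 * closeness)
    return amplitude, duration_ms
```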
cAIne's user interface has been drastically improved. Four new inputs and interactions were added to the tool, fortifying already powerful software. Hand-tracking commands have been fully implemented, using sleek and intuitive gestures to execute the desired functions; cAIne uses Meta's microgestures to provide this feature. Voice commands also make a welcome appearance, giving a fast and direct way to interact with cAIne and its many tools and features; the Meta Voice SDK implements them. A stronger image-recognition feature couples with the new object-to-voice aspect of cAIne -- a user may ask the tool what a detected object is, either with a natural gesture or by simply asking aloud. These new features complement a visual upgrade with new iconography.
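As a loose illustration of how recognized phrases could be routed to actions (a minimal sketch; the phrase strings and action names are assumptions, and the real app relies on the Meta Voice SDK's intent handling rather than this lookup):

```python
# Minimal sketch: route a recognized voice transcript to a cAIne action.
# Phrases and action names are hypothetical placeholders.
COMMANDS = {
    "start sonar": "sonar_scan",
    "what is this": "describe_object",
    "open menu": "open_menu",
}

def dispatch(transcript):
    """Return the action name for a recognized phrase, or None if nothing matches."""
    text = transcript.strip().lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None
```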
In all, cAIne is shaping up to be a robust, intuitive, and powerful accessibility tool, and is only showing signs of greater improvement.
Challenges we ran into
Figuring out the right AI models to work with our project and getting them to mesh the way we wanted was a puzzle. This was made all the more difficult because not all of the models we planned to use are available at this time, so we had to substitute others. Setting up voice-to-command also had some issues, as many models have difficulty understanding accents. The time constraints forced us to limit our scope and cut features we would have loved to include. Another challenge was finding audio distinct enough, yet not too sharp, for cAIne's feedback.
Accomplishments that we're proud of
Wrangling together various image- and spatial-recognition AIs was an exhilarating challenge. We managed to run a local YOLO model directly on the Quest 3, providing very fast responses. The voice-to-command and gesture-recognition models were also a major accomplishment. We're very proud of the sleek and intuitive navigational methods that define cAIne. With a small team, succeeding in making cAIne register these features and objects was a rewarding endeavor, and the tight time constraints added to our sense of accomplishment in what we were able to achieve.
What's next for cAIne
As this software's goal is to assist low-vision and blind individuals, our next goal is to expand the set of pre-set objects our AI can identify, increasing from 80 immediately recognized objects to possibly hundreds. We will also fine-tune image-to-speech for real-time navigation, as well as the way cAIne represents distance, to improve our audio-graphy. We also plan an automatic height-to-length cane algorithm for optimal use (a rough sketch follows below). A major future goal is to add more wearable technology, such as glove controllers with a larger array of feedback, for mobility and ease of access, and to resolve the current trade-off between controller and hand-tracking input. We hope to work directly with cane users in the future so we can take in their considerations, wants, and needs, and integrate their experiences and input to ensure that cAIne is not only a helpful tool, but the best option.
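A minimal sketch of the planned height-to-length calculation, assuming the common guidance that a long cane reaches roughly sternum height; the ratio below is a placeholder, not a finalized value:

```python
# Minimal sketch of the planned automatic cane-length estimate.
# The 0.72 ratio approximates "floor to sternum" guidance for long canes;
# it is an assumption, not cAIne's final tuning.
def virtual_cane_length_m(user_height_m, ratio=0.72):
    """Estimate a comfortable virtual cane length from the user's height."""
    return round(user_height_m * ratio, 2)

# Example: a 1.75 m user gets roughly a 1.26 m virtual cane.
```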



