Inspiration

We were inspired by how well the latest Quest 3 passthrough cameras already let users see the real world, and we asked: what if they could help users see even better, especially people who are blind or visually impaired? There is a tremendous opportunity to help people here, and it will only become more relevant as Meta Quest cameras improve with each generation.

What it does

Supersight uses the Quest 3's passthrough cameras to capture any text the user is looking at and read it back with text-to-speech, translating it if needed. It also enlarges the text in the headset for easier reading.

How we built it

Because apps are not allowed direct access to the Quest 3's camera data, we built the pipeline around casting the headset to a PC. A companion app on the PC waits for the casting stream and for input from the headset. When the user wants to read text, the headset tells the PC over the network (we used Photon Fusion and ADB for the networking). The PC app then captures a screenshot of the current cast, crops it, and sends the crop to Google's Cloud Vision API to extract a text string, translating it if needed. Finally, the PC sends the cropped image and the recognized text back to the headset, where the text is read aloud with text-to-speech and displayed at a larger size.
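The capture step above can be sketched roughly as follows. The real app is C#; this Python sketch is illustrative only, and the function name, gaze-centred crop box, and box sizes are our assumptions, not the exact hackathon code.

```python
def clamp_crop_box(cx, cy, w, h, frame_w, frame_h):
    """Centre a w x h capture box on the point (cx, cy) the user is
    looking at, then clamp it so it stays inside the casted frame."""
    # A box bigger than the frame can only shrink to the frame itself.
    w, h = min(w, frame_w), min(h, frame_h)
    # Slide the box back inside the frame if it overhangs an edge.
    left = max(0, min(cx - w // 2, frame_w - w))
    top = max(0, min(cy - h // 2, frame_h - h))
    return left, top, left + w, top + h

# The cropped pixels are then OCR'd with Google Cloud Vision,
# along the lines of (requires credentials, so shown as a comment):
#   from google.cloud import vision
#   client = vision.ImageAnnotatorClient()
#   response = client.text_detection(image=vision.Image(content=png_bytes))
```

Clamping matters because the user's gaze point can sit near the edge of the cast, where a naively centred crop would fall partly outside the screenshot.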

Challenges we ran into

The biggest challenge was that the Quest 3 doesn't allow direct access to its camera data. Instead, we had to build a workaround: cast the headset to a nearby computer, then stream the results back to the headset. We also found that handwritten text was often unclear in the captured image, especially in poor lighting conditions.

Accomplishments that we're proud of

We're proud to have made a usable app that can read text aloud for visually impaired users. While there's a long way to go, we feel this proof of concept is a big step toward improving quality of life for millions of people with disabilities.

What we learned

Sending files with Photon Fusion is surprisingly difficult, so we had to learn a lot about networking APIs and transferring data between devices. We also learned to use Google's cloud APIs, which were tremendously helpful for detecting and translating text. Finally, we learned how to call scripts from other programs, including how to have a C# program launch a whole suite of PowerShell scripts at once.
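The core trick for sending files over a message-based channel like Photon Fusion's is to split them into small numbered chunks and reassemble them on the other side. A minimal sketch of that idea in Python (the real code was C#, and the 480-byte chunk size is an illustrative assumption, not Fusion's actual limit):

```python
def chunk_payload(data: bytes, max_chunk: int = 480):
    """Yield (index, total, chunk) triples small enough to fit in one
    network message; the receiver reassembles them by index."""
    total = (len(data) + max_chunk - 1) // max_chunk  # ceiling division
    for i in range(total):
        yield i, total, data[i * max_chunk:(i + 1) * max_chunk]

def reassemble(chunks):
    """Rebuild the original bytes from chunks received in any order."""
    parts = {i: part for i, _, part in chunks}
    return b"".join(parts[i] for i in sorted(parts))
```

Indexing each chunk is what makes this robust: messages can arrive out of order, and the receiver knows from `total` when the transfer is complete.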

What's next for Supersight

We would expand Supersight's capabilities in many directions. In the short term, that includes reading aloud or zooming in on real-world text on demand, a virtual walking cane that gives haptic feedback on objects the user touches with it, using sound and haptics to convey 3D objects, and AI image upscaling. Together these would form a powerful suite of tools for the millions of people who are blind or visually impaired.

In the medium term, we would expand the translation options, letting the user choose which language text is translated into. We would also automatically translate all text in the user's field of view, superimposing it over its real-world position. A user could then travel across the world and interact with other languages as naturally as their own.

Each Quest generation has shipped significantly better cameras than the last, and the Quest 4, 5, and beyond will likely continue that pattern. In the long term, Supersight could take advantage of those upgrades for even greater super vision. By the Quest 10, it could serve as powerful binoculars or a telescope for objects thousands of feet away!
