If the main demo video appears corrupted, please watch the backup video instead. Thank you :)
Backup: https://youtu.be/z4mf9RSSE5w
Inspiration
Recently I saw that Meta released an SDK a couple of weeks ago that lets developers build apps for the Meta Ray Ban Display glasses. Even though it is currently in beta access, I wanted to give it a shot. It has been a while since I've done AI development in the health industry, mainly around glasses and blindness/deafness. I have family members with hearing and vision disabilities, so I thought this would be a good use case for them.
What it does
Glass turns Meta Ray Ban Display glasses into a hands-free assistant for people who are blind, low vision, Deaf, or hard of hearing.
For blind and low-vision users, it describes what is in front of you, reads signs, menus, and mail aloud, names objects, and gives spoken walking directions, all through the glasses' speaker and the display in the lens.
For Deaf and hard-of-hearing users, it shows live captions of the people around you, labeled by who is speaking, with a small tone tag so you know whether something was a question, excited, or urgent, rather than just the bare words. It reads fingerspelling into captions, coaches you through signing common words back, and can even learn a small set of signs you teach it and read them aloud.
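The writeup does not say how the tone tag is computed. As a rough illustration only, here is a Python sketch of how a speaker-labeled caption line with a tone tag could be assembled. The keyword-and-punctuation heuristic in `guess_tone`, the `CaptionSegment` type, and the `Speaker N [tone]` format are all hypothetical stand-ins, not Glass's actual approach; a real tone tag needs far more than punctuation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword list; a naive stand-in for real tone detection.
URGENT_PATTERN = re.compile(r"\b(help|stop|careful|fire|watch out)\b", re.IGNORECASE)


@dataclass
class CaptionSegment:
    speaker: int  # 0-based speaker index from a diarized (speaker-separated) transcript
    text: str


def guess_tone(text: str) -> Optional[str]:
    """Naive tone guess: urgent keywords first, then final punctuation."""
    if URGENT_PATTERN.search(text):
        return "urgent"
    stripped = text.rstrip()
    if stripped.endswith("?"):
        return "question"
    if stripped.endswith("!"):
        return "excited"
    return None  # plain statement: no tag, keeps the lens uncluttered


def render_caption(segment: CaptionSegment) -> str:
    """Format one caption line as 'Speaker N [tone]: text' for the lens."""
    tone = guess_tone(segment.text)
    tag = f" [{tone}]" if tone else ""
    return f"Speaker {segment.speaker + 1}{tag}: {segment.text.strip()}"
```

The word boundaries in the pattern keep "helpful" from being tagged as urgent, while "Stop, a car is coming!" is tagged urgent rather than excited because keywords are checked before punctuation.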
And when you just want company, you can say "explore mode" and talk to it like a friend as you walk, asking about whatever you see, until you say "normal mode" to stop. The whole thing is voice-first, so you barely touch the phone.
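Explore mode behaves like a small two-state switch driven by spoken phrases. Here is a minimal Python sketch, assuming plain substring matching on the transcribed utterance. `ModeController` and the placeholder replies are hypothetical; the real app would call a language model where the placeholders are.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    NORMAL = "normal"
    EXPLORE = "explore"


# Spoken phrases from the writeup, mapped to the mode they switch to.
SWITCH_PHRASES = {"explore mode": Mode.EXPLORE, "normal mode": Mode.NORMAL}


class ModeController:
    """Tracks the current mode and keeps conversation history only while exploring."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL
        self.history: List[Tuple[str, str]] = []  # (user, reply) turns

    def handle(self, utterance: str) -> str:
        text = utterance.lower().strip()
        for phrase, target in SWITCH_PHRASES.items():
            if phrase in text:
                self.mode = target
                if target is Mode.NORMAL:
                    self.history.clear()  # leaving explore mode ends the conversation
                return f"switched to {target.value} mode"
        if self.mode is Mode.EXPLORE:
            reply = f"(chatty reply about: {utterance})"  # placeholder for a multi-turn model call
            self.history.append((utterance, reply))
            return reply
        return f"(one-shot answer to: {utterance})"  # placeholder for a single-turn model call
```

Keeping history only in explore mode is one way to make the "friend" feel continuous while normal-mode questions stay independent.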
How we built it
The glasses are the eyes, ears, mouth, and screen. The iPhone is the brain and does the heavy lifting. Camera frames and your voice come into the phone, and the answers go back out to the glasses speaker and the lens.
We leaned on Anthropic's vision and language models to understand scenes, read text, hold a real conversation, and run a small voice agent that can operate the whole app and even flip settings when you ask. Deepgram gives us a natural-sounding voice and fast, speaker-separated live captions, with Apple's on-device speech as a backup so it never goes quiet. We kept the always-on safety and awareness loop on the phone itself, using Apple Vision and a bundled segmentation model, so it works offline and keeps the camera private. The sign features run on device too, using hand pose landmarks, so no video of someone signing ever leaves the phone. And we wired Sentry through every fragile path, because a blind user cannot see a frozen screen, and a Deaf user may never notice a caption that silently failed.
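The "backup so it never goes quiet" idea is an ordered fallback chain: try the cloud transcriber first, and if it raises, drop to the on-device one and report the failure. Below is a Python sketch under those assumptions. `transcribe_with_fallback` and the provider callables are hypothetical wrappers, not Deepgram's or Apple's actual APIs. The `on_error` hook marks where an error tracker such as Sentry would be called.

```python
from typing import Callable, Optional, Sequence, Tuple

# A transcriber takes raw audio bytes and returns text, raising on failure.
Transcriber = Callable[[bytes], str]
ErrorHook = Callable[[str, Exception], None]


def transcribe_with_fallback(
    audio: bytes,
    providers: Sequence[Tuple[str, Transcriber]],
    on_error: Optional[ErrorHook] = None,
) -> Tuple[str, str]:
    """Try providers in priority order and return (provider_name, text) from the first success."""
    last_error: Optional[Exception] = None
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # network drop, timeout, quota, bad response...
            last_error = exc
            if on_error is not None:
                on_error(name, exc)  # report it, so a failover is never silent
    raise RuntimeError("all transcription providers failed") from last_error
```

A call would pass something like `[("cloud", cloud_fn), ("on_device", local_fn)]`. The cloud entry comes first for quality, and the on-device entry stays last because it works offline.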
Challenges we ran into
Sign language humbled us fast. We started out assuming we could send video to a model and get a clean translation back, but the research said plainly that this does not work yet, even for the best systems out there. So we scoped down to something honest: reading fingerspelling, coaching common signs, and a teach-it-yourself recognizer. We also state those limits right in the app instead of pretending otherwise.
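A teach-it-yourself recognizer built on hand landmarks can be as simple as nearest-neighbor matching against a few recorded examples. This Python sketch assumes each sample is a fixed-length list of 2D hand joints with the wrist first. Apple Vision's `VNDetectHumanHandPoseRequest`, for instance, reports 21 joints per hand. `SignRecognizer`, the normalization, and the 0.5 distance threshold are hypothetical choices. The sketch only matches static handshapes from a single frame, so signs that depend on motion are out of scope.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def normalize(landmarks: List[Point]) -> List[float]:
    """Make a handshape position- and size-invariant: wrist at origin, farthest joint at distance 1."""
    wx, wy = landmarks[0]  # assumes index 0 is the wrist
    shifted = [(x - wx, y - wy) for x, y in landmarks]
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0  # avoid dividing by zero
    return [coord / scale for point in shifted for coord in point]


class SignRecognizer:
    """Nearest-neighbor matcher over user-taught examples."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # max distance to accept a match (hypothetical value)
        self.examples: List[Tuple[str, List[float]]] = []

    def teach(self, label: str, landmarks: List[Point]) -> None:
        self.examples.append((label, normalize(landmarks)))

    def recognize(self, landmarks: List[Point]) -> Optional[str]:
        """Return the closest taught label, or None if nothing is close enough."""
        query = normalize(landmarks)
        best_label: Optional[str] = None
        best_dist = math.inf
        for label, example in self.examples:
            dist = math.dist(query, example)  # samples must share joint count and order
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label if best_dist <= self.threshold else None
```

Returning `None` below the threshold lets the app say "not sure" instead of guessing. That matches the writeup's emphasis on not overclaiming.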
Audio was the other beast. The microphone and the speaker compete for the same audio hardware, so captions, the voice assistant, and the spoken replies kept stepping on each other until we built a single coordinator to hand the audio back and forth cleanly.
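One way to read "one coordinator to hand the audio back and forth" is a single owner with priorities plus a stack of paused clients to resume. Here is a Python sketch under that assumption. The client names, the priority order, and `AudioCoordinator` are hypothetical. On iOS, each ownership change would also be where the `AVAudioSession` gets reconfigured, which this sketch leaves out.

```python
import threading
from typing import Dict, List, Optional

# Hypothetical priorities: a higher number may interrupt a lower one.
PRIORITY: Dict[str, int] = {"captions": 1, "assistant": 2, "spoken_reply": 3}


class AudioCoordinator:
    """Grants the shared mic/speaker to one feature at a time and hands it back afterwards."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None
        self._paused: List[str] = []  # preempted owners, most recently paused last

    @property
    def owner(self) -> Optional[str]:
        return self._owner

    def request(self, client: str) -> bool:
        """Return True if `client` now owns audio, False if it must wait."""
        with self._lock:
            if self._owner is None or self._owner == client:
                self._owner = client
                return True
            if PRIORITY[client] > PRIORITY[self._owner]:
                self._paused.append(self._owner)  # remember who to resume later
                self._owner = client
                return True
            return False

    def release(self, client: str) -> Optional[str]:
        """Give up audio (or a pending resume) and return the new owner, if any."""
        with self._lock:
            if self._owner != client:
                if client in self._paused:
                    self._paused.remove(client)  # it no longer wants audio back
                return self._owner
            self._owner = self._paused.pop() if self._paused else None
            return self._owner
```

With this shape, a spoken reply can interrupt the assistant, which can interrupt captions. Each release resumes whoever was paused most recently, so captions come back on their own once everything else finishes.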
We also learned that the lens can only show text and images served from a web link, not files stored on the phone, which sent us digging into exactly what it could and could not render. The glasses' microphone is also locked to Meta, so the phone has to do all the listening.
Accomplishments that we're proud of
It actually works from end to end, on real hardware. You speak, the glasses see, and the answer comes back in your ear and on the lens. We are proud that it bends instead of breaking, that nothing private leaves the phone unless you ask it to, and that we stayed honest about sign language rather than overselling it. The little tone tag on captions is a small thing we love, because it hands back something captions usually strip away. And explore mode, just wandering around chatting about the world, felt like the future for a second.
What we learned
We learned that the hard part of accessibility is rarely the model. It is the plumbing: the audio routing, the latency, the privacy, the honesty. We learned how often the disability community has been burned by tech that overclaims, and that the kindest thing you can do is be clear about what your tool can and cannot do. We learned to build on device first and reach for the cloud only when it truly earns its place. And we learned a lot about how blind and Deaf people actually move through the world, which quietly reshaped almost every decision we made.
What's next for Glass
We want to move the AI calls off an API key embedded in the app and behind a proper backend, so it is safe to ship. We want real distance sensing so navigation can warn about steps and drop-offs, not just describe them. We want to grow the sign vocabulary alongside Deaf signers, because they should be leading this, not us. We want sound alerts so a Deaf user knows when an alarm goes off or their name is called. And most of all, we want to get it into the hands of the people we built it for and let their feedback steer the rest.