Inspiration

I've always wanted to make products that make the world more accessible to more people, and as someone who is incredibly excited about the world of augmented reality, I saw an opportunity. I noticed how my mom has trouble hearing certain English accents, how sometimes I can't speak the same language as the person in front of me, and how some people have hearing disabilities that can't be solved with hearing aids.

What it does & How I built it

This is where ARScribe comes into play: it is an app that lives on your XREAL AR glasses and transcribes the world around you in near real time. The text appears in the lower center of your field of view, just like subtitles in movies and shows. It uses your iPhone's on-device processing to transcribe the other person's speech and present it visually, and it can optionally translate between languages, enabling even more powerful workflows.
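The core loop can be sketched roughly as below. This is a hedged sketch, not the exact implementation: it assumes Apple's SpeechTranscriber/SpeechAnalyzer API (iOS 26) for transcription and the Translation framework (iOS 18) for the optional translation step; names like `CaptionEngine` and `subtitleText` are illustrative, and initializer options may differ by SDK version.

```swift
import Speech        // SpeechAnalyzer / SpeechTranscriber (iOS 26+)
import Translation   // on-device TranslationSession (iOS 18+)

@MainActor
final class CaptionEngine: ObservableObject {   // illustrative class name
    @Published var subtitleText = ""            // bound to the lower-center subtitle view

    // Feed microphone buffers in, publish progressively transcribed text out.
    func startTranscribing() async throws {
        let transcriber = SpeechTranscriber(locale: Locale(identifier: "en-US"),
                                            preset: .progressiveTranscription)
        let analyzer = SpeechAnalyzer(modules: [transcriber])

        let (inputSequence, inputBuilder) = AsyncStream.makeStream(of: AnalyzerInput.self)
        try await analyzer.start(inputSequence: inputSequence)

        // Elsewhere, an AVAudioEngine microphone tap would push audio with:
        //   inputBuilder.yield(AnalyzerInput(buffer: pcmBuffer))
        _ = inputBuilder

        for try await result in transcriber.results {
            subtitleText = String(result.text.characters)  // result.text is an AttributedString
        }
    }
}

// Optional translation: in a SwiftUI view, something along these lines:
//   .translationTask(source: Locale.Language(identifier: "en"),
//                    target: Locale.Language(identifier: "es")) { session in
//       subtitleText = try await session.translate(englishText).targetText
//   }
```

Everything here runs on-device, which is what ties the latency numbers below to the phone's hardware.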

Challenges & What's next

Since the processing is performed on-device, transcription speed is limited by the phone you own. On my iPhone 15 there is a 2-3 second delay for transcription and a 10-15 second delay for translation. On newer iPhones this delay can be reduced further, allowing for a more natural conversation flow.

Currently, it also transcribes the voice of the person wearing the glasses. With additional engineering, the wearer's voice could be excluded to provide a smoother transcription flow.

In a future version, speed could be improved by offloading to faster, more capable cloud models, or simply by running on newer host iPhones that can process data faster.

Accomplishments that I'm proud of & What I learnt

I'm so proud that ARScribe essentially replicates Apple's latest AirPods feature, Live Translation, with the same accuracy and performance in most cases, while being a custom implementation built from the ground up. On comparable devices, the transcription and translation speeds should be on par with Apple's technology.

I learnt how to work with the SpeechAnalyzer API introduced in June 2025, as well as the on-device Translation framework from June 2024, to bring this to life. Since XREAL AR glasses don't officially support running apps on them, I came up with a custom solution: tricking the iPhone into treating the glasses as an external monitor, which let me use the external-display handling APIs to show different interfaces on the iPhone and on the glasses.
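The monitor trick hinges on the glasses enumerating as a second, non-interactive display. A minimal sketch of the scene routing, assuming UIKit's scene lifecycle with a SwiftUI view hosted on the external screen (the delegate class and `SubtitleView` are illustrative names, not from the original project):

```swift
import UIKit
import SwiftUI

// Placeholder for the SwiftUI subtitle overlay shown on the glasses.
struct SubtitleView: View {   // illustrative
    var body: some View { Text("…").font(.title) }
}

// In the app delegate: give the external (glasses) display its own scene,
// so the phone keeps its own UI while the glasses get the subtitles.
class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication,
                     configurationForConnecting session: UISceneSession,
                     options: UIScene.ConnectionOptions) -> UISceneConfiguration {
        let config = UISceneConfiguration(name: nil, sessionRole: session.role)
        if session.role == .windowExternalDisplayNonInteractive {
            config.delegateClass = GlassesSceneDelegate.self  // illustrative name
        }
        return config
    }
}

// Hosts the SwiftUI subtitle view on the glasses' "monitor".
class GlassesSceneDelegate: UIResponder, UIWindowSceneDelegate {
    var window: UIWindow?

    func scene(_ scene: UIScene, willConnectTo session: UISceneSession,
               options connectionOptions: UIScene.ConnectionOptions) {
        guard let windowScene = scene as? UIWindowScene else { return }
        let window = UIWindow(windowScene: windowScene)
        window.rootViewController = UIHostingController(rootView: SubtitleView())
        self.window = window
        window.isHidden = false
    }
}
```

Routing on the scene's session role is what lets the phone and the glasses render entirely different interfaces from one app.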
