👓 Tulibot XR: Conversational Passthrough AI
Project Story
Inspiration
Our project's inspiration came from a personal struggle: watching my grandmother's life change drastically after she lost 80% of her hearing at age 68. Simple family moments, like our dinner conversations, became struggles that underscored her isolation. That experience made us realize this is a global issue: over 460 million people worldwide live with significant hearing loss, which translates into inequality in economic opportunities, education, and social interaction for the deaf community. We wanted a solution that provides true independence and ends the social isolation the deaf community so often faces.
What it does
Tulibot XR is a Mixed Reality (MR) application for Meta Quest 3 that addresses communication inequality by providing real-time, personalized, and spatially contextual transcription for the deaf community.
The core function converts spoken conversation into text in real time. Using the Meta Quest Passthrough API, the system renders the transcribed text as a chat bubble spatially anchored near the face of the person speaking, so users instantly know who is talking.
Crucially, it features an ASL Grammar Mode. By toggling a button, the system uses Generative AI to translate standard English sentences into ASL Gloss structure (Time-Topic-Comment), adapting to the unique linguistic structure used by the Deaf community.
How we built it
We built Tulibot XR by combining advanced AI processing with the immersive capabilities of the Meta Quest platform (Horizon OS).
- Speech-to-Text (STT): We used Meta's Wit.ai (via the Voice SDK) to power the real-time transcription. Its low-latency performance was essential for keeping up with natural conversation flow (a minimal hookup sketch follows this list).
- ASL Grammar Translation: We integrated Llama 3.3 (via API) to process the English transcripts. Llama's advanced language understanding allows us to rewrite sentences into ASL Gloss format (e.g., converting "I am going to the store tomorrow" to "TOMORROW STORE I GO"), making the captions significantly more readable for native signers.
- Spatial Context & Face Detection: We implemented Unity Sentis to run a YOLOv8 model directly on the device (edge AI). This lets the app detect faces in the Passthrough feed and anchor the chat bubbles to the correct speaker in 3D space, without sending video data to the cloud (see the second sketch below).
- Hardware Integration: We optimized the audio pipeline to support external microphones (via the Tulibot Microphone Kit) to ensure high accuracy even in noisy environments.
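As an illustration of the STT hookup, here is a minimal sketch of wiring the Voice SDK's transcription events to a caption label. `AppVoiceExperience` and its `VoiceEvents` come from Meta's Voice SDK for Unity (exact namespaces and event names vary by SDK version); the `CaptionController` class and its fields are our own illustrative names.

```csharp
using Oculus.Voice; // Meta Voice SDK (Wit.ai); namespace varies by SDK version
using TMPro;
using UnityEngine;

// Minimal sketch: stream partial transcriptions into a caption label,
// then hand the final sentence off for the optional ASL Gloss rewrite.
public class CaptionController : MonoBehaviour
{
    [SerializeField] private AppVoiceExperience voice; // Voice SDK entry point
    [SerializeField] private TextMeshProUGUI captionLabel;

    private void OnEnable()
    {
        // Partial results keep the caption moving with natural speech...
        voice.VoiceEvents.OnPartialTranscription.AddListener(OnPartial);
        // ...and the full transcription triggers the gloss rewrite, if enabled.
        voice.VoiceEvents.OnFullTranscription.AddListener(OnFinal);
        voice.Activate(); // start listening
    }

    private void OnDisable()
    {
        voice.VoiceEvents.OnPartialTranscription.RemoveListener(OnPartial);
        voice.VoiceEvents.OnFullTranscription.RemoveListener(OnFinal);
    }

    private void OnPartial(string text) => captionLabel.text = text;

    private void OnFinal(string text)
    {
        captionLabel.text = text;
        // In ASL Grammar Mode we would send `text` to the Llama endpoint here.
        voice.Activate(); // re-arm for the next utterance
    }
}
```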
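The detection path, compressed into a second sketch: it assumes a YOLOv8-face model exported to ONNX and uses the Sentis 1.x API (`ModelLoader`, `WorkerFactory`, `TextureConverter`), whose names differ slightly between Sentis versions. `DecodeBestBox` and `ToWorldAnchor` are placeholders standing in for our box decoding and Passthrough raycasting logic.

```csharp
using Unity.Sentis; // Unity's on-device inference package
using UnityEngine;

// Sketch: run a YOLOv8 face model on the Passthrough camera image and
// park the caption bubble near the strongest detection.
public class FaceAnchor : MonoBehaviour
{
    [SerializeField] private ModelAsset yoloFaceModel; // YOLOv8-face exported to ONNX
    [SerializeField] private Transform chatBubble;
    private IWorker worker;

    private void Start()
    {
        var model = ModelLoader.Load(yoloFaceModel);
        // GPUCompute keeps inference off the CPU cores the render loop needs.
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    public void DetectAndAnchor(Texture cameraFrame)
    {
        // YOLOv8 expects a square normalized RGB input, here 640x640x3.
        using var input = TextureConverter.ToTensor(cameraFrame, 640, 640, 3);
        worker.Execute(input);
        var output = worker.PeekOutput() as TensorFloat; // owned by the worker; not disposed here

        // Decode boxes, keep the highest-confidence face (readback + NMS omitted),
        // then convert its image-space center into a world-space pose.
        Vector2 faceCenter = DecodeBestBox(output);
        chatBubble.position = ToWorldAnchor(faceCenter);
    }

    private Vector2 DecodeBestBox(TensorFloat boxes) { /* readback + NMS, omitted */ return Vector2.zero; }
    private Vector3 ToWorldAnchor(Vector2 imagePoint) { /* Passthrough raycast, omitted */ return Vector3.zero; }

    private void OnDestroy() => worker?.Dispose();
}
```

Keeping this whole loop on-device was a deliberate privacy choice: the Passthrough frames never leave the headset.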
Challenges we ran into
The primary challenge was achieving the speed and accuracy a real-time conversational tool demands in a Mixed Reality environment. Running AI models like YOLOv8 on a mobile chipset (the Snapdragon XR2) required optimization with Unity Sentis to keep the frame rate comfortable in VR; one technique we leaned on is sketched below.
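A sketch of that technique, written against the Sentis 1.x manual-scheduling API (`StartManualSchedule`; later Sentis versions rename this): rather than executing the whole network in one frame, we can step through it a few layers at a time, amortizing a YOLO pass across frames so it never blows the frame budget.

```csharp
using System.Collections;
using Unity.Sentis;
using UnityEngine;

// Sketch: amortize one inference over several frames so the render loop
// never stalls. The per-frame layer budget is a tunable, hypothetical value.
public class AmortizedInference : MonoBehaviour
{
    [SerializeField] private int layersPerFrame = 10;
    private IWorker worker; // created elsewhere, as in the detection sketch

    public IEnumerator RunAsync(Tensor input, System.Action<TensorFloat> onDone)
    {
        IEnumerator schedule = worker.StartManualSchedule(input);
        int step = 0;
        while (schedule.MoveNext())            // advances one layer at a time
        {
            if (++step % layersPerFrame == 0)  // budget spent for this frame
                yield return null;             // resume on the next frame
        }
        onDone(worker.PeekOutput() as TensorFloat);
    }
}
```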
Another challenge was the LLM integration. Getting Llama 3.3 to return strictly formatted ASL Gloss without conversational filler required careful system-prompt engineering. We also had to handle the asynchronous nature of the API calls so that captions would update smoothly without freezing the user interface; a trimmed-down sketch follows.
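For reference, a trimmed-down version of that approach. The endpoint URL, model id, and raw JSON here are illustrative placeholders rather than our production values; the point is that the system prompt pins the output format, and Unity's coroutine-based `UnityWebRequest` keeps the call off the render loop so captions never hitch.

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Sketch: non-blocking ASL Gloss rewrite via an OpenAI-compatible Llama endpoint.
public class GlossTranslator : MonoBehaviour
{
    // The system prompt pins the model to strict gloss output: no filler.
    private const string SystemPrompt =
        "You translate English into ASL Gloss using Time-Topic-Comment order, " +
        "ALL CAPS. Output ONLY the gloss, with no explanation or punctuation.";

    // Illustrative endpoint; the real URL and model id depend on the provider.
    private const string Endpoint = "https://example.com/v1/chat/completions";

    public IEnumerator Translate(string english, System.Action<string> onGloss)
    {
        string body =
            "{\"model\":\"llama-3.3-70b\",\"messages\":[" +
            "{\"role\":\"system\",\"content\":\"" + SystemPrompt + "\"}," +
            "{\"role\":\"user\",\"content\":\"" + english.Replace("\"", "\\\"") + "\"}]}";

        using var req = new UnityWebRequest(Endpoint, "POST");
        req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
        req.downloadHandler = new DownloadHandlerBuffer();
        req.SetRequestHeader("Content-Type", "application/json");

        // The request runs asynchronously: the caption UI keeps rendering
        // while we wait, and the callback fires once the gloss arrives.
        yield return req.SendWebRequest();

        if (req.result == UnityWebRequest.Result.Success)
            onGloss(ExtractContent(req.downloadHandler.text));
    }

    // Response JSON parsing omitted for brevity.
    private string ExtractContent(string json) => json;
}
```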
Accomplishments that we're proud of
We are most proud of successfully implementing contextual communication by anchoring the text bubble to the correct speaker's face using on-device AI. This is a major step toward achieving true social inclusion.
We are also proud of the ASL Grammar feature, which moves beyond simple subtitles to true linguistic accessibility. Our commitment to active collaboration with the deaf community has ensured the product is relevant and meets their unique needs.
What we learned
We learned that for accessibility tools in XR, context is everything. Simply providing a transcript is insufficient; showing who is talking and where they are in the physical world is crucial for full engagement. We also learned the immense potential of pairing powerful LLMs like Llama with the real-world visibility offered by Passthrough to solve complex social problems.
What's next for Tulibot XR
We plan to expand Tulibot XR's capabilities to further support the deaf community. This includes refining the on-device models for even faster performance and adding support for more sign language variations (e.g., BISINDO).
We are actively working on Speaker Diarization to reliably distinguish multiple speakers in group settings. Beyond this, we aim to integrate broader AI assistance, such as meeting summarization, emotional tone analysis, and context-aware social cues.
We will also continue to seek feedback from the community to optimize the product for their needs in education, work, and social interactions.
Built With
- context-aware-ai
- custom-vocabulary-training
- llama
- meta-horizon-os
- meta-spatial-sdk
- passthrough-camera-access
- speaker-diarization
- speech-to-text
- unity
