Inspiration

As VR social spaces gain mainstream attention and funding, from platforms like Meta’s Horizon to devices like the Apple Vision Pro, it’s clear that immersive virtual interaction is a rapidly growing form of communication and connection.

But one major barrier still holds people back: language.

Whether it’s talking to someone from another country or trying to include users who are deaf, mute, or socially anxious, communication can break down quickly. We wanted to build a tool that removes that friction, so anyone can feel confident, understood, and connected in virtual reality, no matter where they’re from or how they communicate.

What it does

Our app is a standalone executable that translates, transcribes, and dubs speech in real time. You can speak into your mic or type text directly, and the system automatically converts your input into another language, spoken or written. It processes everything locally, then pipes the translated text or voice into a format that the most popular VR platforms can use directly. This enables seamless interaction in VR between people who don’t share the same native language or communication method.

How we built it

We combined several tools and technologies to bring this app to life. Python powers our backend, running the transcription and translation pipeline: OpenAI’s Whisper handles transcription, and OpenNMT’s CTranslate2 runs a lightweight local model for translation. The frontend is built with React, Vite, and Tauri, which together produce a lightweight desktop application. To communicate with VR platforms built on Unity, like VRChat, we used OSC (Open Sound Control), a protocol that passes data such as translated text between our app and the platform. For dubbing, we integrated pyt2s for voice generation, providing realistic AI-synthesized voice output in different languages. This voice is piped into a virtual audio channel using VB-Audio, which can then be selected as an input within the Unity environment, letting the user "speak" with their new voice!
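To illustrate the OSC step, here is a minimal sketch that hand-encodes an OSC message using only the Python standard library and sends it over UDP. It assumes VRChat's documented defaults (the /chatbox/input address, which takes a string and two booleans, and port 9000 on localhost). The function names are hypothetical, and this is not necessarily how our app does it; an OSC library would work just as well.

```python
import socket


def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc(address: str, *args) -> bytes:
    """Encode an OSC message supporting only string and boolean arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            # Booleans live entirely in the type tag and carry no payload bytes.
            tags += "T" if arg else "F"
        elif isinstance(arg, str):
            tags += "s"
            payload += _pad(arg.encode("utf-8"))
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


def send_to_chatbox(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send translated text to the chatbox (assumed VRChat default host and port)."""
    # True = send immediately, False = skip the notification sound.
    packet = encode_osc("/chatbox/input", text, True, False)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

With VRChat running and OSC enabled, calling send_to_chatbox("Hola, ¿cómo estás?") should make the translated line appear above the avatar.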

Challenges we ran into

Our top priority, and our biggest challenge, was reducing the delay between speech, transcription, translation, and dubbing so that conversations feel smooth. We also ran into difficulties packaging the Tauri application together with its Python backend, since integrating all of these technologies required careful coordination to make everything run seamlessly.
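One simple way to see where the delay comes from is to time each stage separately. The sketch below is only an illustration of that idea: the stage functions are hypothetical stand-ins (simple lambdas), not our real Whisper, CTranslate2, or pyt2s calls.

```python
import time


def run_timed(stages, data):
    """Run each (name, function) stage in order and record how long each took."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)  # the output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    return data, timings


# Hypothetical stand-ins for the real transcription, translation, and dubbing steps.
stages = [
    ("transcribe", lambda audio: "hello"),
    ("translate", lambda text: "hola"),
    ("dub", lambda text: b"audio-bytes"),
]

result, timings = run_timed(stages, b"raw-mic-audio")
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Swapping the lambdas for the real calls would show which stage dominates the end-to-end latency and is worth optimizing first.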

Accomplishments that we're proud of

We’re proud that it actually works! Seeing an avatar speak in one language and hearing another language come out on the other side is great. Considering the giant tech stack with technologies most of us had never used before, it’s even more satisfying that everything came together.

What we learned

We learned a ton about real-time language processing, voice synthesis, and cross-platform communication. This was everyone's first time using Tauri, OSC, and these particular AI models, and it was a great exercise in pulling together a multi-layered tech stack under pressure.

What's next for Dub2

We want to expand support for more languages and dialects, fine-tune the dubbing voice models, and add UI support for users to change voices and customize settings.
