Inspiration

One of our closest friends is hard of hearing and speech impaired. Every time we hang out, the same routine plays out: he pulls out his phone, opens the Notes app, and we pass it back and forth just to have a conversation. Social gatherings are especially tough; you can't pass a phone around a table and still feel like you're part of the group. We wanted to build something that let him talk with us, not at a screen. Something hands-free, natural, and unobtrusive, so he can be present in the conversation, not just adjacent to it.

What it does

Voice Bridge does two things and does them really well. Speech from anyone nearby is picked up by a microphone, transcribed in real time, and displayed on a transparent OLED screen. The result: the user can effortlessly follow hearing people as they speak naturally.

For responses, the user navigates and selects words through head gestures, with intelligent text suggestions that anticipate what they want to say. Once ready, the message is spoken aloud through a speaker, giving them a literal voice in the conversation.
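The suggestion logic can be as simple as a frequency-ranked prefix lookup. Here is a minimal sketch of that idea; the function name, vocabulary format, and ranking are our illustration, not the project's actual code:

```python
def suggest(prefix, vocab, k=3):
    """Return up to k words starting with `prefix`, most frequent first.

    `vocab` is a hypothetical word -> usage-count mapping; a real system
    might also weight by recency or sentence context.
    """
    matches = [w for w in vocab if w.startswith(prefix)]
    return sorted(matches, key=lambda w: -vocab[w])[:k]

# Example: typing "he" surfaces the user's most common "he..." words.
vocab = {"hello": 5, "help": 9, "hero": 2, "world": 7}
print(suggest("he", vocab))  # -> ['help', 'hello', 'hero']
```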

How we built it

Speech-to-Text: We used speech recognition to capture and transcribe speech in real time, with additional software processing layered on top to improve accuracy in noisy environments. Transcriptions are pushed live to a transparent OLED display, giving the user a seamless, glasses-like captioning experience as words are spoken.
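Pushing live transcriptions to a small display largely comes down to word-wrapping text for a narrow screen and showing only the most recent lines. A minimal sketch; the 21-character width is our assumption for a 128-pixel-wide OLED with a ~6-pixel font, not a measured value from the project:

```python
def wrap_caption(text, width=21):
    """Greedy word-wrap for a small character display."""
    lines, line = [], ""
    for word in text.split():
        if line and len(line) + 1 + len(word) > width:
            lines.append(line)
            line = word
        else:
            line = f"{line} {word}" if line else word
    if line:
        lines.append(line)
    return lines

# Keep only the last 3 wrapped lines on screen as new speech arrives.
visible = wrap_caption("hello world this is a live caption", width=10)[-3:]
```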

Head Gesture Input: To give the user a hands-free way to compose responses, we integrated an accelerometer that tracks 8-way head motion across 2 axes, mapping each direction to a unique input. This lets the user navigate an on-screen keyboard, select words, and cycle through smart text suggestions all without touching a device.
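Mapping 2-axis tilt to eight directions can be done with an angle-to-sector lookup plus a dead zone so that small, unintentional head movements produce no input. A simplified sketch; the dead-zone threshold and axis conventions are assumptions, not the project's tuned values:

```python
import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def classify_tilt(ax, ay, deadzone=0.15):
    """Classify a 2-axis accelerometer tilt reading (in g) into one of
    8 compass-style directions, or None inside the dead zone."""
    if math.hypot(ax, ay) < deadzone:
        return None  # head roughly level: no input
    angle = math.degrees(math.atan2(ay, ax)) % 360
    # Each 45-degree sector maps to one direction, centered on the axes.
    sector = int(((angle + 22.5) % 360) // 45)
    return DIRECTIONS[sector]
```

Each returned direction would then be mapped to a keyboard action: cursor moves, letter selection, or cycling suggestions.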

Text-to-Speech: Once the user finalizes their message, it's synthesized into natural-sounding speech and played through a speaker, letting them respond out loud in real time.
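On a Raspberry Pi, fully offline speech synthesis is commonly done with a CLI engine such as espeak-ng. The sketch below only builds the command line (the `-s` speed and `-v` voice flags are standard espeak-ng options); we don't know which TTS engine the project actually uses, so treat this as one plausible approach:

```python
def build_speak_cmd(text, wpm=160, voice="en"):
    """Build an espeak-ng invocation for the finalized message.
    wpm/voice defaults are illustrative, not project values."""
    return ["espeak-ng", "-s", str(wpm), "-v", voice, text]

# To actually play audio (requires espeak-ng to be installed):
# import subprocess
# subprocess.run(build_speak_cmd("Nice to meet you"), check=True)
```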

The entire system runs on a Raspberry Pi, keeping everything self-contained, portable, and fully offline. No internet connection is required, which was important to us for both reliability and privacy.

Challenges we ran into

Form Factor & Hardware Integration: Fitting all the components into a wearable form factor involved constant tradeoffs between size, weight, and comfort. Getting it to feel like something a person would actually wear took several iterations.

Mounting the Display Arm: Securing the OLED arm to the helmet in a position that was stable, adjustable, and actually usable was trickier than expected. Small shifts in angle made a big difference.

Microphone Signal Quality: Getting clean audio input in real-world conditions required experimentation with noise filtering and signal processing, especially in louder environments like group conversations.
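One simple form of noise filtering is an energy gate that drops low-RMS audio frames between utterances so the recognizer only sees speech. A minimal sketch; the threshold value is an assumption and would need tuning per microphone and environment:

```python
import math

def rms(frame):
    """Root-mean-square energy of a frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def noise_gate(frames, threshold=500.0):
    """Keep only frames whose RMS energy exceeds the threshold,
    dropping low-energy background noise between utterances.
    `frames` is an iterable of lists of 16-bit PCM samples."""
    return [f for f in frames if rms(f) >= threshold]
```

In group conversations a fixed threshold breaks down quickly, which is part of why this took real experimentation; adaptive thresholds or spectral methods are the usual next step.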

Smart Glasses SDK Limitations: We initially explored using existing smart glasses hardware, but there's virtually no public SDK support for consumer devices. We had to build our own display solution from scratch.

Real-Time Performance on Edge Hardware: Running speech recognition and TTS locally on a Raspberry Pi with low enough latency to feel natural in conversation required careful optimization throughout the pipeline.

Bluetooth Audio Routing: Getting audio to route correctly to a Bluetooth speaker on the Pi surfaced some platform-specific quirks that took significant debugging to resolve.

Accomplishments that we're proud of

A functional, wearable prototype: All hardware components (the Pi, OLED display, accelerometer, microphone, and speaker) successfully integrated into a helmet you can actually put on and use.

Real-time transcription on embedded hardware: Running speech-to-text locally on a Raspberry Pi with low enough latency to feel natural in conversation was a meaningful technical achievement.

Solving a real problem for a real person: This wasn't a hypothetical use case. We built this with someone specific in mind, and knowing it could genuinely change how they communicate in everyday situations made the long hours worth it.

Full offline functionality: The entire pipeline runs on-device. No internet, no cloud, no privacy concerns, which matters for something meant to be worn in public.

Intentional head-gesture input: Building a letter selection system driven by head movement, and getting it to a point where users can control it deliberately and accurately, was one of our proudest moments.

What we learned

Building this project pushed us across hardware, software, and machine learning in ways none of us had fully experienced before. We learned how to deploy speech recognition models on resource-constrained hardware and how to navigate the real tradeoffs between accuracy and processing speed when every millisecond affects the feel of a conversation.

On the input side, we gained a deep appreciation for sensor noise: small amounts of interference in accelerometer data can make an input system feel broken, and filtering it reliably took more care than we expected.
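A common first line of defense against accelerometer jitter is an exponential moving average low-pass filter. A minimal sketch of the idea; the alpha value is illustrative, and a real pipeline would tune it against the latency it introduces:

```python
def ema(samples, alpha=0.2):
    """Exponential moving average low-pass filter.
    Smaller alpha = heavier smoothing but more lag, which matters
    when the filtered signal drives a real-time input system."""
    out, y = [], None
    for x in samples:
        y = x if y is None else alpha * x + (1 - alpha) * y
        out.append(y)
    return out
```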

We also came away with a clearer understanding of how tightly hardware and software decisions are coupled in wearable systems. A choice about component placement affects latency; a choice about model size affects battery draw. Everything is connected. Most broadly, this project showed us how powerful machine learning can be as a tool for accessibility and how much thought it takes to deploy it responsibly in a real-world context, on real hardware, for real users.

What's next for Voice Bridge

The helmet got us to a working prototype, but the vision has always been something smaller. The next step is transitioning the form factor from a helmet to a glasses-style wearable: lighter, less obtrusive, and something a person would genuinely want to wear every day.

Beyond the hardware, we want to push the software further too. That means improving speech recognition accuracy in noisy, real-world environments, refining the head-gesture interface to make text selection feel more intuitive, and optimizing the full pipeline for better battery efficiency and longer use between charges.

Ultimately, we want Voice Bridge to disappear into the background: a seamless, barely-there device that lets people communicate naturally without anyone in the conversation even noticing it's there.
