Inspiration
What it does
How we built it
Project Report: Heard
Wearable Real-Time Speech-to-Text HUD for the Hearing Impaired
- Problem Definition and Context

Deaf and hearing-impaired individuals often face communication barriers in public spaces, especially in fast-paced environments such as restaurants or customer service counters. In one real-life scenario, a deaf individual struggled to place an order at a restaurant, which prompted the idea for Heard: a wearable, real-time speech-to-text HUD that visually displays transcriptions of spoken speech on a reflective lens display. Heard aims to improve face-to-face communication for the hearing impaired by offering:

• Real-time offline transcription
• A discreet, head-mounted visual display
• A battery-powered, compact wearable design
- Identified Constraints

Throughout the design and development process, several constraints shaped our decisions:

| Constraint Type | Description |
|---|---|
| Power | The prototype is not yet battery-powered due to time constraints; future versions will use compact LiPo batteries with 5 V boost converters. |
| Compute | The Raspberry Pi Zero 2 W was chosen for its size and capability, but limited us to the Vosk Small model rather than larger, more accurate models. |
| Connectivity | No cloud, Wi-Fi, or IoT functionality was integrated; everything is processed locally for fully offline operation. |
| Data Input | Audio is captured by a microphone salvaged from a standard wired earpiece. |
| Display Size & Layout | The OLED's small screen limited display space, so long messages had to scroll smoothly; the HUD reflection angle had to be planned carefully to avoid obstructing vision. |
| Time | With only 4 days to build the MVP, trade-offs had to be made, especially around tuning, integration, and the 3D-printed housing. |

- Design Alternatives and Final Decisions

Considered designs:

| Component | Alternatives Considered | Final Choice | Rationale |
|---|---|---|---|
| MCU/SoC | ESP32, Raspberry Pi Pico | Pi Zero 2 W | ESP32/Pico could not handle real-time speech recognition with sufficient accuracy. |
| ASR Model | Whisper, DeepSpeech | Vosk Small | Whisper lacks streaming support; Vosk Small gave reasonable on-device accuracy. |
| Display | LCD, TFT, transparent OLED | SSD1306 OLED + mirror HUD | OLED + mirror was light, cheap, and required no complex optics. |
| Mic | USB mic, MEMS, analog mic with ADC | Earphone mic | Easily available, compact, and solderable to GPIO. |
| Power | Power bank, LiPo + boost | USB (for now) | The final product will use LiPo, but the MVP used USB for simplicity. |
| HUD Placement | Direct lens mount, top reflection | Side reflection using mirror film | Lets the wearer maintain line of sight while reading captions. |
- Tools and Technologies Used

| Category | Tool/Library/Hardware | Reason |
|---|---|---|
| AI | Vosk (Small model) | Offline, streaming, low-latency speech-to-text |
| Embedded/IoT | Raspberry Pi Zero 2 W | Lightweight yet powerful enough for Vosk |
| Audio | pyaudio, arecord | Real-time microphone capture |
| Display | SSD1306 OLED via luma.oled | Easy integration and compact form |
| HUD Optics | Mirror film | Creates a reflective display on the glasses |
| Interfaces | GPIO, I2C | For wiring the mic and OLED |
| Software | Python, Vosk API, OpenCV (planned) | Fast prototyping |
| Tools | Soldering iron, jumper wires, breadboard | Basic electronics assembly |
| 3D Design (planned) | TinkerCAD, Fusion 360 | Printable wearable frames and shoulder units |
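The pipeline the tools above describe is: pyaudio captures microphone audio, Vosk's streaming recognizer turns it into text, and the text is pushed to the OLED. A minimal sketch of that loop, assuming a Vosk Small model unpacked in a `model/` directory and the default input device (the paths, chunk size, and `print` output stand in for the project's actual code):

```python
import json

def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk result or partial-result JSON blob."""
    data = json.loads(result_json)
    # Final results use the "text" key; streaming partials use "partial".
    return data.get("text") or data.get("partial") or ""

def run_captions(model_path: str = "model"):
    # Hardware/model dependencies are imported lazily so the helper above
    # stays usable without vosk or pyaudio installed.
    import pyaudio
    from vosk import Model, KaldiRecognizer

    rec = KaldiRecognizer(Model(model_path), 16000)
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                     input=True, frames_per_buffer=4000)
    try:
        while True:
            chunk = stream.read(4000, exception_on_overflow=False)
            if rec.AcceptWaveform(chunk):   # end of an utterance
                text = extract_text(rec.Result())
            else:                           # in-progress partial result
                text = extract_text(rec.PartialResult())
            if text:
                print(text)  # in the real build this is drawn on the OLED
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

if __name__ == "__main__":
    run_captions()
```

Streaming partials are what give the HUD its sub-second feel: the display updates while a sentence is still being spoken, then settles on the final result.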
- Performance Tests and Benchmarks

Note: Due to time constraints, formal performance testing was limited. Below are approximate field results.

| Test | Result |
|---|---|
| Startup Time | ~10 seconds |
| Transcription Delay | ~0.6–1.0 seconds |
| Battery Life (est.) | N/A (not yet integrated) |
| Recognition Accuracy | ~70–80% in quiet indoor environments |
| Scrolling Test | Smooth scrolling of long sentences, readable within 3–5 seconds |
| Frame Rate (OLED) | 15–20 FPS (pseudo-scroll performance) |
| Temperature | Stable with passive cooling |
| Audio Sensitivity | Performs best within 1 meter of the speaker |
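The pseudo-scroll tested above can be reproduced with plain text slicing: wrap the running transcript to the OLED's character width and keep only the newest few lines, so new speech pushes old text upward. A hedged sketch; the 21-characters-per-line and 4-visible-lines figures are assumptions for a 128x64 SSD1306 with a 6x8 pixel font, not measured values from the project:

```python
from textwrap import wrap

# Assumed geometry for a 128x64 SSD1306 with a 6x8 pixel font.
CHARS_PER_LINE = 21
VISIBLE_LINES = 4

def scroll_view(transcript: str,
                width: int = CHARS_PER_LINE,
                lines: int = VISIBLE_LINES) -> list[str]:
    """Wrap the transcript and keep only the newest lines, so the
    display appears to scroll upward as speech continues."""
    wrapped = wrap(transcript, width) or [""]
    return wrapped[-lines:]

# Example: a long order collapses to its last few wrapped lines, which a
# renderer (e.g. luma.oled's canvas) would draw at fixed row offsets.
view = scroll_view("could i please get a cheeseburger with no onions "
                   "and a small lemonade thank you")
```

Because only the tail of the transcript is redrawn each frame, the refresh cost stays constant no matter how long the conversation runs.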
- Media & Build Snapshots

• 📹 Demo Video: https://drive.google.com/file/d/1tO3klmi-mzq20BiF_H33DnzctHedqnBn/view?usp=sharing
• 🛠 Build Highlights:
  o OLED mounted on the glasses' temple arm with HUD mirror film.
  o Raspberry Pi shoulder mount in progress (3D model planned).
  o Earpiece mic soldered to GPIO; OLED connected via I2C.
- Conclusion and Future Work

Heard demonstrates that affordable, offline speech-to-text can be achieved in a wearable form factor. The project met its core goals:

• Real-time captioning for the deaf
• Fully offline functionality
• Compact, lightweight design

Future upgrades:

• Add a LiPo-powered shoulder module for full portability
• Improve accuracy with quantized larger ASR models
• Develop a 3D-printed casing for all components
• Improve mic input with a pre-amp and noise cancellation
• Add user customization features (font size, scroll speed, etc.)
• Optional Bluetooth audio input support

Challenges we ran into
Accomplishments that we're proud of
What we learned
What's next for Heard
Built With
- python
- vosk
- vosk-api
