Inspiration
More than 466 million people worldwide live with disabling hearing loss, and tens of millions of them communicate primarily in sign language. Yet the digital world of video calls, online classes, and virtual meetings is built almost exclusively for spoken communication. We watched ASL users struggle to participate in Zoom calls, forced to type in chat or wait for interpreters while conversations flowed around them. We realized that in 2026, with all our technology, deaf and hard-of-hearing individuals still face unnecessary barriers in spaces where everyone else can simply speak and be heard. HearAI was born from a simple question: What if sign language could have a voice in the digital world?
What it does
HearAI is a Chrome extension that gives ASL users the ability to participate verbally in online conversations. Using your webcam, it watches you fingerspell in American Sign Language, detects each letter in real time using computer vision, accumulates the letters into words, and speaks those words aloud using natural-sounding text-to-speech.

The user experience is seamless: activate the extension, start signing, and HearAI converts your signs into spoken audio that others can hear through your computer's speakers. You can customize the voice speed, pitch, and tone, choose between automatic speaking (after you pause) or manual control (speak when ready), and track your conversation history. It works on any website: Google Meet, Zoom, educational platforms, anywhere online communication happens.
How we built it
Tech Stack:
- Frontend: React + Vite for a fast, modern Chrome extension interface
- Machine Learning: TensorFlow.js with a custom CNN model trained on 87,000+ ASL alphabet images, converted from Keras to run directly in the browser
- Computer Vision: Real-time webcam capture processing frames at 10 FPS, with image preprocessing (64x64 RGB normalization) for model inference
- Speech Synthesis: Web Speech API for natural text-to-speech with customizable voice parameters
- Styling: Tailwind CSS for a polished, accessible interface
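To make the preprocessing step concrete, here is a minimal sketch of how a captured frame can be turned into a 64x64 RGB tensor for the converted Keras model with TensorFlow.js. The function names and model path are illustrative, not the project's actual code.

```js
import * as tf from "@tensorflow/tfjs";

// Convert a <video> or <canvas> frame into a model-ready tensor:
// 64x64 RGB, scaled to [0, 1], with a leading batch dimension.
function preprocessFrame(frameEl) {
  return tf.tidy(() => {
    const rgb = tf.browser.fromPixels(frameEl).toFloat();    // H x W x 3
    const resized = tf.image.resizeBilinear(rgb, [64, 64]);  // 64 x 64 x 3
    return resized.div(255).expandDims(0);                   // 1 x 64 x 64 x 3, values in [0, 1]
  });
}

async function loadModel() {
  // The Keras model is converted ahead of time with the tensorflowjs converter;
  // the bundle path here is a placeholder.
  return tf.loadLayersModel(chrome.runtime.getURL("model/model.json"));
}
```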
Architecture:
- WebcamCapture component streams video and extracts frames via the Canvas API
- Preprocessor utility converts frames into model-ready tensors
- CNN model predicts letters with confidence scores
- Letter buffering logic smooths predictions and accumulates letters into words
- Pause detection triggers word completion (configurable timing)
- Speech synthesis queue manages audio output with user-controlled settings
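A stripped-down version of the detection loop might look like the sketch below. The 10 FPS throttle and confidence gate follow the pipeline above, but the threshold value, class list, and function names are assumptions; the real model may expose more classes than the 26 letters shown here.

```js
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");
const CONFIDENCE_THRESHOLD = 0.8;   // illustrative value, tuned against the real model
const FRAME_INTERVAL_MS = 100;      // ~10 FPS

function startDetection(model, videoEl, onLetter) {
  return setInterval(() => {
    const input = preprocessFrame(videoEl);
    const scores = model.predict(input);      // softmax over the model's classes
    const probs = scores.dataSync();
    input.dispose();
    scores.dispose();                          // explicit disposal prevents GPU memory leaks

    let best = 0;
    for (let i = 1; i < probs.length; i++) {
      if (probs[i] > probs[best]) best = i;
    }
    // Hand only high-confidence letters to the buffering logic.
    if (probs[best] >= CONFIDENCE_THRESHOLD && best < ALPHABET.length) {
      onLetter(ALPHABET[best], probs[best]);
    }
  }, FRAME_INTERVAL_MS);
}
```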
Key Features:
- Real-time ASL fingerspelling detection
- Confidence threshold filtering (only high-confidence predictions are displayed)
- Auto-speak mode and manual trigger options
- Adjustable speech rate, pitch, volume, and voice selection
- Word history tracking with timestamps
- Settings persistence across sessions
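The adjustable rate, pitch, volume, and voice settings map directly onto the Web Speech API. A minimal sketch of the speaking step, with placeholder defaults:

```js
// Speak a completed word with the browser's built-in speech synthesis.
// rate, pitch, and volume correspond to the user-adjustable settings.
function speakWord(word, { rate = 1, pitch = 1, volume = 1, voiceName } = {}) {
  const utterance = new SpeechSynthesisUtterance(word);
  utterance.rate = rate;        // 0.1 to 10
  utterance.pitch = pitch;      // 0 to 2
  utterance.volume = volume;    // 0 to 1
  if (voiceName) {
    const voice = speechSynthesis.getVoices().find((v) => v.name === voiceName);
    if (voice) utterance.voice = voice;
  }
  speechSynthesis.speak(utterance);
}
```

Auto-speak mode can call something like speakWord as soon as a word is completed, while manual mode waits for the user to trigger it.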
Challenges we ran into
- Model Integration: Converting a Python/Keras CNN model to TensorFlow.js while maintaining accuracy was tricky. We had to optimize the model architecture for browser performance and handle WASM dependencies in the Chrome extension environment.
- Real-time Performance: Balancing detection accuracy with speed was challenging. Processing every frame caused lag, so we throttled to 10 FPS and implemented efficient tensor disposal to prevent memory leaks.
- Letter Buffering Logic: ASL fingerspelling happens quickly, and without smoothing, the same letter would appear multiple times. We built a buffering system that holds a letter for 300 ms before adding it to the current word, preventing duplicates while maintaining natural flow (see the sketch after this list).
- Pause Detection: Determining when a word is "complete" required fine-tuning. Too short and words split awkwardly; too long and conversation felt sluggish. We settled on a configurable 2-second default with user adjustment options.
- Camera Permissions: Chrome extension webcam access has strict security requirements. We navigated Manifest V3 permissions, content security policies, and cross-origin restrictions to make it work reliably.
- Demo Environment Issues: Our live demo worked perfectly in testing but hit camera permission issues in the presentation environment, teaching us the importance of having backup demonstrations and modular component testing.
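For illustration, the buffering and pause-detection behavior can be sketched roughly as follows. The LetterBuffer class, its method names, and the exact logic are ours; only the 300 ms hold and 2-second pause come from the write-up.

```js
// Smooths raw per-frame predictions into letters and words.
// A letter is committed only after it has been stable for holdMs,
// and the current word is completed after pauseMs with no new letters.
class LetterBuffer {
  constructor(onWord, holdMs = 300, pauseMs = 2000) {
    this.onWord = onWord;
    this.holdMs = holdMs;
    this.pauseMs = pauseMs;
    this.candidate = null;
    this.candidateSince = 0;
    this.committed = false;
    this.word = "";
    this.pauseTimer = null;
  }

  push(letter) {
    const now = Date.now();
    if (letter !== this.candidate) {
      // A new hand shape: restart the hold timer and allow a fresh commit.
      this.candidate = letter;
      this.candidateSince = now;
      this.committed = false;
    } else if (!this.committed && now - this.candidateSince >= this.holdMs) {
      // Stable long enough: commit once and wait for the shape to change.
      this.word += letter;
      this.committed = true;
      this.resetPauseTimer();
    }
  }

  resetPauseTimer() {
    clearTimeout(this.pauseTimer);
    this.pauseTimer = setTimeout(() => {
      if (this.word) this.onWord(this.word);   // e.g. speakWord(this.word)
      this.word = "";
    }, this.pauseMs);
  }
}
```

Wiring the pieces together then amounts to startDetection(model, videoEl, (letter) => buffer.push(letter)) with a buffer created as new LetterBuffer(speakWord). A real implementation would also need a rule for double letters such as the "LL" in "HELLO", which this simple version collapses.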
Accomplishments that we're proud of
We built something that actually works. In 6 hours, we created a functional Chrome extension that genuinely solves a real accessibility problem: not a prototype or mockup, but production-ready code.
- Technical achievement: Successfully integrated machine learning, computer vision, and speech synthesis into a cohesive browser extension that runs entirely client-side with no server dependencies.
- Accessibility-first design: Every decision prioritized the user experience for ASL users: customizable settings, clear visual feedback, intuitive controls, and privacy (all processing happens locally).
- Real impact potential: HearAI isn't just a hackathon project; it's immediately deployable and could help millions of people participate more fully in digital spaces.
- Problem-solving under pressure: When our live demo failed during presentations, we professionally pivoted to show our recorded demo, live code, and working components, demonstrating both technical depth and composure.
What we learned
Technical Skills:
- How to convert and optimize ML models for browser environments
- Real-time video processing and frame manipulation with the Canvas API
- Chrome extension development with Manifest V3 security requirements
- Speech synthesis API capabilities and limitations
- Performance optimization for ML inference in JavaScript
Design Insights:
- Accessibility isn't just about compliance; it's about understanding how people actually communicate
- User control is crucial: auto-speak vs. manual modes accommodate different communication styles
- Visual feedback (confidence indicators, letter animations) builds trust in AI systems
- Simplicity wins: a clean UI beats feature bloat
Teamwork:
- Dividing work by strengths (coding, testing, presentation) maximizes output in time-constrained hackathons
- Communication and frequent integration prevent last-minute surprises
- Having backup plans (recorded demos, modular testing) saves presentations
Perspective:
- Building for accessibility opens your eyes to barriers most people never notice
- Technology should adapt to humans, not the other way around
- Real impact doesn't require complexity; sometimes the simplest solution is the most powerful
What's next for HearAI
Short-term (next 3 months):
- Improve detection accuracy: Fine-tune the model with more diverse training data (different lighting, hand sizes, skin tones)
- Add common phrases: Pre-programmed quick-access phrases ("Thank you," "I have a question," "Please repeat that")
- Multi-platform support: Extend beyond Chrome to Firefox, Edge, and Safari
- Mobile app: iOS and Android versions for on-the-go communication
Medium-term (6-12 months):
- Full ASL grammar: Move beyond fingerspelling to recognize complete ASL signs and sentence structure
- Bidirectional translation: Convert speech to sign language animations for hearing users learning ASL
- Multiple sign languages: British Sign Language (BSL), International Sign, and others
- Offline mode: Download models for use without an internet connection
- Integration APIs: Allow developers to embed HearAI in their platforms
Long-term vision:
- Real-time conversation mode: Two-way translation enabling seamless communication between ASL and spoken language users
- Educational platform: Interactive ASL learning tool with instant feedback and progress tracking
- Enterprise deployment: Partner with schools, companies, and telehealth providers to make digital spaces truly inclusive
- Open-source community: Release core components for developers to build accessibility tools
- Research collaboration: Work with deaf community organizations to ensure HearAI truly serves user needs
The ultimate goal: Make sign language a first-class citizen in digital communication. Every video platform, every online classroom, every virtual meeting should work seamlessly for ASL users—not as an afterthought, but by design. HearAI is just the beginning. We're building a world where everyone has a voice, regardless of how they choose to communicate.