Voice to ASL (American Sign Language) Translator

🌟 Inspiration

Across the globe, millions of deaf and hard-of-hearing individuals rely on sign language to communicate. Yet mainstream media, digital platforms, and real-time conversations often exclude them due to a lack of accessible sign translation tools. We wanted to bridge this gap with an automatic voice-to-ASL video translator: something lightweight, fast, and easy to integrate into everyday media consumption.

💬 What It Does

Our project captures spoken or typed text and translates it into American Sign Language (ASL) using pre-rendered video clips of ASL signs. Users can speak into a mic or let a browser extension transcribe ongoing audio, and the system returns a video that visually spells out the phrase in ASL, making web content and conversations more inclusive.

Features:

🎙️ Transcribe audio or speech input (via web extension)

✍️ Clean and validate text

🔤 Map letters and bigrams to ASL poses

🎞️ Stitch together ASL video clips and return a full video

🌐 Lightweight and embeddable — can run over any webpage

🛠 How We Built It

We started by leveraging the WLASL dataset, which contains a rich collection of ASL signs in pose format. Our original goal was to tokenize sentences into words and map those directly to pose sequences. However, due to performance bottlenecks and inconsistent availability of word-level videos, we switched to character-level mapping with support for common bigrams (e.g., "TH", "QU", "ER").

Backend (Flask):

Extracted pose-format data using OpenPose

Converted pose sequences into .mp4 video clips

Stored per-letter and bigram videos in a lookup system

On receiving text:

    Validates & sanitizes input

    Replaces whitespace & punctuation

    Maps characters/bigrams to corresponding video paths

    Stitches clips together using moviepy

    Returns a playable video of fingerspelling in ASL
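The text-to-video steps above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the `clips/` directory layout, file naming, and bigram set are assumptions, and `moviepy` is imported lazily inside `stitch` so the text-processing helpers run on their own.

```python
import os
import re

CLIP_DIR = "clips"            # hypothetical layout: clips/A.mp4, clips/TH.mp4, ...
BIGRAMS = {"TH", "QU", "ER"}  # bigrams that have dedicated clips

def sanitize(text: str) -> str:
    """Validate input: uppercase and keep only letters and spaces."""
    return re.sub(r"[^A-Z ]", "", text.upper())

def tokenize(text: str) -> list:
    """Greedy left-to-right scan: prefer a known bigram, else a single letter."""
    tokens, i = [], 0
    while i < len(text):
        if text[i:i + 2] in BIGRAMS:
            tokens.append(text[i:i + 2])
            i += 2
        else:
            if text[i] != " ":  # whitespace maps to no clip (or a pause)
                tokens.append(text[i])
            i += 1
    return tokens

def stitch(tokens, out_path="out.mp4"):
    """Concatenate the per-token clips into one fingerspelling video."""
    from moviepy.editor import VideoFileClip, concatenate_videoclips  # lazy import
    clips = [VideoFileClip(os.path.join(CLIP_DIR, f"{t}.mp4")) for t in tokens]
    concatenate_videoclips(clips, method="compose").write_videofile(out_path)
    return out_path
```

The greedy scan is what makes bigram support cheap: a known pair like "TH" consumes two characters at once, and everything else falls back to single-letter clips.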

Frontend (Chrome Extension):

Listens to speech or grabs captions from active videos

Sends transcript to the backend

Overlays ASL video translation on top of current page
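The endpoint the extension posts transcripts to might look like the following minimal Flask sketch. The `/translate` route name and JSON shape are assumptions for illustration; the real service would run the sanitize/tokenize/stitch pipeline and return the rendered file.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/translate", methods=["POST"])
def translate():
    """Accept a transcript from the extension and return the translation video."""
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    if not text.strip():
        return jsonify({"error": "empty transcript"}), 400
    # Real pipeline: sanitize -> map letters/bigrams to clips -> stitch with
    # moviepy -> serve the resulting .mp4 (e.g. via flask.send_file).
    return jsonify({"video_url": "/videos/out.mp4"})
```

The extension then overlays a `<video>` element pointed at the returned URL on top of the current page.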

⚠️ Challenges We Ran Into

Pose alignment: Stitching pose-format data smoothly into consistent videos was tricky — minor jitters made clips feel disjointed.

Word-to-pose generalization: Not all words had pose data in the WLASL dataset, so we shifted to character-level granularity.

Browser integration: Injecting a seamless, non-intrusive overlay into all websites without breaking UX required careful scripting.

Real-time performance: Keeping latency low while converting speech to video translation on the fly was a major design constraint.

🏆 Accomplishments That We're Proud Of

Built a fully functional voice-to-ASL pipeline from scratch

Created a working Chrome extension that injects ASL video overlays into any webpage

Successfully processed and stitched together pose-format clips into meaningful translations

Designed a scalable framework that can support more advanced gestures, not just fingerspelling

📚 What We Learned

How to work with pose-format data and convert it into real-time multimedia

The intricacies of sign language translation — including limitations of current datasets

Building modular video pipelines that stitch together dynamic content

Developing browser extensions and integrating them with our own Flask-based APIs

Importance of accessibility design in modern web development

🚀 What’s Next for Voice to ASL Translator

🔡 Add word-level mappings for commonly used terms and phrases, using a transformer to convert English to ASL gloss as an intermediate step that can then be posed

🧠 Use an ASL grammar-aware model to reorder and simplify English input

🎥 Support facial expressions and body gestures, not just hand poses

🧩 Use a 3D avatar driven by our pose data to act out the signs

📱 Expand to a mobile app with camera-based gesture feedback
