DuckSpeak

DuckSpeak bridges the communication gap between the Deaf and hearing communities with a fully browser-based, privacy-first interface.

Inspiration

Every day, 70 million deaf people face communication barriers in education, workplaces, and daily life. Only about 1% of the world knows sign language, meaning simple interactions often require an interpreter or are impossible altogether.

We asked ourselves: “What if technology could merge human senses, making visual language audible, and speech visible?”

That question inspired DuckSpeak, a bridge between the deaf and hearing worlds that truly embodies the Human Augmentation track: enhancing human ability and inclusivity through AI.

What it does

DuckSpeak enables real-time bidirectional translation between American Sign Language (ASL) and spoken English during live video calls.

  • A deaf user signs → the app detects and translates it into speech and captions for hearing participants.
  • A hearing user speaks → the app generates captions in real time.

To our knowledge, it’s the first platform that allows two-way communication in video calls *without interpreters or external tools*, breaking the accessibility barrier instantly.

How we built it

We built DuckSpeak as a cross-platform Progressive Web App (PWA) using React + Next.js, so users can launch it like a native app directly in their web browser, with no installation required.

Tech stack highlights:

  • MediaPipe Hands – Real-time tracking of 21 landmarks per hand (42 across both hands).
  • Custom KNN Classifier – Recognizes ASL gestures with temporal smoothing.
  • LiveKit (WebRTC) – Low-latency video streaming (sub-100 ms).
  • Web Speech API – Real-time speech recognition and TTS.
  • IndexedDB + Transfer Learning – Auto-adapts the gesture model to each user’s unique signing style.
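The KNN classifier with temporal smoothing can be sketched roughly as below. This is an illustrative reconstruction, not DuckSpeak’s actual code: the names (`knnClassify`, `TemporalSmoother`), the flattened landmark feature layout, and the window/threshold values are all assumptions.

```typescript
// Sketch of a KNN gesture classifier plus temporal smoothing.
// Features are assumed to be flattened landmark coordinates per frame.

type Sample = { label: string; features: number[] };

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Classify one frame: majority vote among the k nearest training samples.
function knnClassify(samples: Sample[], features: number[], k = 5): string {
  const votes = new Map<string, number>();
  [...samples]
    .sort((a, b) => euclidean(a.features, features) - euclidean(b.features, features))
    .slice(0, k)
    .forEach((s) => votes.set(s.label, (votes.get(s.label) ?? 0) + 1));
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

// Temporal smoothing: only emit a label once it dominates the last
// `windowSize` frames, suppressing single-frame misclassifications.
class TemporalSmoother {
  private window: string[] = [];
  constructor(private windowSize = 10, private threshold = 0.7) {}

  push(label: string): string | null {
    this.window.push(label);
    if (this.window.length > this.windowSize) this.window.shift();
    if (this.window.length < this.windowSize) return null; // still warming up
    const counts = new Map<string, number>();
    for (const l of this.window) counts.set(l, (counts.get(l) ?? 0) + 1);
    const [best, count] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
    return count / this.window.length >= this.threshold ? best : null;
  }
}
```

Per-frame predictions feed the smoother, and only its non-null outputs reach the caption/TTS stage.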

The system captures motion data with MediaPipe, structures it as temporal landmark sequences, interprets them with an in-browser model, and returns the result as text or voice in near real time.
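On the speech side, the Web Speech API delivers a mix of finalized and interim transcription results that have to be merged into a rolling caption. The sketch below shows that merging logic; the browser wiring appears only in comments, and the names (`mergeCaption`, `RecognizedChunk`) are illustrative assumptions, not the project’s actual code.

```typescript
// In the browser, recognition would be wired up roughly like:
//   const rec = new (window as any).webkitSpeechRecognition();
//   rec.continuous = true;
//   rec.interimResults = true;
//   rec.onresult = (e) => { /* convert e.results into chunks, then merge */ };
// and sign-to-speech output uses:
//   speechSynthesis.speak(new SpeechSynthesisUtterance(text));

type RecognizedChunk = { transcript: string; isFinal: boolean };

// Keep finalized text permanently, and append the latest interim
// hypothesis after it so captions update live without flickering.
function mergeCaption(
  finalized: string,
  chunks: RecognizedChunk[]
): { finalized: string; display: string } {
  let interim = "";
  for (const c of chunks) {
    if (c.isFinal) finalized = (finalized + " " + c.transcript).trim();
    else interim += c.transcript;
  }
  return { finalized, display: (finalized + " " + interim).trim() };
}
```

Interim text is overwritten on every recognition event, while finalized text only ever grows.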

Challenges we ran into

  • Dynamic ASL movement: Capturing gestures that rely on motion, not just static poses, required streaming 3D landmark data over time.
  • Real-time latency: Keeping ML inference under 100ms per frame while maintaining video quality.
  • Two-hand recognition: Handling asynchronous gestures and partial occlusions in live webcam feeds.
  • Video streaming: The video feed kept stuttering because frames were shared between the model and the UI. We solved this with a double-layer approach: the interpreted layer sits below the raw video layer, and only the raw layer is shown to the user.
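One way to realize that double-layer idea is to show the raw camera `<video>` element directly while a hidden canvas underneath samples frames for the model at a lower rate, so inference never blocks the visible feed. The sketch below is an assumption about how this could work, not the team’s exact fix; the name `makeInferenceGate` and the 15 fps inference rate are illustrative.

```typescript
// Pure, testable piece: decide which animation frames to sample for the
// model, so the visible video layer keeps its full frame rate while the
// hidden inference layer runs at a capped rate.
function makeInferenceGate(inferenceFps: number) {
  let last = -Infinity;
  const interval = 1000 / inferenceFps;
  return (nowMs: number): boolean => {
    if (nowMs - last >= interval) {
      last = nowMs;
      return true; // copy this frame to the hidden canvas and run inference
    }
    return false; // skip: the visible video layer renders untouched
  };
}

// Browser wiring (illustrative):
// const gate = makeInferenceGate(15);
// function loop(t: number) {
//   if (gate(t)) {
//     ctx.drawImage(rawVideo, 0, 0);  // snapshot into the hidden canvas
//     runHandTracking(modelCanvas);   // the model sees the copy, not the UI
//   }
//   requestAnimationFrame(loop);
// }
```

Decoupling the inference rate from the render rate is what keeps the user-facing feed smooth even when a frame’s inference runs long.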

Accomplishments that we’re proud of

  • Built a working ASL ↔ speech translator in just 36 hours.
  • Achieved real-time communication between deaf and hearing users without interpreters.
  • Designed a browser-based solution that anyone can access instantly.
  • Created a seamless demo experience: “hello” in sign → instant caption → instant speech output.
  • Most importantly, showed how ML can enhance human connection, not replace it.

What we learned

  • Learned to optimize MediaPipe pipelines for dual-hand tracking without frame drops.
  • Mastered WebRTC integration and speech APIs for synchronous communication.
  • Discovered how GPT-4o’s multimodal reasoning can bridge structured sensor data and natural language.
  • Grew as collaborators, going from ML beginners to shipping a full-stack, production-level prototype.

In short: we learned how to use AI to expand human ability, not by replacing communication, but by augmenting it.

What’s next for DuckSpeak

  • Expand ASL vocabulary coverage using larger datasets like MS-ASL and ASL Citizen.
  • Implement continuous learning from user input, with explicit user consent.
  • Integrate emotion and facial expression detection for richer sign meaning.
  • Build a public API for other video platforms (Zoom, Teams, Discord).
  • Add multi-language support for other sign languages (BSL, ISL, LSF).

Our long-term vision:

Every video call platform, from classrooms to boardrooms, should include accessibility by default. DuckSpeak is how we get there.

Built With

  • LiveKit (real-time video calls)
  • Netlify
  • TensorFlow.js (ML)
  • TypeScript/React frontend with MediaPipe Hands (gesture detection)