DuckSpeak
DuckSpeak bridges the communication gap between the Deaf and hearing communities with a fully browser-based, privacy-first interface.
Inspiration
Every day, 70 million deaf people face communication barriers in education, workplaces, and daily life. Only about 1% of the world knows sign language, meaning simple interactions often require an interpreter or are impossible altogether.
We asked ourselves: “What if technology could merge human senses, making visual language audible, and speech visible?”
That question inspired DuckSpeak, a bridge between the deaf and hearing worlds that truly embodies the Human Augmentation track: enhancing human ability and inclusivity through AI.
What it does
DuckSpeak enables real-time bidirectional translation between American Sign Language (ASL) and spoken English during live video calls.
- A deaf user signs → the app detects and translates it into speech and captions for hearing participants.
- A hearing user speaks → the app generates captions in real time.
It’s the first platform that allows two-way communication in video calls *without interpreters or external tools*, breaking the accessibility barrier instantly.
How we built it
We built DuckSpeak as a cross-platform Progressive Web App (PWA) using React + Next.js, so users can launch it like a native app straight from their web browser, with no installation required.
Tech stack highlights:
- MediaPipe Hands – 84-point landmark tracking for both hands in real time.
- Custom KNN Classifier – Recognizes ASL gestures with temporal smoothing (sketched after this list).
- LiveKit (WebRTC) – Low-latency (sub-100 ms) video streaming.
- Web Speech API – Real-time speech recognition and TTS.
- IndexedDB + Transfer Learning – Auto-adapts the gesture model to each user’s unique signing style.
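To make the classifier piece concrete, here is a rough sketch of the approach rather than our exact production code: a KNN over flattened MediaPipe hand-landmark vectors, with a majority-vote window for temporal smoothing. The value of k, the window size, and the 75% agreement threshold are illustrative placeholders.

```typescript
// Sketch of a KNN gesture classifier over flattened hand-landmark features.
// Feature vectors are assumed to be landmark coordinates concatenated into
// one array; k and the smoothing window are illustrative, not tuned values.

type Sample = { features: number[]; label: string };

class KnnGestureClassifier {
  private samples: Sample[] = [];
  private recent: string[] = []; // sliding window of raw per-frame predictions

  constructor(private k = 5, private windowSize = 8) {}

  addExample(features: number[], label: string): void {
    this.samples.push({ features, label });
  }

  private distance(a: number[], b: number[]): number {
    let sum = 0;
    for (let i = 0; i < a.length; i++) {
      const d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Raw per-frame prediction: majority label among the k nearest samples. */
  private predictFrame(features: number[]): string | null {
    if (this.samples.length === 0) return null;
    const neighbors = this.samples
      .map(s => ({ label: s.label, dist: this.distance(features, s.features) }))
      .sort((a, b) => a.dist - b.dist)
      .slice(0, this.k);
    const votes = new Map<string, number>();
    for (const n of neighbors) votes.set(n.label, (votes.get(n.label) ?? 0) + 1);
    return Array.from(votes.entries()).sort((a, b) => b[1] - a[1])[0][0];
  }

  /** Temporal smoothing: only emit a label once it dominates the recent window. */
  predict(features: number[]): string | null {
    const raw = this.predictFrame(features);
    if (raw === null) return null;
    this.recent.push(raw);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const agree = this.recent.filter(l => l === raw).length;
    return agree >= Math.ceil(this.windowSize * 0.75) ? raw : null;
  }
}
```

The per-user examples persisted in IndexedDB are what feed calls like `addExample`, which is how a classifier of this shape can adapt to an individual's signing style.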
The system captures motion data with MediaPipe, structures it as temporal landmark sequences, interprets them with an in-app model, and returns the result as text or voice, all within seconds.
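On the speech side we lean on APIs the browser already ships. The sketch below shows the general shape of that wiring, assuming standard Web Speech API usage: SpeechRecognition turns a hearing participant's audio into live captions, and speechSynthesis voices the recognized sign. Function names and the language setting are illustrative.

```typescript
// Sketch of the browser speech plumbing: live captions in, spoken output out.
// SpeechRecognition is prefixed in Chromium, hence the webkit fallback.

type CaptionHandler = (text: string, isFinal: boolean) => void;

export function startCaptions(onCaption: CaptionHandler): () => void {
  const Recognition =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  const recognition = new Recognition();
  recognition.continuous = true;       // keep listening across utterances
  recognition.interimResults = true;   // stream partial captions as the user speaks
  recognition.lang = 'en-US';

  recognition.onresult = (event: any) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onCaption(result[0].transcript, result.isFinal);
    }
  };

  recognition.start();
  return () => recognition.stop();     // caller can stop captioning
}

// Voice a recognized sign for hearing participants.
export function speak(text: string): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0;
  window.speechSynthesis.speak(utterance);
}
```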
Challenges we ran into
- Dynamic ASL movement: Capturing gestures that rely on motion, not just static poses, required streaming 3D landmark data over time.
- Real-time latency: Keeping ML inference under 100ms per frame while maintaining video quality.
- Two-hand recognition: Handling asynchronous gestures and partial occlusions in live webcam feeds.
- Video streaming: the feed kept getting choppy because it was shared between the model and the UI. We fixed this with a double-layer approach: the interpreted layer sits below the raw video layer, and the raw layer is what the user sees (sketched below).
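Here is a rough sketch of that layering, assuming the raw feed arrives as a MediaStream (from LiveKit or getUserMedia). The element setup and the 10 fps copy rate are illustrative, not our tuned values; the point is that the user-facing video never waits on inference.

```typescript
// Sketch of the two-layer fix: the user always sees the raw <video> element,
// while the model reads frames from a separate canvas underneath it, so the
// inference pace never dictates what the UI renders.

export function setupLayers(stream: MediaStream): () => void {
  // Top layer: raw video, shown to the user, driven directly by the browser.
  const rawVideo = document.createElement('video');
  rawVideo.srcObject = stream;
  rawVideo.autoplay = true;
  rawVideo.muted = true;
  rawVideo.style.position = 'absolute';
  rawVideo.style.zIndex = '1';

  // Bottom layer: a canvas the model samples from at its own pace.
  const modelCanvas = document.createElement('canvas');
  modelCanvas.style.position = 'absolute';
  modelCanvas.style.zIndex = '0';          // stays underneath the raw layer
  const ctx = modelCanvas.getContext('2d')!;

  document.body.append(modelCanvas, rawVideo);

  // Copy frames for inference on a timer decoupled from the render loop.
  const copyFrame = () => {
    if (rawVideo.videoWidth > 0) {
      modelCanvas.width = rawVideo.videoWidth;
      modelCanvas.height = rawVideo.videoHeight;
      ctx.drawImage(rawVideo, 0, 0);
      // hand the canvas (or its ImageData) to the landmark model here
    }
  };
  const timer = setInterval(copyFrame, 100); // ~10 fps, illustrative rate

  return () => clearInterval(timer);
}
```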
Accomplishments that we’re proud of
- Built a working ASL ↔ speech translator in just 36 hours.
- Achieved real-time communication between deaf and hearing users without interpreters.
- Designed a browser-based solution that anyone can access instantly.
- Created a seamless demo experience: sign “hello” → instant caption → instant speech output.
- Most importantly, showed how ML can enhance human connection, not replace it.
What we learned
- Learned to optimize MediaPipe pipelines for dual-hand tracking without frame drops.
- Mastered WebRTC integration and speech APIs for synchronous communication.
- Discovered how GPT-4o’s multimodal reasoning can bridge structured sensor data and natural language.
- Grew as collaborators: we went from ML beginners to shipping a full-stack, production-level prototype.
In short: we learned how to use AI to expand human ability, not by replacing communication, but by augmenting it.
What’s next for DuckSpeak
- Expand ASL vocabulary coverage using larger datasets like MS-ASL and ASL Citizen.
- Let the model keep learning from user input, with explicit user consent.
- Integrate emotion and facial expression detection for richer sign meaning.
- Build a public API for other video platforms (Zoom, Teams, Discord).
- Add multi-language support for other sign languages (BSL, ISL, LSF).
Our long-term vision:
Every video call platform, from classrooms to boardrooms, should include accessibility by default. DuckSpeak is how we get there.
Built With
- LiveKit (real-time video calls)
- Netlify
- TensorFlow.js (ML)
- TypeScript/React frontend with MediaPipe Hands (gesture detection)
