DuckSpeak

DuckSpeak bridges the communication gap between the Deaf and hearing communities with a fully browser-based, privacy-first interface.

Inspiration

Every day, 70 million deaf people face communication barriers in education, workplaces, and daily life. Only about 1% of the world knows sign language, meaning simple interactions often require an interpreter or are impossible altogether.

We asked ourselves: “What if technology could merge human senses, making visual language audible, and speech visible?”

That question inspired DuckSpeak, a bridge between the deaf and hearing worlds that truly embodies the Human Augmentation track: enhancing human ability and inclusivity through AI.

What it does

DuckSpeak enables real-time bidirectional translation between American Sign Language (ASL) and spoken English during live video calls.

  • A deaf user signs → the app detects and translates it into speech and captions for hearing participants.
  • A hearing user speaks → the app generates captions in real time.

To our knowledge, it’s the first platform that allows two-way communication in video calls *without interpreters or external tools*, breaking the accessibility barrier instantly.

How we built it

We built DuckSpeak as a cross-platform Progressive Web App (PWA) using React + Next.js, so users can launch it like a native app directly in their web browser, with no installation required.

Tech stack highlights:

  • MediaPipe Hands – Real-time tracking of 21 landmarks per hand (42 across both hands).
  • Custom KNN Classifier – Recognizes ASL gestures with temporal smoothing.
  • LiveKit (WebRTC) – Low-latency video streaming (sub-100 ms).
  • Web Speech API – Real-time speech recognition and TTS.
  • IndexedDB + Transfer Learning – Auto-adapts the gesture model to each user’s unique signing style.
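The KNN classifier with temporal smoothing can be sketched roughly as below. This is an illustrative reconstruction, not DuckSpeak’s actual code: the names (`knnClassify`, `TemporalSmoother`), the flattened landmark feature layout, and the window/threshold values are all assumptions.

```typescript
// Sketch of a KNN gesture classifier plus temporal smoothing.
// Features are assumed to be flattened landmark coordinates per frame.

type Sample = { label: string; features: number[] };

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Classify one frame: majority vote among the k nearest training samples.
function knnClassify(samples: Sample[], features: number[], k = 5): string {
  const votes = new Map<string, number>();
  [...samples]
    .sort((a, b) => euclidean(a.features, features) - euclidean(b.features, features))
    .slice(0, k)
    .forEach((s) => votes.set(s.label, (votes.get(s.label) ?? 0) + 1));
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

// Temporal smoothing: only emit a label once it dominates the last
// `windowSize` frames, suppressing single-frame misclassifications.
class TemporalSmoother {
  private window: string[] = [];
  constructor(private windowSize = 10, private threshold = 0.7) {}

  push(label: string): string | null {
    this.window.push(label);
    if (this.window.length > this.windowSize) this.window.shift();
    if (this.window.length < this.windowSize) return null; // still warming up
    const counts = new Map<string, number>();
    for (const l of this.window) counts.set(l, (counts.get(l) ?? 0) + 1);
    const [best, count] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
    return count / this.window.length >= this.threshold ? best : null;
  }
}
```

Per-frame predictions feed the smoother, and only its non-null outputs reach the caption/TTS stage.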

The system captures motion data with MediaPipe, structures it as temporal landmark sequences, interprets them with an in-browser model, and returns the result as text or voice in near real time.
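On the speech side, the Web Speech API delivers a mix of finalized and interim transcription results that have to be merged into a rolling caption. The sketch below shows that merging logic; the browser wiring appears only in comments, and the names (`mergeCaption`, `RecognizedChunk`) are illustrative assumptions, not the project’s actual code.

```typescript
// In the browser, recognition would be wired up roughly like:
//   const rec = new (window as any).webkitSpeechRecognition();
//   rec.continuous = true;
//   rec.interimResults = true;
//   rec.onresult = (e) => { /* convert e.results into chunks, then merge */ };
// and sign-to-speech output uses:
//   speechSynthesis.speak(new SpeechSynthesisUtterance(text));

type RecognizedChunk = { transcript: string; isFinal: boolean };

// Keep finalized text permanently, and append the latest interim
// hypothesis after it so captions update live without flickering.
function mergeCaption(
  finalized: string,
  chunks: RecognizedChunk[]
): { finalized: string; display: string } {
  let interim = "";
  for (const c of chunks) {
    if (c.isFinal) finalized = (finalized + " " + c.transcript).trim();
    else interim += c.transcript;
  }
  return { finalized, display: (finalized + " " + interim).trim() };
}
```

Interim text is overwritten on every recognition event, while finalized text only ever grows.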

Challenges we ran into

  • Dynamic ASL movement: Capturing gestures that rely on motion, not just static poses, required streaming 3D landmark data over time.
  • Real-time latency: Keeping ML inference under 100ms per frame while maintaining video quality.
  • Two-hand recognition: Handling asynchronous gestures and partial occlusions in live webcam feeds.
  • Video streaming: The video feed kept stuttering because frames were shared between the model and the UI. We solved this with a double-layer approach: the interpreted layer sits below the raw video layer, and only the raw layer is shown to the user.
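One way to realize that double-layer idea is to show the raw camera `<video>` element directly while a hidden canvas underneath samples frames for the model at a lower rate, so inference never blocks the visible feed. The sketch below is an assumption about how this could work, not the team’s exact fix; the name `makeInferenceGate` and the 15 fps inference rate are illustrative.

```typescript
// Pure, testable piece: decide which animation frames to sample for the
// model, so the visible video layer keeps its full frame rate while the
// hidden inference layer runs at a capped rate.
function makeInferenceGate(inferenceFps: number) {
  let last = -Infinity;
  const interval = 1000 / inferenceFps;
  return (nowMs: number): boolean => {
    if (nowMs - last >= interval) {
      last = nowMs;
      return true; // copy this frame to the hidden canvas and run inference
    }
    return false; // skip: the visible video layer renders untouched
  };
}

// Browser wiring (illustrative):
// const gate = makeInferenceGate(15);
// function loop(t: number) {
//   if (gate(t)) {
//     ctx.drawImage(rawVideo, 0, 0);  // snapshot into the hidden canvas
//     runHandTracking(modelCanvas);   // the model sees the copy, not the UI
//   }
//   requestAnimationFrame(loop);
// }
```

Decoupling the inference rate from the render rate is what keeps the user-facing feed smooth even when a frame’s inference runs long.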

Accomplishments that we’re proud of

  • Built a working ASL ↔ speech translator in just 36 hours.
  • Achieved real-time communication between deaf and hearing users without interpreters.
  • Designed a browser-based solution that anyone can access instantly.
  • Created a seamless demo experience: “hello” in sign → instant caption → instant speech output.
  • Most importantly, showed how ML can enhance human connection, not replace it.

What we learned

  • Learned to optimize MediaPipe pipelines for dual-hand tracking without frame drops.
  • Mastered WebRTC integration and speech APIs for synchronous communication.
  • Discovered how GPT-4o’s multimodal reasoning can bridge structured sensor data and natural language.
  • Grew as collaborators, going from ML beginners to shipping a full-stack, production-level prototype.

In short: we learned how to use AI to expand human ability, not by replacing communication, but by augmenting it.

What’s next for DuckSpeak

  • Expand ASL vocabulary coverage using larger datasets like MS-ASL and ASL Citizen.
  • Implement continuous learning from user input, with explicit user consent.
  • Integrate emotion and facial expression detection for richer sign meaning.
  • Build a public API for other video platforms (Zoom, Teams, Discord).
  • Add multi-language support for other sign languages (BSL, ISL, LSF).

Our long-term vision:

Every video call platform, from classrooms to boardrooms, should include accessibility by default. DuckSpeak is how we get there.

Built With

  • LiveKit (real-time video calls)
  • Netlify
  • TensorFlow.js (ML)
  • TypeScript/React frontend with MediaPipe Hands (gesture detection)