Inspiration

We wanted to build a tool that empowers deaf and mute individuals to express themselves more naturally. Typing and sign language can be limiting in certain settings, so we set out to create a hands-free, voice-free communication system that works in real time — right in the browser.

What it does

Speak for Me AI is a web-based application that uses the webcam to detect lip movements (and, optionally, microphone audio) and translates them into live text. It requires no typing and runs directly in the browser, making it accessible and easy to use.

How we built it

Frontend: HTML, CSS, and JavaScript to create a clean, responsive UI

Lip and Audio Detection: Python with OpenCV and Dlib for facial and lip tracking, and optional speech recognition
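The lip-tracking step can be sketched roughly as follows. We assume dlib's standard 68-point facial landmark scheme, in which indices 48-67 cover the mouth; the helper names are ours, and the actual detector and shape-predictor calls are omitted so the sketch stays self-contained:

```python
# Mouth landmarks in dlib's 68-point scheme occupy indices 48-67.
MOUTH_START, MOUTH_END = 48, 68

def extract_mouth(landmarks):
    """Slice the mouth points out of a full 68-point landmark list."""
    return landmarks[MOUTH_START:MOUTH_END]

def normalize_mouth(points):
    """Center the mouth points and scale by mouth width so the
    feature is invariant to face position and camera distance."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    width = (max(xs) - min(xs)) or 1.0
    return [((x - cx) / width, (y - cy) / width) for x, y in points]
```

In the real pipeline, dlib's shape predictor would supply the 68 landmarks per frame; the normalized mouth points are what get fed to the recognition model.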

Machine Learning: A custom-trained deep learning model to map lip movements to words
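We can't reproduce the trained model here, but a common decoding step for frame-wise lip-reading predictions is CTC-style greedy collapse. This sketch assumes the model emits one label per frame with a blank symbol, which is an assumption, not a description of our exact architecture:

```python
def greedy_collapse(frame_labels, blank="-"):
    """Collapse per-frame predictions into a label sequence:
    merge consecutive repeats, then drop the blank symbol
    (CTC-style greedy decoding)."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:
            out.append(lab)
        prev = lab
    return [lab for lab in out if lab != blank]
```

The blank symbol lets the decoder distinguish a held mouth shape ("ll" in "hello") from a single repeated label.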

Integration: WebSockets used to send real-time data from the Python backend to the web frontend
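The messages pushed from the Python backend to the frontend might look like the envelope below. The field names and values here are illustrative assumptions, not a fixed API:

```python
import json
import time

def make_caption_message(text, confidence, source="lips"):
    """Build the JSON payload sent over the WebSocket to the
    frontend. Field names are illustrative, not a fixed schema."""
    return json.dumps({
        "type": "caption",
        "text": text,
        "confidence": round(confidence, 2),
        "source": source,          # "lips" or "speech"
        "timestamp": time.time(),
    })
```

With a library such as `websockets`, the backend would send this with something like `await ws.send(make_caption_message(...))`, and the JavaScript frontend would render it from its `onmessage` handler.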

Challenges we ran into

Ensuring accurate word recognition from lip movements without relying on sound

Training the model with the limited clean lip-reading datasets available

Handling latency in real-time video processing

Syncing Python backend with JavaScript frontend smoothly using WebSockets
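On the latency front, one common mitigation is to downsample the webcam stream before the heavy per-frame processing. This is a generic sketch of that idea, not our exact throttling logic:

```python
def select_frames(n_frames, target_fps, source_fps=30):
    """Indices of frames to keep when downsampling a video stream,
    so the landmark/recognition pipeline runs at target_fps instead
    of the full camera rate."""
    step = source_fps / target_fps
    n_keep = int(n_frames * target_fps / source_fps)
    return [int(i * step) for i in range(n_keep)]
```

Processing 10 fps instead of 30 cuts per-frame work by two thirds while keeping mouth motion smooth enough for recognition.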

Accomplishments that we're proud of

Working cross-platform web app with only a webcam required

Successfully merged lip reading with speech recognition for flexibility

Created a tool that could have real-world impact on accessibility

What we learned

Building efficient video pipelines for browser and backend

Training and testing lip-reading models

The value of accessibility-focused design

Real-time communication techniques using WebSockets

What's next for Speak for Me AI

Add language translation for international communication

Optimize for mobile devices and low-end hardware

Build a standalone mobile/web app with offline mode

Explore integration with wearables and AR for hands-free captions
