Inspiration
We wanted to build a tool that empowers deaf and mute individuals to express themselves more naturally. Typing and sign language can be limiting in certain settings, so we set out to create a hands-free, voice-free communication system that works in real time — right in the browser.
What it does
Speak for Me AI is a web-based application that uses a webcam to detect lip movements (and, optionally, a microphone to capture audio) and translates them into live text. It requires no typing and runs directly in the browser, making it accessible and easy to use.
How we built it
Frontend: HTML, CSS, and JavaScript to create a clean, responsive UI
Lip and Audio Detection: Python with OpenCV and Dlib for facial and lip tracking, and optional speech recognition
Machine Learning: A custom-trained deep learning model to map lip movements to words
Integration: WebSockets used to send real-time data from the Python backend to the web frontend
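The lip-tracking step above can be sketched in pure Python. This assumes dlib's standard 68-point facial landmark convention, in which indices 48-67 cover the mouth; the helper name and the padding value are illustrative, not the project's actual code:

```python
def lip_bounding_box(landmarks, pad=10):
    """Compute a padded bounding box around the mouth region.

    `landmarks` is a list of (x, y) tuples in dlib's 68-point layout,
    where indices 48-67 are the lip points. The returned box
    (left, top, right, bottom) is what you would crop from the frame
    before feeding it to a lip-reading model.
    """
    mouth = landmarks[48:68]
    xs = [p[0] for p in mouth]
    ys = [p[1] for p in mouth]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)
```

In the real pipeline the landmark list would come from dlib's shape predictor run on each OpenCV frame; the geometry above stays the same.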
Challenges we ran into
Ensuring accurate word recognition from lip movements without relying on sound
Training the model with limited, clean lip-reading datasets
Handling latency in real-time video processing
Syncing the Python backend with the JavaScript frontend smoothly over WebSockets
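One common way to tame real-time video latency is to drop stale frames so inference always runs on the newest one. A minimal sketch of that idea, under the assumption that capture and inference run at different rates (this is illustrative, not the project's actual pipeline):

```python
from collections import deque


class LatestFrameBuffer:
    """Keep only the most recent frame between inference steps.

    A deque with maxlen=1 silently discards older frames as new ones
    arrive, so end-to-end latency stays bounded even when the model
    is slower than the webcam's capture rate.
    """

    def __init__(self):
        self._buf = deque(maxlen=1)

    def push(self, frame):
        # Called from the capture loop; overwrites any unprocessed frame.
        self._buf.append(frame)

    def pop_latest(self):
        # Called from the inference loop; returns None if nothing new arrived.
        return self._buf.pop() if self._buf else None
```

The trade-off is that intermediate frames are never processed, which is usually acceptable for live captioning where freshness matters more than completeness.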
Accomplishments that we're proud of
A working cross-platform web app that requires only a webcam
Successfully merged lip reading with speech recognition for flexibility
Created a tool that could have real-world impact on accessibility
What we learned
Building efficient video pipelines for browser and backend
Training and testing lip-reading models
The value of accessibility-focused design
Real-time communication techniques using WebSockets
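The real-time messages flowing from the Python backend to the browser over the WebSocket might look like the following; the JSON schema here is an assumed example for illustration, not the project's actual protocol:

```python
import json
import time


def make_caption_message(word: str, confidence: float) -> str:
    """Serialize one lip-reading prediction as a JSON WebSocket payload.

    The browser frontend would parse each message and append the word
    to the live caption stream. Field names are illustrative.
    """
    return json.dumps({
        "type": "caption",
        "word": word,
        "confidence": round(confidence, 3),
        "timestamp": time.time(),
    })
```

On the frontend, a plain `WebSocket.onmessage` handler would `JSON.parse` each payload and update the caption text in the DOM.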
What's next for Speak for Me AI
Add language translation for international communication
Optimize for mobile devices and low-end hardware
Build a standalone mobile/web app with offline mode
Explore integration with wearables and AR for hands-free captions
Built With
- cv2
- html5
- javascript
- openai
- python