Inspiration
Watching friends struggle to communicate with members of the deaf community gave us our goal: enabling everyone to have a voice.
Interpreters can cost $150/hour and aren't always available. Existing tech is clunky and slow. We envisioned AI-powered glasses that give sign language users a spoken voice and give hearing users a visual transcript, breaking down barriers between communities.
What We Built
Sign2Speak features:
- Dual-camera ASL recognition with wide-angle capture for comprehensive gesture detection
- Cloud-based computer vision pipeline using Gemini's models for real-time sign language interpretation
- ElevenLabs speech synthesis for natural voice output from sign language input
- AR text overlay for speech-to-text display in the user's field of view
- Bidirectional communication flow enabling seamless deaf-hearing conversations
Technical Architecture
- Frontend: Smart glasses interface
- Computer Vision: Dual-camera setup with cloud-based Gemini model processing
- Speech Synthesis: ElevenLabs API for natural voice generation
- Speech Recognition: Built-in microphone with real-time transcription
- AR Display: Text overlay rendering for speech-to-text output
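The components above can be sketched as two one-way flows that share one object. This is a minimal illustration and not the project's actual code: every class and method name below is hypothetical, and the stand-in implementations call no real Gemini or ElevenLabs API.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class SignRecognizer(Protocol):
    """Hypothetical interface for the cloud vision step."""
    def recognize(self, left_frame: bytes, right_frame: bytes) -> str: ...


class SpeechSynthesizer(Protocol):
    """Hypothetical interface for the text-to-speech step."""
    def synthesize(self, text: str) -> bytes: ...


class Transcriber(Protocol):
    """Hypothetical interface for the speech-to-text step."""
    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class Sign2SpeakPipeline:
    recognizer: SignRecognizer
    synthesizer: SpeechSynthesizer
    transcriber: Transcriber
    show_overlay: Callable[[str], None]  # AR display callback

    def sign_to_speech(self, left_frame: bytes, right_frame: bytes) -> bytes:
        """Signer -> hearing user: camera frames to text to audio."""
        text = self.recognizer.recognize(left_frame, right_frame)
        return self.synthesizer.synthesize(text)

    def speech_to_overlay(self, audio: bytes) -> str:
        """Hearing user -> signer: microphone audio to text on the AR overlay."""
        text = self.transcriber.transcribe(audio)
        self.show_overlay(text)
        return text


# Stand-ins for local testing only; a real build would wrap the actual services.
class EchoRecognizer:
    def recognize(self, left_frame: bytes, right_frame: bytes) -> str:
        return "HELLO"


class FakeSynthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FakeTranscriber:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


overlay_lines: List[str] = []
pipeline = Sign2SpeakPipeline(
    EchoRecognizer(), FakeSynthesizer(), FakeTranscriber(), overlay_lines.append
)
```

Keeping each service behind a small interface means the vision model or voice provider could be swapped without touching the conversation flow.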
What We Learned
- LLM sign language recognition: Multimodal LLMs can interpret signs from dual-camera input, and the second camera's wider coverage makes gesture interpretation more precise
- Real-time video processing: Latency remains a challenge, but recent real-time video APIs suggest that sub-200 ms processing is feasible
- Training data limitations: Sign language datasets are limited, so scaling will require significant additional data collection
- Hardware constraints: A wearable has to balance processing power, battery life, and form factor
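One way to keep the latency target visible during development is to time each stage against a budget. The sketch below is an assumption-laden illustration: the `LatencyBudget` class and the stage names are hypothetical, and the 200 ms figure is only the target mentioned above, not a measured result.

```python
import time
from contextlib import contextmanager

BUDGET_MS = 200.0  # target taken from the text above; not a benchmark


class LatencyBudget:
    """Accumulates per-stage wall-clock time and compares it to a budget."""

    def __init__(self, budget_ms: float = BUDGET_MS, clock=time.perf_counter):
        self.budget_ms = budget_ms
        self._clock = clock  # injectable so tests can use a fake clock
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.stages[name] = self.stages.get(name, 0.0) + elapsed_ms

    @property
    def total_ms(self) -> float:
        return sum(self.stages.values())

    def within_budget(self) -> bool:
        return self.total_ms <= self.budget_ms
```

Usage would look like wrapping each step in `with budget.stage("preprocess"): ...` and checking `budget.within_budget()` once per processed frame pair.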
Challenges We Overcame
- Real-time processing limitations: Engineered a custom pipeline using Gemini's vision models with optimized preprocessing
- Dual-camera synchronization: Implemented frame-perfect alignment for accurate 3D gesture reconstruction; a single camera gave poor results because of its limited field of view
- Context preservation: Developed conversation state management to maintain dialogue flow and reduce interpretation errors
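The last two items can be illustrated with a small sketch. It assumes each camera frame carries a capture timestamp and pairs frames by nearest timestamp within a tolerance, which is a simpler technique than the frame-perfect alignment described above. It also keeps a rolling window of recent turns as conversation context. All names (`Frame`, `pair_frames`, `ConversationState`) and the default values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    timestamp_ms: float
    data: bytes


def pair_frames(
    left: List[Frame], right: List[Frame], tolerance_ms: float = 8.0
) -> List[Tuple[Frame, Frame]]:
    """Pair each left frame with the nearest not-yet-consumed right frame.

    Both lists must be sorted by timestamp. Left frames with no right
    frame within tolerance_ms are dropped.
    """
    pairs = []
    j = 0
    for lf in left:
        # Advance while the next right frame is at least as close to lf.
        while j + 1 < len(right) and abs(
            right[j + 1].timestamp_ms - lf.timestamp_ms
        ) <= abs(right[j].timestamp_ms - lf.timestamp_ms):
            j += 1
        if j < len(right) and abs(right[j].timestamp_ms - lf.timestamp_ms) <= tolerance_ms:
            pairs.append((lf, right[j]))
            j += 1  # do not reuse a right frame
    return pairs


@dataclass
class ConversationState:
    """Keeps the most recent turns so they can be sent as context."""
    max_turns: int = 6
    turns: deque = field(default_factory=deque)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))
        while len(self.turns) > self.max_turns:
            self.turns.popleft()

    def as_prompt_context(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)
```

Bounding the history keeps the context sent with each request small, which matters when every extra token adds latency.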
Impact & Future Vision
Sign2Speak addresses a $40+ billion assistive technology market while fostering inclusive communication. Our roadmap includes expanding to international sign languages, improving offline processing, and achieving consistently real-time translation.
We believe everyone deserves to have a voice, and that is our ultimate mission.
Built With
- computervision
- eleven-multilingual-v2
- fastapi
- gemini-2.5-pro
- https
- llm
- node.js
- python
- react
- typescript
- vr
