Inspiration

Watching friends struggle to communicate with members of the deaf community sparked an idea: give everyone a voice.

Interpreters cost $150/hour and aren't always available. Existing tech is clunky and slow. We envisioned AI-powered glasses that give sign language users a voice and hearing users visual understanding, breaking down barriers between communities.

What We Built

Sign2Speak features:

  • Dual-camera ASL recognition with wide-angle capture for comprehensive gesture detection
  • Cloud-based computer vision pipeline using Gemini's models for real-time sign language interpretation
  • ElevenLabs speech synthesis for natural voice output from sign language input
  • AR text overlay for speech-to-text display in the user's field of vision
  • Bidirectional communication flow enabling seamless deaf-hearing conversations

Technical Architecture

  • Frontend: Smart glasses interface
  • Computer Vision: Dual-camera setup with cloud-based Gemini model processing
  • Speech Synthesis: ElevenLabs API for natural voice generation
  • Speech Recognition: Built-in microphone with real-time transcription
  • AR Display: Text overlay rendering for speech-to-text output
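The components above form two directions of data flow: signs to speech, and speech to overlay text. The following is a minimal sketch of that wiring, not the project's actual code. The class `Sign2SpeakPipeline` and its four callables are hypothetical stand-ins; they do not reflect real Gemini or ElevenLabs API signatures, which would sit behind these functions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frame = bytes  # assumption: one encoded camera frame


@dataclass
class Sign2SpeakPipeline:
    """Wires the four components together; every callable is a hypothetical stand-in."""

    recognize_signs: Callable[[Sequence[Frame], Sequence[Frame]], str]  # cloud vision model
    synthesize_speech: Callable[[str], bytes]                           # speech synthesis
    transcribe_audio: Callable[[bytes], str]                            # speech recognition
    render_overlay: Callable[[str], None]                               # AR text display

    def sign_to_speech(self, left: Sequence[Frame], right: Sequence[Frame]) -> bytes:
        """Signer to hearing listener: frames from both cameras in, audio out."""
        text = self.recognize_signs(left, right)
        return self.synthesize_speech(text)

    def speech_to_text(self, audio: bytes) -> str:
        """Hearing speaker to signer: microphone audio in, overlay text out."""
        text = self.transcribe_audio(audio)
        self.render_overlay(text)
        return text
```

Passing the services in as callables keeps the flow testable with stubs and leaves the choice of vendor SDK outside the pipeline itself.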

What We Learned

  • LLM sign language recognition: Modern multimodal LLMs can recognize signs accurately when given dual-camera input, which enables precise gesture interpretation
  • Real-time video processing: Latency remains challenging, but recent real-time video APIs suggest that sub-200 ms processing is feasible
  • Training data limitations: Limited sign language datasets require significant additional data collection for scalability
  • Hardware constraints: Balancing processing power, battery life, and wearable form factors
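One way to track the latency point above is to time each pipeline stage against a fixed budget. This is a rough sketch under assumptions: the 200 ms figure is taken from the list above, while the `LatencyBudget` class, its stage names, and the injectable clock are hypothetical and not part of the project.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

BUDGET_MS = 200.0  # assumption: end-to-end target based on the sub-200 ms figure


class LatencyBudget:
    """Accumulates per-stage wall-clock time and compares the total to a budget."""

    def __init__(
        self,
        budget_ms: float = BUDGET_MS,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self.budget_ms = budget_ms
        self._clock = clock  # injectable so tests can supply a fake clock
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the named stage."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    @property
    def total_ms(self) -> float:
        return sum(self.stages_ms.values())

    def within_budget(self) -> bool:
        return self.total_ms <= self.budget_ms
```

Wrapping each step in `with budget.stage("inference"): ...` shows which stage consumes the budget, which helps decide where preprocessing or a smaller model would pay off.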

Challenges We Overcame

  1. Real-time processing limitations: Engineered a custom pipeline using Gemini's vision models with optimized preprocessing
  2. Dual-camera synchronization: Implemented frame-perfect alignment for accurate 3D gesture reconstruction. A single camera gave poor results because of its limited field of view
  3. Context preservation: Developed conversation state management to maintain dialogue flow and reduce interpretation errors
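Two of the challenges above, camera synchronization and context preservation, can be illustrated with a short sketch. It is not the team's implementation: pairing frames by capture timestamp within a tolerance is an assumed approach, and `pair_frames`, `ConversationContext`, the 5 ms tolerance, and the six-turn window are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def pair_frames(
    left_ts: Sequence[float],
    right_ts: Sequence[float],
    tolerance_s: float = 0.005,
) -> List[Tuple[int, int]]:
    """Greedily pair left/right frame indices whose timestamps differ by at most tolerance_s.

    Both timestamp sequences must be sorted ascending. Each frame is used at
    most once; frames with no partner inside the tolerance are dropped.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(left_ts) and j < len(right_ts):
        delta = left_ts[i] - right_ts[j]
        if abs(delta) <= tolerance_s:
            pairs.append((i, j))
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # left frame is older than every remaining right frame allows
        else:
            j += 1  # right frame is older than every remaining left frame allows
    return pairs


class ConversationContext:
    """Keeps the most recent turns so they can be sent along with new frames."""

    def __init__(self, max_turns: int = 6) -> None:
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self._turns.append((speaker, text))

    def as_prompt(self) -> str:
        """Render the retained turns as one 'speaker: text' line each."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self._turns)
```

The bounded deque discards the oldest turn automatically, so the context passed to the model stays small while still covering the recent dialogue.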

Impact & Future Vision

Sign2Speak addresses a $40+ billion assistive technology market while fostering inclusive communication. Our roadmap includes expanding to international sign languages, improving offline processing, and reducing latency so that translation stays real-time.

We believe everyone deserves a voice, and giving them one is our ultimate mission.
