Omoi: Unlocking the Silent World

Inspiration

40% of autistic children never develop functional speech. 1 in 36 US children is diagnosed with autism. Yet existing AAC (augmentative and alternative communication) devices cost $300-$10,000 and don't adapt to their users.

We saw a mother at a grocery store, unable to understand her non-verbal son's distress. He was crying not from pain, but from the frustration of having no voice.

Communication is a human right. That's why we built Omoi.

What it does

Omoi is a free AI-powered AAC platform that works in 3 steps:

  1. Record - Caregiver records their voice for personalized synthesis
  2. Select - User taps icons in any order to build sentences
  3. Speak - AI constructs the sentence and speaks it in the caregiver's voice with appropriate emotion

Unlike traditional AAC systems that rely on rigid menus, Omoi uses AI to predict intent, recognize gestures through computer vision, and detect emotions in real time.
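
As a rough illustration of the Select → Speak flow, here is a minimal sketch of icon-to-sentence construction with Gemini. The model name, prompt wording, and the icons_to_sentence helper are illustrative assumptions, not our exact production code.

```python
# Minimal sketch: turn tapped icon labels (any order) into a spoken sentence.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

def icons_to_sentence(icons, emotion="neutral"):
    """Combine icon labels into one short, natural first-person sentence."""
    prompt = (
        "You are an AAC assistant. Combine these icon labels into one short, "
        f"natural first-person sentence, spoken with a {emotion} tone: "
        + ", ".join(icons)
    )
    response = model.generate_content(prompt)
    return response.text.strip()

print(icons_to_sentence(["water", "want", "now"], emotion="urgent"))
# e.g. "I want some water right now."
```

Passing the icons in whatever order they were tapped is the point: the model, not the user, handles word order and grammar.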

How we built it

Tech Stack:

  • Google Cloud Platform for backend
  • Gemini AI for natural language prediction
  • CNN models for gesture recognition
  • Custom emotion detection trained on RAVDESS dataset
  • Eye-tracking integration using WebGazer.js
  • Support Vector Machines for emotion classification (sketched after this list)
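
For context, a minimal sketch of the SVM emotion classifier idea: per-clip MFCC features feeding an RBF SVM. The feature settings, helper names, and hyperparameters are illustrative, not our trained RAVDESS model.

```python
# Sketch: mean-MFCC features per clip -> scaled RBF SVM.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(path, sr=16000, n_mfcc=40):
    """Summarize one audio clip as the mean of its MFCC frames."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def train_emotion_svm(clip_paths, clip_labels):
    """Fit an RBF SVM on per-clip features; labels like 'happy', 'sad', 'angry'."""
    X = np.array([mfcc_features(p) for p in clip_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, np.array(clip_labels))
    return clf
```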

Key Innovation: We're building the world's first dataset of non-verbal autistic communication gestures from platform usage (with consent). Every interaction makes the system smarter - more users means more data, better models, and more accurate predictions.

Architecture:

  • React frontend with offline-first design
  • Real-time voice synthesis using transfer learning
  • Computer vision pipeline processing at 30fps
  • Hybrid approach: lightweight processing on-device, complex AI in cloud (sketched below)
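
The hybrid split fits in a few lines. The confidence threshold and the local_model / cloud_client interfaces below are placeholders, not our actual services.

```python
LOCAL_CONFIDENCE_THRESHOLD = 0.85  # illustrative cut-off

def classify_frame(frame, local_model, cloud_client):
    """Run the quantized on-device model first; escalate to the cloud
    only when local confidence is too low to trust."""
    label, confidence = local_model.classify(frame)
    if confidence >= LOCAL_CONFIDENCE_THRESHOLD:
        return label                       # no network round trip
    return cloud_client.classify(frame)    # heavier model, higher latency
```

Keeping the confident cases on-device is what makes the offline-first design and the battery budget workable.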

Challenges we ran into

Eye-tracking computational demands: Eye tracking required massive processing power; our initial implementation hit only 15fps and drained batteries in 20 minutes. We solved it with the hybrid on-device/cloud split, model quantization (75% size reduction, sketched below), and frame skipping.
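
The quantization step is standard TensorFlow Lite post-training quantization; the saved-model path below is illustrative.

```python
import tensorflow as tf

# Post-training dynamic-range quantization: weights stored as int8,
# which is where most of the ~75% size reduction comes from.
converter = tf.lite.TFLiteConverter.from_saved_model("models/gaze_cnn")  # illustrative path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("models/gaze_cnn_quant.tflite", "wb") as f:
    f.write(tflite_model)
```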

Voice quality from limited data: Commercial systems need hours of audio; we had 10-15 minutes. We used data augmentation and transfer learning to raise quality ratings from 2.1/5 to 4.2/5.
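
A sketch of the kind of augmentation we mean: small pitch, tempo, and noise perturbations that multiply 10-15 minutes of recordings into a larger training set. The parameter ranges are illustrative.

```python
import numpy as np
import librosa

def augment(y, sr):
    """Yield perturbed copies of one caregiver clip for voice-model training."""
    yield librosa.effects.pitch_shift(y, sr=sr, n_steps=np.random.uniform(-2, 2))
    yield librosa.effects.time_stretch(y, rate=np.random.uniform(0.9, 1.1))
    yield y + 0.005 * np.random.randn(len(y))   # light background noise
```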

Gesture recognition accuracy: Accuracy started at 67%, largely due to false positives from involuntary movements (stimming). Multi-frame verification, confidence thresholding, and personalized calibration brought it to 94%.
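
The multi-frame verification logic is simple in sketch form: accept a gesture only when the same label stays above a confidence threshold for several consecutive frames. The threshold and window length below are illustrative, and the per-user calibration is omitted.

```python
from collections import deque

CONFIDENCE_THRESHOLD = 0.8   # illustrative; tuned per user during calibration
REQUIRED_FRAMES = 5          # ~170ms of agreement at 30fps

recent = deque(maxlen=REQUIRED_FRAMES)

def verified_gesture(label, confidence):
    """Return a gesture label only after N consecutive confident detections."""
    recent.append(label if confidence >= CONFIDENCE_THRESHOLD else None)
    if len(recent) == REQUIRED_FRAMES and len(set(recent)) == 1 and recent[0]:
        return recent[0]   # stable, confident gesture
    return None            # likely stimming or a transient false positive
```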

Platform-scale data collection: Building a proprietary gesture dataset meant solving privacy (COPPA compliance), data quality, and annotation-pipeline challenges while maintaining user trust.
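
A minimal sketch of what consent-gated, pseudonymous logging looks like; the field names and hashing scheme are illustrative, not our exact pipeline.

```python
import hashlib
import time

def log_gesture_sample(user_id, gesture_label, confidence, consented):
    """Record a gesture sample only if the caregiver opted in, keyed by a
    one-way hash so no directly identifying data leaves the device."""
    if not consented:
        return None
    return {
        "user": hashlib.sha256(user_id.encode()).hexdigest(),
        "gesture": gesture_label,
        "confidence": round(confidence, 3),
        "timestamp": int(time.time()),
    }
```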

Accomplishments that we're proud of

  • Completely free when competitors charge $300-$10,000
  • 94% gesture recognition accuracy matching expensive specialized hardware
  • 340% increase in communication attempts during beta testing (23 users, 3 schools)
  • One 8-year-old constructed their first multi-word sentence after 3 years of single words
  • World's first dataset of real non-verbal autistic communication patterns - our competitive moat
  • Sub-100ms prediction latency for natural conversation

What we learned

Edge AI is hard: Model quantization, pruning, and knowledge distillation are essential. In our profiling, 80% of processing time went to memory allocation, not computation.
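
The fix was mostly about reusing buffers instead of allocating new arrays every frame; the shapes below are illustrative.

```python
import numpy as np

FRAME_SHAPE = (224, 224, 3)                               # illustrative input size
frame_buffer = np.empty(FRAME_SHAPE, dtype=np.float32)    # allocated once, reused

def preprocess_into(frame_uint8, out=frame_buffer):
    """Cast and normalize a camera frame into the reused buffer,
    avoiding a fresh allocation on every frame."""
    np.copyto(out, frame_uint8)   # uint8 -> float32 cast into existing memory
    out /= 255.0
    return out
```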

Voice synthesis is complex: Intelligible speech ≠ natural speech. Prosody, phoneme timing, and emotional modulation took months to get right.

Real users are unpredictable: Lab accuracy doesn't translate to real-world performance. Lighting, background noise, and device diversity required adaptive solutions.

Communication is prediction: Traditional AAC treats it like navigating menus. We realized it's an AI prediction problem - and built accordingly.

Most important: We learned from speech therapists, special educators, autism advocates, and families. Technology alone isn't the solution - empathy-driven design is.

What's next

Near-term (6 months):

  • Native iOS/Android apps
  • Personalized communication analytics dashboard
  • Multi-caregiver voice support
  • Reduce training audio from 15 to 5 minutes

Long-term vision:

  • Neuralink integration for direct thought-to-speech
  • MND Association partnership for progressive conditions like ALS
  • Voice preservation for users before speech loss
  • Cross-platform expansion (smart home, wearables)

Goal: 100,000 users by 2026 through school partnerships, multi-language support, and an open-source developer toolkit.

Our mission: Communication for everyone, everywhere. Making voice a fundamental right, not a luxury.


Built With

google-cloud gemini-ai tensorflow python javascript react cnn machine-learning computer-vision svm eye-tracking accessibility aac emotion-detection


Unlocking the silent world, one voice at a time. 💙
