Inspiration

Millions of people with motor impairments (ALS, stroke, spinal cord injuries, temporary paralysis) lose access to typing and speech, the primary ways we all communicate. We wanted a zero-cost, camera-only tool that lets anyone type and speak using just eye movements and blinks, with no special hardware required.

What it does

  • Shows a large on-screen keyboard that auto-scans through keys.
  • You look left/right to choose a key group, then blink to select a letter.
  • Typed text appears on a “Board” area for easy reading.
  • Audio feedback confirms selections using ElevenLabs TTS (e.g., says the letter or “left/right”).
  • Works in real time with a standard webcam; the vision pipeline runs fully on-device (speech is synthesized via the ElevenLabs API).

How we built it

  • Python + OpenCV for the camera pipeline and UI rendering.
  • dlib 68-point face landmarks to locate eye contours.
  • Custom blink detection (horizontal-to-vertical eye aspect ratio) and gaze ratio (iris segmentation + thresholding) to infer left/center/right glances; both heuristics are sketched after this list.
  • A lightweight virtual keyboard rendered with OpenCV; timed scanning highlights the current key.
  • ElevenLabs v2 SDK for text-to-speech confirmations with byte-stream caching to keep it snappy.
  • Simple state machine for menus: select side → scan letters → blink to choose → speak/append to text (a stripped-down version also appears below).
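
For reference, here is a minimal sketch of the two eye heuristics, assuming dlib's standard 68-point model (eye landmarks 36-41 and 42-47); the blink threshold and the binarization cutoff of 70 are illustrative values that need per-user tuning:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumes the standard 68-point model file sits next to the script.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = [36, 37, 38, 39, 40, 41]   # dlib 68-point eye indices
RIGHT_EYE = [42, 43, 44, 45, 46, 47]
BLINK_RATIO_THRESHOLD = 5.7           # illustrative; tune per user

def midpoint(p, q):
    return ((p.x + q.x) // 2, (p.y + q.y) // 2)

def blink_ratio(landmarks, eye):
    # Eye width divided by eyelid opening; a closing eye collapses the
    # vertical distance, so the ratio spikes during a blink.
    left = np.array([landmarks.part(eye[0]).x, landmarks.part(eye[0]).y])
    right = np.array([landmarks.part(eye[3]).x, landmarks.part(eye[3]).y])
    top = np.array(midpoint(landmarks.part(eye[1]), landmarks.part(eye[2])))
    bottom = np.array(midpoint(landmarks.part(eye[5]), landmarks.part(eye[4])))
    return np.linalg.norm(left - right) / max(np.linalg.norm(top - bottom), 1e-6)

def gaze_ratio(gray, landmarks, eye):
    # Threshold the eye region and compare white (sclera) pixels on each
    # half: when the iris shifts toward one side, that half loses white,
    # so the ratio tells us which way the eye has moved.
    region = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in eye],
                      dtype=np.int32)
    mask = np.zeros_like(gray)
    cv2.fillPoly(mask, [region], 255)
    eye_img = cv2.bitwise_and(gray, gray, mask=mask)
    x, y, w, h = cv2.boundingRect(region)
    eye_img = eye_img[y:y + h, x:x + w]
    _, thresh = cv2.threshold(eye_img, 70, 255, cv2.THRESH_BINARY)
    left_white = cv2.countNonZero(thresh[:, :w // 2])
    right_white = cv2.countNonZero(thresh[:, w // 2:])
    return (left_white + 1) / (right_white + 1)   # ~1.0 when looking straight
```

In the frame loop we average both eyes' ratios, and a blink only counts if it persists for several consecutive frames (the frame window mentioned under challenges below).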
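
The menu flow itself is just a few states; here is a stripped-down sketch, with the key groups and scan interval invented for illustration:

```python
import time

# Illustrative key groups and timing; the real layout differs.
GROUPS = {"left": list("ABCDEFGHIJKLM"), "right": list("NOPQRSTUVWXYZ")}
SCAN_INTERVAL = 1.0  # seconds each key stays highlighted

class Scanner:
    def __init__(self):
        self.state = "choose_side"      # choose_side -> scan_letters
        self.letters, self.index = [], 0
        self.last_advance = time.monotonic()
        self.text = ""                  # the "Board" contents

    def on_gaze(self, side):
        # A sustained left/right glance picks a key group.
        if self.state == "choose_side" and side in GROUPS:
            self.letters, self.index = GROUPS[side], 0
            self.last_advance = time.monotonic()
            self.state = "scan_letters"

    def on_blink(self):
        # A blink selects the currently highlighted letter.
        if self.state == "scan_letters":
            self.text += self.letters[self.index]
            # speak(self.letters[self.index])  # TTS confirmation hook
            self.state = "choose_side"

    def tick(self):
        # Called every frame: advance the highlight on a fixed timer.
        if self.state == "scan_letters":
            now = time.monotonic()
            if now - self.last_advance >= SCAN_INTERVAL:
                self.index = (self.index + 1) % len(self.letters)
                self.last_advance = now

    def highlighted(self):
        # The OpenCV renderer draws this key with a brighter rectangle.
        return self.letters[self.index] if self.state == "scan_letters" else None
```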

Challenges we ran into

  • SDK changes: ElevenLabs’ modern client no longer exposes set_api_key/generate; we migrated to ElevenLabs(...).text_to_speech.convert and fixed the audio playback imports (the migrated call is sketched after this list).
  • Blink false positives from lighting and camera angles; we tuned thresholds and required detections to persist across a short window of frames.
  • macOS permissions for camera + Accessibility/Automation in later integrations.
  • Performance tradeoffs between detection robustness and frame rate on laptops.
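
For anyone hitting the same migration, the new call roughly takes the shape below; the API key and voice ID are placeholders, and the dict stands in for our byte-stream cache:

```python
from elevenlabs import play
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")   # placeholder
_audio_cache: dict[str, bytes] = {}           # phrase -> synthesized audio

def speak(phrase: str) -> None:
    # convert() yields audio chunks; join them once, cache the bytes,
    # and replay from memory so repeated confirmations stay instant.
    if phrase not in _audio_cache:
        stream = client.text_to_speech.convert(
            voice_id="YOUR_VOICE_ID",           # placeholder
            text=phrase,
            model_id="eleven_multilingual_v2",  # the model we assume here
        )
        _audio_cache[phrase] = b"".join(stream)
    play(_audio_cache[phrase])
```

Note that play comes from the top-level elevenlabs package and needs a local audio backend (ffmpeg's ffplay) installed.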

Accomplishments that we're proud of

  • Fully hands-free typing + spoken feedback with only a webcam.
  • Robust blink detection that survives normal head motion.
  • Simple codebase that others can clone and run quickly.
  • Clear README and calibration tips to make it usable beyond the demo.

What we learned

  • Accessibility UX matters: bigger targets, consistent scan speed, audible confirmations, and forgiving thresholds dramatically reduce fatigue.
  • Vision heuristics (EAR/gaze ratios) can be surprisingly effective when tuned with good lighting.
  • Audio caching and short, distinct confirmations help maintain rhythm while typing by eye.

What's next for EyeTalk

  • iMessage (macOS) integration: send iMessages directly via AppleScript (a sketch follows this list).
  • Quick phrases & word prediction to cut blinks per word.
  • Calibration panel (per-user blink threshold, scan speed, contrast theme).
  • Multilingual voices and offline fallbacks for limited connectivity.
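
A plausible shape for that iMessage hook, driving Messages.app through osascript; the exact AppleScript dialect varies between macOS versions, so treat this as a sketch:

```python
import subprocess

def send_imessage(recipient: str, body: str) -> None:
    # Passes recipient/body as arguments to the AppleScript run handler,
    # which avoids quoting problems with user-typed text. Requires the
    # Automation permission mentioned in the challenges above.
    script = """
    on run {targetBuddy, targetMessage}
        tell application "Messages"
            set targetService to 1st account whose service type = iMessage
            send targetMessage to participant targetBuddy of targetService
        end tell
    end run
    """
    subprocess.run(["osascript", "-e", script, recipient, body], check=True)
```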

Discord

  • Username: gotenks_123

Built With

  • Python
  • OpenCV
  • dlib
  • ElevenLabs