VisionKey

Type, click, and navigate using just your eyes

Inspiration

Typing shouldn’t require hands. We wanted a zero-hardware, fully browser-based way for anyone to write, click, and navigate—using just their eyes and subtle head motion.

What it does

VisionKey turns your webcam into a gaze/blink controller:

Moves a soft “cursor” where you look.

Snaps to nearby keys and word suggestions for accuracy.

Selects with a dwell or intentional blink.

Types on an on-screen keyboard with lightweight next-word predictions.

Calibrates in under a minute and runs entirely on-device in the browser.
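The dwell selection listed above can be sketched as a small per-frame state machine. This is an illustrative sketch, not VisionKey's actual code: the `DwellSelector` class, the `DwellEvent` shape, and the 800 ms default are all assumptions made for the example.

```typescript
// Hypothetical dwell selector: fires once when the cursor rests on one key long enough.
type DwellEvent = { targetId: string; progress: number; fired: boolean };

class DwellSelector {
  private currentId: string | null = null;
  private startMs = 0;
  private hasFired = false;
  private readonly dwellMs: number;

  // dwellMs is an assumed default, not a value taken from the project.
  constructor(dwellMs: number = 800) {
    this.dwellMs = dwellMs;
  }

  // Call once per frame with the id of the key under the cursor (or null if none).
  update(targetId: string | null, nowMs: number): DwellEvent | null {
    if (targetId !== this.currentId) {
      // The cursor moved to a different key (or off the keyboard): restart the timer.
      this.currentId = targetId;
      this.startMs = nowMs;
      this.hasFired = false;
    }
    if (targetId === null) return null;

    // progress in [0, 1] can drive the dwell ring; fired is true on exactly one frame.
    const progress = Math.min(1, (nowMs - this.startMs) / this.dwellMs);
    const fired = progress >= 1 && !this.hasFired;
    if (fired) this.hasFired = true;
    return { targetId, progress, fired };
  }
}
```

Because `fired` is true only on the frame where the timer first completes, continuing to look at the same key does not repeat the keystroke; the user has to look away and back to type it again.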

How we built it

Tracking: Face landmark detection with Google MediaPipe.

Fusion: Eye vectors + head pose blended, then smoothed with a One-Euro filter.

UI: Magnetized keyboard, dwell ring, and a prediction bar. Small WebAudio beeps confirm actions.

Calibration: Quadratic mapping from feature space → screen coordinates; saved locally for instant reuse.

Safety: Blink hysteresis, cooldowns, and distance checks to avoid accidental clicks.
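As a rough illustration of the blink hysteresis and cooldown mentioned under Safety, the sketch below uses two thresholds on an eye-openness score plus a minimum hold time. Everything here is an assumption for the example: the `BlinkDetector` class, the `BlinkConfig` fields, and the idea that openness arrives as a number in [0, 1] derived from the face landmarks.

```typescript
// Hypothetical configuration; none of these values come from the project.
interface BlinkConfig {
  closeBelow: number;  // openness below this marks the eye as closed
  openAbove: number;   // openness must rise above this to count as open again
  minClosedMs: number; // shorter closures are treated as natural blinks
  maxClosedMs: number; // longer closures are treated as resting the eyes
  cooldownMs: number;  // minimum gap between two accepted clicks
}

class BlinkDetector {
  private closed = false;
  private closedAtMs = 0;
  private lastClickMs = Number.NEGATIVE_INFINITY;
  private readonly cfg: BlinkConfig;

  constructor(cfg: BlinkConfig) {
    this.cfg = cfg;
  }

  // While true, the caller can freeze the cursor to avoid jumps during a blink.
  get isClosed(): boolean {
    return this.closed;
  }

  // Returns true on the frame where a deliberate blink is accepted as a click.
  update(openness: number, nowMs: number): boolean {
    const { closeBelow, openAbove, minClosedMs, maxClosedMs, cooldownMs } = this.cfg;
    if (!this.closed) {
      if (openness < closeBelow) {
        this.closed = true;
        this.closedAtMs = nowMs;
      }
      return false;
    }
    // Hysteresis: stay "closed" until openness clears the higher threshold.
    if (openness <= openAbove) return false;

    this.closed = false;
    const heldMs = nowMs - this.closedAtMs;
    const deliberate = heldMs >= minClosedMs && heldMs <= maxClosedMs;
    const cooledDown = nowMs - this.lastClickMs >= cooldownMs;
    if (deliberate && cooledDown) {
      this.lastClickMs = nowMs;
      return true;
    }
    return false;
  }
}
```

The gap between `closeBelow` and `openAbove` is what keeps a half-open eye from flickering between states, and the hold-time window separates intentional blinks from natural ones.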

Challenges we ran into

CORS/hosting: Loading models reliably across dev servers and HTTPS.

Backend variance: WebGPU/WebGL/WASM capability differs by device, so we needed a robust fallback path.

Blink robustness: Preventing cursor jumps during partial blinks and handling false positives.

UX tuning: Balancing magnet strength, dwell timing, and prediction placement so it feels “sticky” but not frustrating.
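To make the magnet-strength trade-off concrete, here is one way snapping could be written: find the nearest key, and if it lies within a snap radius, pull the cursor part of the way toward its center. The function name `snapCursor`, the `Vec2` and `KeyTarget` types, and the parameter values in the usage note are illustrative assumptions rather than the project's implementation.

```typescript
interface Vec2 { x: number; y: number }
interface KeyTarget { id: string; center: Vec2 }

// strength = 0 leaves the cursor at the raw gaze point; strength = 1 locks it to the key center.
function snapCursor(
  gaze: Vec2,
  keys: readonly KeyTarget[],
  snapRadius: number,
  strength: number,
): { cursor: Vec2; targetId: string | null } {
  let nearest: KeyTarget | null = null;
  let nearestDist = Number.POSITIVE_INFINITY;
  for (const key of keys) {
    const dist = Math.hypot(key.center.x - gaze.x, key.center.y - gaze.y);
    if (dist < nearestDist) {
      nearest = key;
      nearestDist = dist;
    }
  }
  // Outside the snap radius the cursor follows the gaze freely and nothing is targeted.
  if (nearest === null || nearestDist > snapRadius) {
    return { cursor: { x: gaze.x, y: gaze.y }, targetId: null };
  }
  // Inside the radius, interpolate from the gaze point toward the key center.
  const s = Math.min(1, Math.max(0, strength));
  return {
    cursor: {
      x: gaze.x + (nearest.center.x - gaze.x) * s,
      y: gaze.y + (nearest.center.y - gaze.y) * s,
    },
    targetId: nearest.id,
  };
}
```

With a radius of 40 px and a strength of 0.5, a gaze point 10 px from a key center lands 5 px from it. Raising the strength makes keys feel stickier but also makes it harder to leave a key, which is the tension described above.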

Accomplishments that we're proud of

A fully local, no-install eye keyboard that runs in a tab.

Smooth, low-latency gaze with accidental-click prevention (snap radius + distance gate + cooldown).

Quick calibration that meaningfully improves accuracy across users and lighting.
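The smooth, low-latency gaze comes from One-Euro filtering, which applies a low-pass filter whose cutoff rises with the signal's speed: heavy smoothing while the eyes rest on a key, light smoothing during fast movements. Below is a compact single-axis sketch of the standard algorithm. The class layout and the default parameter values are assumptions for illustration, not the tuning used in VisionKey.

```typescript
// One-Euro filter for a single scalar signal; use one instance each for x and y.
class OneEuroFilter {
  private xPrev: number | null = null;
  private dxPrev = 0;
  private tPrev = 0;
  private readonly minCutoff: number; // Hz; lower means smoother when the signal is still
  private readonly beta: number;      // speed coefficient; higher means less lag when moving
  private readonly dCutoff: number;   // Hz; cutoff used to smooth the derivative estimate

  constructor(minCutoff = 1.0, beta = 0.01, dCutoff = 1.0) {
    this.minCutoff = minCutoff;
    this.beta = beta;
    this.dCutoff = dCutoff;
  }

  // Smoothing factor of a first-order low-pass filter for a given cutoff and time step.
  private static alpha(cutoffHz: number, dtSec: number): number {
    const tau = 1 / (2 * Math.PI * cutoffHz);
    return 1 / (1 + tau / dtSec);
  }

  // x is the raw value, tSec its timestamp in seconds.
  filter(x: number, tSec: number): number {
    if (this.xPrev === null) {
      // The first sample initializes the filter and passes through unchanged.
      this.xPrev = x;
      this.tPrev = tSec;
      return x;
    }
    const dt = tSec - this.tPrev;
    if (dt <= 0) return this.xPrev; // ignore duplicate or out-of-order timestamps

    // Estimate and smooth the signal's speed.
    const dx = (x - this.xPrev) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;

    // Faster movement raises the cutoff, which reduces lag.
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;

    this.xPrev = xHat;
    this.dxPrev = dxHat;
    this.tPrev = tSec;
    return xHat;
  }
}
```

In practice `minCutoff` is tuned first to remove jitter at rest, then `beta` is raised until fast saccades no longer feel laggy.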

What we learned

Small UX details (snap/“magnet,” dwell stability windows, audio ticks) matter more than raw ML accuracy.

Browser ML is viable for assistive tech if you design for fallbacks and graceful degradation.

Calibration mapping beats generic heuristics: personalization matters more than extra model complexity.
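For readers curious what a per-user quadratic calibration can look like, the sketch below fits screen coordinates as a second-order polynomial of two gaze features using least squares. It is an assumption-laden illustration: the feature names `u` and `v`, the basis `[1, u, v, u², uv, v²]`, and the helpers `fitQuadratic` and `mapToScreen` are hypothetical and may differ from the mapping VisionKey actually uses.

```typescript
type Sample = { u: number; v: number; x: number; y: number };
type QuadModel = { wx: number[]; wy: number[] };

// Second-order polynomial basis of the two gaze features.
const basis = (u: number, v: number): number[] => [1, u, v, u * u, u * v, v * v];

// Solve A w = b with Gauss-Jordan elimination and partial pivoting.
function solve(A: number[][], b: number[]): number[] {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented copy; A is not modified
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("Degenerate calibration points");
    [M[col], M[pivot]] = [M[pivot], M[col]];
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

// Least-squares fit via the normal equations (AᵀA) w = Aᵀ t, once per screen axis.
function fitQuadratic(samples: readonly Sample[]): QuadModel {
  const n = 6;
  if (samples.length < n) throw new Error("Need at least 6 calibration points");
  const AtA = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  const Atx = new Array<number>(n).fill(0);
  const Aty = new Array<number>(n).fill(0);
  for (const s of samples) {
    const phi = basis(s.u, s.v);
    for (let i = 0; i < n; i++) {
      Atx[i] += phi[i] * s.x;
      Aty[i] += phi[i] * s.y;
      for (let j = 0; j < n; j++) AtA[i][j] += phi[i] * phi[j];
    }
  }
  return { wx: solve(AtA, Atx), wy: solve(AtA, Aty) };
}

// Apply the fitted model to a new feature pair.
function mapToScreen(model: QuadModel, u: number, v: number): { x: number; y: number } {
  const phi = basis(u, v);
  const dot = (w: number[]): number => w.reduce((acc, wi, i) => acc + wi * phi[i], 0);
  return { x: dot(model.wx), y: dot(model.wy) };
}
```

A 3×3 grid of on-screen targets gives nine samples for six unknowns per axis, which leaves some redundancy to average out noise. Since the model is just two arrays of six numbers, it serializes with `JSON.stringify` and could be kept in browser storage for reuse between sessions.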

What’s next for VisionKey

Personal language model for stronger predictions and corrections.

Adaptive calibration that updates passively while you type.

Symbols/emoji & navigation layer (scroll, drag, select, copy/paste).

PWA / desktop wrapper for kiosk and offline use.

Accessibility studies with diverse users to refine thresholds and ergonomics.

Updates

Harry Du started this project — Oct 05, 2025 07:59 AM EDT
