Unbound
A mouse that adapts to your body, instead of the other way around.
Inspiration
Roughly 5.4 million people in the United States live with some form of paralysis. For many of them, using a computer means relying on specialized assistive hardware — eye trackers that cost $500 to $3,000, sip-and-puff devices, head-mounted IR pointers — tools that are expensive, often clinical, and almost always built around a single assumption about what the user can do.
We kept coming back to a simple question:
Why does accessibility cost more than the computer it's trying to access?
Every existing solution we looked at had a rigid input requirement. Eye trackers assume precise gaze control. Sip-and-puff assumes breath control. Switch-based systems assume reliable contact with a single muscle group. None of them ask the user what they can do — they tell the user what they have to do.
We wanted to build the opposite: a tool that meets the user where they are. If they can wink, great. If they can't, they can use their mouth. If they can't do that, an eyebrow raise. If their range of motion is small, the system calibrates around it instead of demanding more.
That's Unbound.
What it does
Unbound turns any laptop's webcam into a hands-free mouse.
- Your nose tip is the joystick — moving your head shifts the cursor across the screen.
- Facial gestures trigger mouse actions: left click, right click, double click, scroll, drag, and pause.
- Every gesture is remappable. Wink-for-click is the default, but if you can't wink, you can map clicks to mouth-open, smile, eyebrow raise, cheek puff, or pucker.
- Per-user calibration measures your range of motion and sets thresholds based on what you can actually do.
- Runs 100% locally — no cloud, no account, no internet required. Camera frames never leave the device.
- Ships as a single Windows `.exe` — double-click and run. No Python install. No setup wizard.
How we built it
Tech stack
| Layer | Tool |
|---|---|
| Language | Python 3.11 |
| Face landmarks | MediaPipe Tasks API |
| Computer vision | OpenCV |
| Mouse control | PyAutoGUI |
| Settings backend | FastAPI |
| Native window | PyWebView |
| Packaging | PyInstaller |
We deliberately used the new mediapipe.tasks API instead of the deprecated mediapipe.solutions — the new API exposes blendshapes, which are pre-normalized scores from 0.0 to 1.0 for facial expressions like eyeBlinkLeft, jawOpen, and browInnerUp. That made customizable gesture detection feasible inside a 24-hour build window.
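As a concrete sketch of what that looks like per frame (the model filename, `num_faces=1`, and the `blendshape_scores` helper are our choices, not requirements of the API):

```python
import time
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# VIDEO mode wants a strictly increasing millisecond timestamp for every frame,
# so we derive it from a monotonic clock rather than the wall clock.
t0 = time.monotonic()
def frame_timestamp_ms() -> int:
    return int((time.monotonic() - t0) * 1000)

options = vision.FaceLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="face_landmarker.task"),
    running_mode=vision.RunningMode.VIDEO,
    output_face_blendshapes=True,   # the part that makes gesture detection cheap
    num_faces=1,
)
landmarker = vision.FaceLandmarker.create_from_options(options)

def blendshape_scores(result) -> dict:
    """Flatten the first detected face's blendshapes into {name: score}."""
    if not result.face_blendshapes:
        return {}
    return {c.category_name: c.score for c in result.face_blendshapes[0]}

# Per frame (rgb_frame: HxWx3 uint8, already converted from OpenCV's BGR):
#   image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)
#   scores = blendshape_scores(landmarker.detect_for_video(image, frame_timestamp_ms()))
#   scores.get("eyeBlinkLeft", 0.0)   # 0.0 .. 1.0
```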
The cursor controller
The nose-to-cursor mapping uses a joystick model rather than absolute mapping, because joystick mode works for users with limited head mobility. The cursor's velocity each frame is the offset between the user's current nose position and their calibrated neutral resting position, scaled by a sensitivity constant the user controls.
To kill jitter from involuntary movements, we apply an exponential moving average filter — each frame's cursor velocity is a weighted blend of the new measurement and the previous smoothed value. The smoothing weight is exposed as a slider in the UI: lower values mean smoother but laggier movement, higher values mean snappier but jitterier. The user picks their tradeoff.
We also enforce a deadzone around the neutral position. If the user's head is within a small radius of their resting position, the cursor doesn't move at all. This prevents tiny involuntary movements from drifting the cursor across the screen when the user is trying to hold still.
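Put together, one frame of the cursor update is roughly the following (a sketch; the default constants and the `CursorState` fields are illustrative, not our exact values):

```python
from dataclasses import dataclass

@dataclass
class CursorState:
    neutral_x: float    # calibrated resting nose position (normalized 0..1)
    neutral_y: float
    vx: float = 0.0     # previous smoothed velocity (pixels per frame)
    vy: float = 0.0

def cursor_velocity(nose_x, nose_y, state, sensitivity=25.0,
                    deadzone=0.02, smoothing=0.3):
    """One frame of the joystick model."""
    # Offset from the calibrated neutral position acts like a joystick deflection.
    dx = nose_x - state.neutral_x
    dy = nose_y - state.neutral_y

    # Deadzone: small offsets are treated as "holding still", so involuntary
    # movement doesn't drift the cursor.
    if (dx * dx + dy * dy) ** 0.5 < deadzone:
        dx = dy = 0.0

    # Exponential moving average over the scaled velocity:
    # lower `smoothing` = smoother but laggier, higher = snappier but jitterier.
    state.vx = smoothing * (dx * sensitivity) + (1.0 - smoothing) * state.vx
    state.vy = smoothing * (dy * sensitivity) + (1.0 - smoothing) * state.vy
    return state.vx, state.vy
```

The smoothed velocity is then applied as a relative cursor move through PyAutoGUI each frame.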
Gesture detection with hysteresis
The naive approach to gesture detection is "if blendshape score is above 0.5, fire a click." That fails immediately, because real-world signals wobble around any single threshold and you end up firing 5 clicks when the user meant 1.
We used a two-threshold hysteresis state machine with a higher trigger threshold and a lower release threshold:
- The gesture starts in an `IDLE` state.
- When the score crosses the trigger threshold going up, the gesture transitions to `ACTIVE` and fires a `GESTURE_START` event.
- It stays `ACTIVE` until the score drops below the release threshold, at which point it transitions back to `IDLE` and fires a `GESTURE_END`.
The gap between the two thresholds eliminates flicker entirely. Combined with a tap-vs-hold timer, it lets us distinguish a quick click from a press-and-hold drag.
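A minimal version of that state machine looks like this (the 0.6 / 0.3 defaults shown are illustrative; in practice the thresholds come from calibration, described next):

```python
from enum import Enum, auto

class GestureState(Enum):
    IDLE = auto()
    ACTIVE = auto()

class HysteresisGesture:
    """Two-threshold detector: a high trigger threshold, a low release threshold."""

    def __init__(self, trigger=0.6, release=0.3):
        self.trigger = trigger
        self.release = release
        self.state = GestureState.IDLE

    def update(self, score):
        """Feed one blendshape score per frame; return 'start', 'end', or None."""
        if self.state is GestureState.IDLE and score >= self.trigger:
            self.state = GestureState.ACTIVE
            return "start"      # GESTURE_START: click, or begin a drag
        if self.state is GestureState.ACTIVE and score <= self.release:
            self.state = GestureState.IDLE
            return "end"        # GESTURE_END: release, or finish the drag
        return None             # no transition while the score sits between thresholds
```

Timing how long the detector stays in `ACTIVE` is what separates a tap (quick click) from a hold (drag).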
Per-user calibration
This is the part we're proudest of. The user does two calibration steps:
- Neutral capture — sit normally for 3 seconds. We average the nose position and all blendshape scores to establish a personal baseline.
- Per-gesture capture — perform each gesture for 2 seconds. We record the peak blendshape score the user can actually reach for that gesture, then set the trigger threshold at 60% of that peak and the release threshold at 30%.
This is the difference between "use a wink" (most apps) and "use your wink." A user with facial asymmetry might only reach a peak score of 0.4 on one eye, where the default threshold of 0.5 would never fire. Calibration fixes that automatically.
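The threshold math itself is small; a sketch (function and key names are ours):

```python
def calibrate_gesture(samples, trigger_frac=0.60, release_frac=0.30):
    """Derive per-user thresholds from ~2 s of blendshape samples for one gesture.

    `samples` is the list of scores recorded while the user performs the gesture;
    the fractions mirror the 60% / 30% rule described above.
    """
    peak = max(samples)
    return {
        "trigger": peak * trigger_frac,   # e.g. peak 0.4 -> trigger 0.24
        "release": peak * release_frac,   # e.g. peak 0.4 -> release 0.12
    }
```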
The desktop app shell
The settings panel is HTML/CSS/JS served by FastAPI on 127.0.0.1, wrapped in a native window with PyWebView. The face tracker runs on a background thread; the FastAPI server runs on another; PyWebView owns the main thread. They share a locked application state object for cross-thread communication.
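In outline, the wiring looks roughly like this (the port, the `run_tracker` stub, and the state fields are placeholders, not necessarily what we ship):

```python
import threading
import uvicorn
import webview
from fastapi import FastAPI

app = FastAPI()                      # settings endpoints live here

class AppState:
    """Lock-protected state shared by all three threads."""
    def __init__(self):
        self.lock = threading.Lock()
        self.settings = {"sensitivity": 25.0}
        self.running = True

state = AppState()

def run_tracker(state):
    """Placeholder for the OpenCV + MediaPipe camera loop."""
    while state.running:
        ...

# Background thread 1: camera capture and face tracking.
threading.Thread(target=run_tracker, args=(state,), daemon=True).start()

# Background thread 2: FastAPI settings server, bound to localhost only.
threading.Thread(
    target=lambda: uvicorn.run(app, host="127.0.0.1", port=8765, log_level="warning"),
    daemon=True,
).start()

# Main thread: PyWebView must own the native window on Windows.
webview.create_window("Unbound", "http://127.0.0.1:8765")
webview.start()                      # blocks until the user closes the window
state.running = False                # signal the worker threads to stop
```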
Then everything — Python interpreter, dependencies, the MediaPipe model file, the UI assets — gets bundled with PyInstaller into one Unbound.exe that anyone can run by double-clicking.
Challenges we ran into
1. The MediaPipe API switch. Most tutorials and Stack Overflow answers still use the legacy mediapipe.solutions.face_mesh API. The new mediapipe.tasks API has a different lifecycle, requires manually downloading a .task model file, and uses timestamp-based video mode. Getting the timestamp arithmetic right (milliseconds, monotonic clock) took longer than we expected.
2. PyAutoGUI's failsafe. PyAutoGUI's default behavior is to abort the program if the cursor hits a screen corner — a sensible default for scripts, a disaster for an accessibility tool that intentionally moves the cursor everywhere. We had to make this a user setting and warn explicitly when it's disabled.
3. PyInstaller and bundled asset paths. Bundling a multi-file project into a single .exe breaks every relative path in your code. PyInstaller unpacks bundled assets into a temp directory at runtime, so open("ui/index.html") works in dev and silently dies in production. We wrote a path-resolution helper that all asset access goes through (sketched after this list), and it took several rebuild cycles to track down every hidden import that PyInstaller's static analysis missed (mostly uvicorn submodules).
4. The threading model. Three concurrent things needed to coexist: the OpenCV camera loop, the FastAPI/uvicorn server, and the PyWebView native window. PyWebView demands the main thread on Windows. OpenCV prefers it. uvicorn doesn't care. Getting the right thread to own the right responsibility, with clean shutdown when the user closes the window, took real iteration.
5. Distinguishing winks from blinks. A wink should fire eyeBlinkLeft high and eyeBlinkRight low. A blink fires both. Without that asymmetry check, every blink fired a click. The fix was a per-gesture validator function on top of the raw blendshape score, not just a threshold (also sketched after this list).
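The path-resolution helper from challenge 3 is essentially the standard `sys._MEIPASS` pattern (a sketch):

```python
import sys
from pathlib import Path

def resource_path(relative: str) -> Path:
    """Resolve bundled assets both in dev and inside the PyInstaller bundle.

    One-file mode unpacks assets to a temp dir exposed as sys._MEIPASS;
    in dev we fall back to the project directory.
    """
    base = Path(getattr(sys, "_MEIPASS", Path(__file__).resolve().parent))
    return base / relative

# e.g. open(resource_path("ui/index.html")) works in both environments
```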
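And the wink-vs-blink validator from challenge 5 reduces to an asymmetry check on the two eye blendshapes (threshold values here are illustrative):

```python
def is_left_wink(scores, trigger=0.5, asymmetry=0.25):
    """Validator on top of the raw threshold: the left eye must be closed AND
    the right eye noticeably more open, otherwise an ordinary blink would click."""
    left = scores.get("eyeBlinkLeft", 0.0)
    right = scores.get("eyeBlinkRight", 0.0)
    return left >= trigger and (left - right) >= asymmetry
```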
What we learned
- Hysteresis is everywhere in real-world signal processing. Any time you turn a continuous signal into a discrete event, you need two thresholds, not one. We'll reach for this pattern again.
- Accessibility design forces better engineering. "What if the user can only do half of what we expect?" is a question that produces better software for everyone, not just for the disabled user.
- The new MediaPipe Tasks API is genuinely better than `solutions` — but the documentation gap is real. We hope this project nudges that ecosystem.
- Local-first isn't just an ethical position; it's a product feature. Privacy is an accessibility issue, especially for users whose disability already exposes them to surveillance.
- Single-file `.exe` distribution matters for demos and for users. A tool a person has to install Python to use is a tool that disabled users — many of whom rely on caretakers for setup — won't get to use.
What's next
- Voice as a second input modality. Pair face tracking with offline speech recognition (Whisper) so users with facial paralysis but full speech can fall back to voice for clicks.
- Per-application gesture profiles. Different gesture-to-action mappings depending on which window has focus.
- Snap-to-target. Detect clickable UI elements via Windows UI Automation and gently snap the cursor to them — the same trick high-end eye trackers use to compensate for imprecision.
- User testing. We want to partner with USF's accessibility services to put Unbound in front of users who actually need it. Everything we've built is a hypothesis until then.
Unbound — your computer, free of the assumption that you have to use it like everyone else.