Inspiration

Hundreds of millions of people worldwide live with arthritis, motor disabilities, repetitive strain injuries, or amputations. For them, using a traditional keyboard and mouse ranges from painful to entirely impossible.

When I looked at existing accessibility tools, I found a broken market: solutions tend to be prohibitively expensive (such as $2,000+ eye-tracking hardware), clunky to use, or dependent on cloud subscriptions that compromise privacy.

I realized that almost every modern laptop already has the hardware needed to solve this problem: a webcam and a microphone. So I built AccessAI: Gesture OS, a free and 100% offline AI tool that gives users full control of their computer through natural hand movement and voice.


What It Does

AccessAI transforms your standard webcam and microphone into a full operating system controller.

  • Vision Matrix: Using a custom-tuned hand-tracking engine, users move the cursor by pointing, left- or right-click with natural finger extensions, and scroll web pages by moving a closed fist up and down.
  • The Clutch: To prevent false clicks when a user simply wants to rest their hand, I engineered a clutch gesture that instantly pauses OS control while keeping the rest of the system running.
  • Offline Dictation: Raising the pinky finger toggles a local AI transcription engine. Users speak naturally, and the system types their words into the active application.
  • Live Dashboard: A glassmorphic React dashboard shows the current tracking state, a live transcript, dynamic SVG gesture cheat sheets, and custom sensitivity controls.
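One detail the gesture list implies but does not spell out: at 30 frames per second a held pose is seen dozens of times, so toggles such as the pinky-driven dictation switch have to fire only on the first frame of a gesture. The sketch below is a minimal illustration of that routing, not the project's code. The gesture labels, the action strings, the `GestureRouter` class, and the assumption that the clutch toggles sleep (rather than pausing only while held) are all hypothetical, and clicks are omitted for brevity.

```python
from typing import Optional

# Hypothetical gesture labels; the project's actual label names are not documented.
POINT, FIST, PINKY_UP, CLUTCH, NONE = "point", "fist", "pinky_up", "clutch", "none"


class GestureRouter:
    """Turns one gesture label per frame into at most one OS-level action."""

    def __init__(self) -> None:
        self.asleep = False     # clutch engaged: all other gestures are ignored
        self.dictating = False  # dictation mode, toggled by raising the pinky
        self._prev = NONE       # previous frame's label, used for edge detection

    def update(self, gesture: str) -> Optional[str]:
        # True only on the first frame of a new gesture, so a held pose
        # does not re-trigger a toggle 30 times per second.
        rising = gesture != self._prev
        self._prev = gesture

        # Assumption: the clutch toggles sleep and is honoured even while asleep.
        if gesture == CLUTCH:
            if not rising:
                return None
            self.asleep = not self.asleep
            return "sleep" if self.asleep else "wake"
        if self.asleep:
            return None

        if gesture == PINKY_UP:
            if not rising:
                return None
            self.dictating = not self.dictating
            return "dictation_on" if self.dictating else "dictation_off"
        if gesture == POINT:
            return "move_cursor"
        if gesture == FIST:
            return "scroll"
        return None  # covers NONE and any unrecognised label
```

Continuous gestures (pointing, scrolling) emit an action every frame, while toggles are edge-triggered. A hold-to-pause clutch would instead return `None` for as long as the clutch pose is visible.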

How I Built It

I architected AccessAI as a robust, full-stack system with three distinct layers:

The Engine (Backend): Written in Python using FastAPI and Uvicorn. I used OpenCV and MediaPipe's Hand Landmarker model for computer vision at 30 FPS. For voice, I used Faster-Whisper with int8 quantization, running entirely on the CPU. OS-level control goes through PyAutoGUI, with cross-platform notifications via Plyer.

The Interface (Frontend): A React.js application styled with TailwindCSS and Framer Motion. I built a custom Web Audio API synthesizer directly into the frontend, so all audio feedback is generated in code rather than loaded from sound files.

The Bridge: A bidirectional WebSocket connection links the two layers. The Python engine streams live video frames and JSON state updates to React, while React sends settings changes, such as the cursor smoothing algorithm and tracking margins, back to Python in real time.
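The settings half of the bridge can be sketched as a pure function that merges a JSON message from the dashboard into the engine's current settings. This is an illustration under assumptions, not the project's protocol: the message shape, the key names `smoothing` and `margin`, their ranges, and the helpers `apply_settings` and `state_message` are all hypothetical.

```python
import json
from typing import Any, Dict

# Hypothetical setting names and ranges; the real dashboard's keys are not documented.
SETTING_RANGES = {
    "smoothing": (0.05, 1.0),  # EMA alpha: lower is smoother but laggier
    "margin": (0.0, 0.3),      # fraction of the camera frame ignored at each edge
}


def apply_settings(current: Dict[str, float], raw_message: str) -> Dict[str, float]:
    """Merge a JSON settings message from the frontend into the engine's settings.

    Unknown keys and non-numeric values are ignored, and numbers are clamped to
    their range, so a malformed message cannot put the engine in an invalid state.
    """
    try:
        incoming: Any = json.loads(raw_message)
    except json.JSONDecodeError:
        return current
    if not isinstance(incoming, dict):
        return current

    updated = dict(current)  # never mutate the caller's settings in place
    for key, value in incoming.items():
        if key not in SETTING_RANGES:
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        low, high = SETTING_RANGES[key]
        updated[key] = min(max(float(value), low), high)
    return updated


def state_message(gesture: str, dictating: bool, transcript: str) -> str:
    """Serialize one engine-to-dashboard state update as JSON."""
    return json.dumps({"type": "state", "gesture": gesture,
                       "dictating": dictating, "transcript": transcript})
```

Inside a FastAPI WebSocket endpoint, the incoming side could look like `settings = apply_settings(settings, await websocket.receive_text())`, and the outgoing side like `await websocket.send_text(state_message(...))`. Keeping validation in a plain function makes it testable without a running server.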


Challenges I Ran Into

Building a multimodal, asynchronous AI system natively on a laptop CPU brought serious engineering hurdles.

Integration Hell: I had to bridge an async FastAPI WebSocket server, a synchronous OpenCV video loop, and a heavy local speech-recognition model (Whisper) running in a background thread, all at the same time. Keeping the video feed from freezing while Whisper transcribed audio required careful thread management and well-placed async sleep calls.
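The core of that threading problem can be reproduced in a few lines. In this minimal sketch, a blocking call is pushed onto a worker thread with `asyncio.to_thread` (Python 3.9+), while an async frame loop keeps ticking and yields between frames with `asyncio.sleep`. `blocking_transcribe` is a hypothetical stand-in that only sleeps; it does not call Faster-Whisper. Also, `asyncio.to_thread` is one way to do this, not necessarily the one the project used.

```python
import asyncio
import time
from typing import Tuple


def blocking_transcribe(seconds: float) -> str:
    """Hypothetical stand-in for a slow, blocking transcription call."""
    time.sleep(seconds)
    return "hello world"


async def frame_loop(stop: asyncio.Event, fps: int = 30) -> int:
    """Stand-in for the video/WebSocket loop; counts frames until stopped."""
    frames = 0
    while not stop.is_set():
        frames += 1                   # a real loop would grab, process and send a frame here
        await asyncio.sleep(1 / fps)  # yield to the event loop between frames
    return frames


async def main() -> Tuple[str, int]:
    stop = asyncio.Event()
    video = asyncio.create_task(frame_loop(stop))
    # The blocking call runs in a worker thread, so the event loop keeps serving frames.
    text = await asyncio.to_thread(blocking_transcribe, 0.5)
    stop.set()
    frames = await video
    return text, frames


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Roughly 15 frames are counted during the half-second "transcription" instead of zero. This only works if the heavy call releases the GIL while it runs, as `time.sleep` does. Native inference libraries generally do, but that is worth confirming for the specific library.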

Jitter & Mathematical Smoothing: At first, cursor movement was jittery and unusable because the cursor reproduced every tremor of the hand. I implemented an Exponential Moving Average (EMA) filter combined with dynamic tracking margins to absorb the micro-jitters of a human hand, which produced a smooth, controllable cursor.
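The two pieces named above can be sketched as follows. This is a minimal version under assumptions, not the project's tuned code. `map_to_screen` crops a margin off every edge of the normalized camera frame and stretches the inner box to the full screen. `EmaSmoother` applies the standard EMA update s_t = alpha * x_t + (1 - alpha) * s_{t-1}. The function and class names and the margin handling are illustrative.

```python
from typing import Optional, Tuple


def map_to_screen(x: float, y: float, margin: float,
                  screen_w: int, screen_h: int) -> Tuple[float, float]:
    """Map a normalized landmark (0..1) to screen pixels.

    Only the inner box [margin, 1 - margin] of the camera frame is used, so the
    hand can reach the screen edges without drifting out of the camera's view.
    Positions outside the box are clamped to the screen edge.
    """
    if not 0.0 <= margin < 0.5:
        raise ValueError("margin must be in [0, 0.5)")
    span = 1.0 - 2.0 * margin
    nx = min(max((x - margin) / span, 0.0), 1.0)
    ny = min(max((y - margin) / span, 0.0), 1.0)
    return nx * (screen_w - 1), ny * (screen_h - 1)


class EmaSmoother:
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # lower alpha: smoother cursor, but more lag
        self.state: Optional[Tuple[float, float]] = None

    def update(self, point: Tuple[float, float]) -> Tuple[float, float]:
        if self.state is None:  # first sample: nothing to average with yet
            self.state = point
        else:
            a = self.alpha
            sx, sy = self.state
            self.state = (a * point[0] + (1 - a) * sx, a * point[1] + (1 - a) * sy)
        return self.state
```

Per frame, the smoothed output of `map_to_screen` could be handed to `pyautogui.moveTo(x, y)`, with the screen size taken from `pyautogui.size()`. The trade-off lives in `alpha`: each step moves the cursor only a fraction of the way toward the new reading, so small tremors mostly cancel out while deliberate motion still arrives within a few frames.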

Anatomical Edge Cases: Simple logic for detecting folded fingers failed depending on how flexible a user's hand was. I rewrote the spatial detection to compare each finger against its own anatomical landmarks (for example, checking that the pinky tip sits physically higher than its own knuckle), which made gesture triggers consistent across different hands.
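A per-finger check in that spirit is sketched below. It uses MediaPipe's 21-point hand landmark numbering, where the pinky joints are 17 (MCP), 18 (PIP), 19 (DIP) and 20 (tip). Comparing each tip against its PIP joint is an assumption; the MCP knuckle is a looser alternative. The sketch also assumes an upright hand in image coordinates, where y grows downward, so "higher" means a smaller y. `extended_fingers` and `FINGER_JOINTS` are hypothetical names. The thumb is left out because it folds sideways rather than down.

```python
from typing import Dict, Sequence, Tuple

# MediaPipe hand landmark indices: (tip, PIP joint) for each non-thumb finger.
FINGER_JOINTS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}


def extended_fingers(landmarks: Sequence[Tuple[float, float]]) -> Dict[str, bool]:
    """Report which fingers are extended on an upright hand.

    Each finger is judged against its own joint instead of a global threshold:
    it counts as extended when its tip is above its PIP joint. Image y grows
    downward, so "above" means a smaller y value.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return {name: landmarks[tip][1] < landmarks[pip][1]
            for name, (tip, pip) in FINGER_JOINTS.items()}
```

With MediaPipe output, the input could be built as `[(lm.x, lm.y) for lm in hand]` for one detected hand. Because the comparison is relative to the same finger, it does not depend on hand size, distance from the camera, or how far a particular user can bend their fingers.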


Accomplishments I'm Proud Of

  • Zero to One — My First Hackathon. I came in with absolutely zero hackathon experience and shipped a complete, full-stack, AI-driven project entirely solo within the time limit. That means something to me.
  • 100% Offline & Privacy First. Zero user data, voice recordings, or camera feeds are ever sent to a cloud API. It runs securely, privately, and costs $0 to operate.
  • Graceful Hardware Degradation. If a user unplugs their webcam mid-session, the backend catches the stream failure, safely pauses all threads, and triggers a custom Hardware Disconnected error UI in the dashboard — rather than crashing the app.
  • The Dynamic SVG Engine. Instead of static PNG images for the gesture tutorial, I built a React component that programmatically draws and animates a wireframe hand from JSON coordinate configurations — every gesture is rendered live in code.
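The graceful-degradation behaviour described above can be sketched as a small wrapper around any source with an OpenCV-style `read()` that returns `(ok, frame)`. It reports a disconnect once instead of raising, and the caller simply skips that tick. `CameraMonitor`, the message dictionaries, the `camera_restored` recovery path, and the `ReplaySource` test double are all hypothetical; the project's actual error handling is not shown in the writeup.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol, Tuple


class FrameSource(Protocol):
    """Anything with a cv2.VideoCapture-style read() returning (ok, frame)."""

    def read(self) -> Tuple[bool, Any]: ...


class CameraMonitor:
    """Wraps a frame source and reports camera state changes instead of crashing."""

    def __init__(self, source: FrameSource, notify: Callable[[Dict[str, str]], None]) -> None:
        self.source = source
        self.notify = notify    # e.g. a function that queues a WebSocket message
        self.connected = True

    def next_frame(self) -> Optional[Any]:
        try:
            ok, frame = self.source.read()
        except Exception:       # defensive: treat any read error as a disconnect
            ok, frame = False, None
        if not ok or frame is None:
            if self.connected:  # notify once per disconnect, not on every frame
                self.connected = False
                self.notify({"type": "error", "code": "hardware_disconnected"})
            return None         # caller skips processing for this tick
        if not self.connected:  # hypothetical recovery path
            self.connected = True
            self.notify({"type": "status", "code": "camera_restored"})
        return frame


class ReplaySource:
    """Test double that replays a scripted list of read() results."""

    def __init__(self, results: List[Tuple[bool, Any]]) -> None:
        self.results = list(results)

    def read(self) -> Tuple[bool, Any]:
        return self.results.pop(0) if self.results else (False, None)
```

Only state transitions produce messages, so an unplugged camera yields a single "Hardware Disconnected" event for the dashboard rather than a flood of 30 errors per second.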

What I Learned

Building AccessAI taught me that accessibility tools are not just about writing code — they demand extreme empathy for the user. I learned how to manipulate raw Web Audio Contexts for non-intrusive auditory feedback, mastered the complexities of Python threading and WebSocket architecture, and discovered how critical mathematical smoothing is when translating chaotic real-world physical movement into precise 2D pixel coordinates.


What's Next for AccessAI: Gesture OS

  1. Custom Macro Mapping: An interface that lets users record their own gestures and map them to complex OS macros, like opening specific apps or running terminal scripts.
  2. Gaze Tracking: Integrating eye tracking alongside hand tracking, so users can move the cursor with their eyes and use their hands only for clicking and scrolling.
  3. Native Desktop Packaging: Wrapping the Python server and React frontend into a single executable app using Electron or PyInstaller, so non-technical users can install it with one click and no terminal.

Built With

  • fastapi
  • faster-whisper
  • framer-motion
  • mediapipe
  • opencv
  • plyer
  • pyautogui
  • python
  • react.js
  • tailwind.css
  • uvicorn
  • web-audio-api
  • websockets