Inspiration

The digital divide is often discussed in terms of internet access, but for the more than one billion people worldwide living with a disability, the divide is biological. For individuals with quadriplegia, motor neuron disease, or severe limb injuries, the standard mouse and keyboard are not just tools; they are barriers.

While investigating the assistive technology market, we were struck by a staggering disparity. Industry-leading eye- and head-tracking devices, such as Tobii's, retail for between $2,000 and $10,000. They often require specialized infrared sensors and high-end processing units, making life-changing technology accessible to only the wealthiest 10% of those in need.

Our inspiration was to bridge this gap by proving that empathetic engineering can replace expensive hardware. We set out to build Hedmouse: a 100% free, hardware-agnostic, AI-powered ecosystem that grants total digital independence to anyone with a basic webcam.

What it does

Hedmouse is a comprehensive assistive interface that translates facial geometry and vocal intent into system-level commands.

1. Neural Motion Mapping

The system utilizes a nose-tip anchor point to track head orientation. By calculating the Euler angles of the head, the software maps physical rotation to screen coordinates with sub-pixel precision.
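To make the angle step concrete, here is a minimal sketch of one common (ZYX) Euler decomposition of a head-rotation matrix; the convention is an illustrative choice, not necessarily the exact one Hedmouse ships:

```python
import numpy as np

def rotation_to_euler(R: np.ndarray):
    """Decompose a 3x3 rotation matrix into (yaw, pitch, roll), ZYX order."""
    yaw = np.arctan2(R[1, 0], R[0, 0])   # left/right head turn
    pitch = np.arcsin(-R[2, 0])          # up/down nod
    roll = np.arctan2(R[2, 1], R[2, 2])  # ear-to-shoulder tilt
    return yaw, pitch, roll
```

Yaw and pitch then drive the horizontal and vertical cursor axes, while roll can be ignored or reserved for extra gestures.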

2. Gesture-Driven Interactivity

To eliminate the need for physical buttons, we developed a gesture recognition engine:

Wink Detection: Differential analysis of eye aspect ratios allows for distinct left and right clicks.

Mouth Morphometry: Opening the mouth triggers a scrolling mode, allowing users to read long documents or browse social media hands-free.
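Below is a minimal sketch of both triggers, assuming per-eye EAR values (the formula appears under Challenges below) and an analogous mouth aspect ratio (MAR); the thresholds here are illustrative, not our shipped values:

```python
import math

def aspect_ratio(top, bottom, left, right) -> float:
    """Generic vertical/horizontal openness ratio for an eye or the mouth."""
    return math.dist(top, bottom) / math.dist(left, right)

def detect_gesture(ear_left: float, ear_right: float, mar: float,
                   wink_thresh: float = 0.21, mouth_thresh: float = 0.5):
    # A wink = one eye clearly closed while the other stays open.
    if ear_left < wink_thresh <= ear_right:
        return "left_click"
    if ear_right < wink_thresh <= ear_left:
        return "right_click"
    # An open mouth toggles scroll mode.
    if mar > mouth_thresh:
        return "scroll_mode"
    return None
```

In practice these raw comparisons are combined with the 520 ms temporal filter described under Challenges, so natural blinks, which close both eyes at once, are ignored.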

3. Bilingual Intelligent Assistant

Integrated directly into the interface is a voice-controlled agent capable of executing complex macro-commands (e.g., "Draft an email to my doctor" or "Open the YouTube app"). The assistant is fully optimized for both English and Arabic, addressing a major gap in the global assistive technology market for Middle Eastern users.
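As a sketch of how recognized utterances could be routed, here is a hypothetical bilingual intent table; in Hedmouse, free-form requests are handled by the LLM described under "How we built it" rather than hard-coded strings:

```python
import webbrowser

# Hypothetical bilingual intent table; real matching is done by the LLM,
# so these literal entries are for illustration only.
INTENTS = {
    "open youtube": lambda: webbrowser.open("https://www.youtube.com"),
    "افتح يوتيوب": lambda: webbrowser.open("https://www.youtube.com"),  # Arabic: "open YouTube"
}

def dispatch(utterance: str) -> bool:
    action = INTENTS.get(utterance.strip().lower())
    if action is None:
        return False  # fall through to the LLM for free-form requests
    action()
    return True
```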

How we built it

We engineered Hedmouse with a hybrid architecture to ensure cross-platform compatibility and high-performance inference.

The Mobile Core (Kotlin & Google ML Kit)

For the mobile application, we utilized Google ML Kit's Face Detection API, which provides the low-latency classification needed in mobile environments. We specifically leveraged the FaceLandmark and classification modes to detect "smiling" and "eye open" probabilities at 30+ frames per second. Using ML Kit keeps the app lightweight enough for budget Android devices.

The Desktop Engine (Python & MediaPipe)

On desktop, we required higher granularity. We used Google MediaPipe to extract 468 3D facial landmarks. By applying a transformation matrix to these landmarks, we could calculate the precise vector of the user's head pose and gaze.
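A condensed sketch of that pipeline, assuming cv2.solvePnP as the pose step (the write-up says "transformation matrix"; solvePnP is one standard way to obtain it) and generic 3D reference points for six MediaPipe Face Mesh landmark indices:

```python
import cv2
import numpy as np
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

# Approximate 3D reference points (millimetres) for six Face Mesh landmarks:
# nose tip (1), chin (152), eye outer corners (33, 263), mouth corners (61, 291).
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0), (0.0, -63.6, -12.5),
    (-43.3, 32.7, -26.0), (43.3, 32.7, -26.0),
    (-28.9, -28.9, -24.1), (28.9, -28.9, -24.1),
], dtype=np.float64)
LANDMARK_IDS = [1, 152, 33, 263, 61, 291]

def head_rotation(frame_bgr):
    """Return the 3x3 head-rotation matrix for one BGR frame, or None."""
    h, w = frame_bgr.shape[:2]
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    lm = results.multi_face_landmarks[0].landmark
    image_points = np.array([(lm[i].x * w, lm[i].y * h) for i in LANDMARK_IDS])
    camera = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points, camera, None)
    R, _ = cv2.Rodrigues(rvec)
    return R  # feed into rotation_to_euler() from earlier
```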

The Neural Assistant (Llama 3.1 & VOSK)

To power the intelligent commands, we integrated Llama 3.1 8B (quantized for edge performance). To ensure privacy, we used VOSK for offline speech-to-text. This allows the system to function without an internet connection, protecting sensitive user data.
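A minimal sketch of the offline loop, assuming the vosk Python bindings plus a llama.cpp-style runtime (llama-cpp-python) for the quantized model; the model file names and the choice of PyAudio for capture are illustrative assumptions, not our shipped configuration:

```python
import json
import pyaudio                      # assumption: PyAudio for microphone capture
from vosk import Model, KaldiRecognizer
from llama_cpp import Llama         # assumption: llama.cpp runtime for the 8B model

stt = KaldiRecognizer(Model("vosk-model-small-en-us-0.15"), 16000)  # or an Arabic model
llm = Llama(model_path="llama-3.1-8b-instruct.Q4_K_M.gguf")         # hypothetical file name

mic = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=1,
                             rate=16000, input=True, frames_per_buffer=4000)
while True:
    if stt.AcceptWaveform(mic.read(4000)):
        text = json.loads(stt.Result()).get("text", "")
        if text:
            out = llm(f"Turn this request into a desktop command: {text}",
                      max_tokens=64)
            print(out["choices"][0]["text"])  # hand off to the macro executor
```

Everything runs on-device: neither audio nor transcripts ever leave the machine.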

Mathematical Mapping

To ensure the cursor movement felt natural, we implemented a sigmoid-based acceleration curve. The relationship between head movement \(\theta\) and cursor displacement \(d\) is modeled as:

\[
d = \frac{L}{1 + e^{-k(\theta - \theta_0)}}
\]

where \(L\) is the maximum screen dimension, \(k\) is the sensitivity constant, and \(\theta_0\) is the neutral head position. This formula prevents jitter while allowing rapid movement across the screen.
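As a sketch (the default sensitivity \(k\) and neutral angle \(\theta_0\) below are illustrative, since the real values are tuned per user during calibration):

```python
import math

def cursor_displacement(theta: float, L: float,
                        k: float = 10.0, theta0: float = 0.0) -> float:
    """Sigmoid mapping from head angle (radians) to cursor displacement."""
    return L / (1.0 + math.exp(-k * (theta - theta0)))
```

At \(\theta = \theta_0\) the cursor sits at mid-screen (\(d = L/2\)), and the curve saturates toward \(0\) and \(L\), so even extreme head angles can never fling the cursor off-screen.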

Challenges we ran into

The "Midas Touch" Problem

The most significant hurdle was distinguishing involuntary biological actions from intentional commands. Early versions suffered from "accidental clicking" during natural blinking.

To solve this, we implemented a temporal thresholding algorithm. We calculated the Eye Aspect Ratio (EAR) using the formula:

\[
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}
\]

where \(p_1, \dots, p_6\) are the standard six eye landmarks (\(p_1\) and \(p_4\) being the horizontal eye corners).

By analyzing the EAR over a window of frames, we set a response threshold of 520 ms: if the EAR stayed below the closed-eye value for more than 520 ms, the event was classified as a click; otherwise, it was ignored as a natural blink.
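A minimal sketch of that temporal filter (the 520 ms window comes from the write-up; the closed-eye EAR cutoff of 0.21 is an illustrative, per-user-calibrated value):

```python
import time

EAR_CLOSED = 0.21      # illustrative closed-eye cutoff, calibrated per user
HOLD_MS = 520          # temporal threshold from our testing

_closed_since = None   # timestamp when the eye first dipped below the cutoff

def classify(ear: float):
    """Return 'click' once the eye stays closed past HOLD_MS, else None."""
    global _closed_since
    now = time.monotonic() * 1000.0
    if ear < EAR_CLOSED:
        if _closed_since is None:
            _closed_since = now
        elif now - _closed_since >= HOLD_MS:
            _closed_since = None   # reset so one long wink fires one click
            return "click"
        return None
    _closed_since = None           # eye reopened quickly: a natural blink
    return None
```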

Resource Optimization

Running a large language model alongside real-time computer vision is taxing on CPU and GPU resources. We had to optimize our OpenCV image-processing pipeline, converting frames to grayscale and downscaling them before landmark detection so that the software could run on the low-end laptops common in developing regions.
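A sketch of that preprocessing step (the 320 px target width is an illustrative choice, and the grayscale conversion applies only where the detector accepts single-channel input):

```python
import cv2

def preprocess(frame, target_width: int = 320):
    """Downscale (and grayscale, where supported) a frame before detection."""
    scale = target_width / frame.shape[1]
    small = cv2.resize(frame, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    return small, gray, scale   # divide detected coordinates by `scale` to map back
```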

Accomplishments that we're proud of

90%+ Tracking Accuracy: We achieved a level of precision that rivals hardware-based infrared trackers using nothing but a standard 720p webcam.

Arabic Language Integration: We built one of the first assistive tools fully optimized for the Arabic language, providing a voice-to-action pipeline for an underserved population.

Privacy by Design: By keeping the Llama 3.1 model and VOSK speech recognition 100% offline, we ensured that users' private movements and voices are never uploaded to the cloud.

The $0 Price Tag: We successfully neutralized the $10,000 barrier to entry for assistive technology.

What we learned

This project taught us that accessibility is not a hardware problem; it is a software and empathy problem. We learned how to balance complex mathematical models with user-centric design. We realized that, for a person with a disability, a 500 ms delay isn't just lag; it is a barrier to communication. This drove us to optimize every line of code for maximum responsiveness.

What's next for Hedmouse

Multi-Modal Eye-Gaze Fusion: We are currently working on integrating true eye-gaze tracking to supplement head movement, allowing for even smaller movements and higher precision.

Expanded LLM Capabilities: We plan to fine-tune our Llama model to understand more medical-specific commands, helping users communicate with caregivers more effectively.

NGO Partnerships: Our goal is to partner with international organizations like the Red Cross and various disability advocacy groups to pre-install Hedmouse on donated hardware worldwide.

iOS Implementation: Bringing the power of ML Kit and MediaPipe to iPhone users, ensuring no one is left behind regardless of their choice of operating system.

Built With

Kotlin · Google ML Kit · Python · MediaPipe · OpenCV · VOSK · Llama 3.1
