AssistMe

Inspiration

Communication is a fundamental human right. Yet, for millions of people living with ALS, paralysis, or "locked-in" syndrome, expressing a simple need like "I am thirsty" is a daily struggle. Traditional eye-tracking solutions like Tobii Dynavox are prohibitively expensive (often $15,000+), require bulky hardware, and rely on static, limited vocabularies.

We asked ourselves: Why can't we use the camera already built into every laptop to give people their voice back for free?

Inspired by the potential of multimodal AI, we built AssistMe to democratize accessibility. We wanted to create a tool that isn't just a digital board, but an intelligent companion that anticipates what a user wants to say.

What it does

AssistMe is an AI-powered, eye-controlled communication platform. It transforms a standard webcam into a high-precision eye tracker, allowing users to navigate a digital interface using only their eyes.

Eye-Gaze Navigation: Users look at large, high-contrast buttons to select options. A "dwell" timer triggers the selection automatically once the user's gaze rests on a button long enough, so no mouse or keyboard is required.
AI-Driven Dynamic Menus: Unlike static communication boards, AssistMe uses Google Gemini to generate context-aware options in real time. If a user selects "Food," the AI doesn't just show a generic list; it generates specific, varied meals like "Sushi," "Tacos," or "Burrito Bowl," giving the user an open-ended vocabulary that adapts to each choice.
Instant Vocalization: Selected phrases are immediately spoken aloud using the browser's native text-to-speech engine (the Web Speech API).
Caregiver Alerts: Critical options like "Call Caregiver" instantly trigger backend notifications to alert family members or medical staff.
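The dwell mechanic can be illustrated with a minimal TypeScript sketch. It assumes the UI reports, once per frame, which button (if any) is under the gaze cursor; the class name `DwellSelector` and the 1200 ms default are hypothetical, not AssistMe's actual code or settings. A selection fires once per continuous dwell, so the gaze must leave a button before it can select that button again.

```typescript
// Hypothetical dwell-to-select helper; the name and the 1200 ms default are illustrative assumptions.
class DwellSelector {
  private readonly dwellMs: number;
  private currentId: string | null = null; // button currently under the gaze cursor
  private startMs = 0;                      // when the gaze arrived on that button
  private fired = false;                    // whether this dwell already produced a selection

  constructor(dwellMs = 1200) {
    this.dwellMs = dwellMs;
  }

  /** Call once per frame with the button under the gaze (or null). Returns the id to select, or null. */
  update(targetId: string | null, nowMs: number): string | null {
    if (targetId !== this.currentId) {
      // Gaze moved to a different button (or off all buttons): restart the timer.
      this.currentId = targetId;
      this.startMs = nowMs;
      this.fired = false;
      return null;
    }
    if (targetId === null || this.fired) return null;
    if (nowMs - this.startMs >= this.dwellMs) {
      this.fired = true; // select once; the gaze must leave and come back to select again
      return targetId;
    }
    return null;
  }

  /** Fraction of the dwell completed, e.g. for drawing a progress ring on the button. */
  progress(nowMs: number): number {
    if (this.currentId === null || this.fired) return 0;
    return Math.min(1, (nowMs - this.startMs) / this.dwellMs);
  }
}

// Usage: timestamps in milliseconds, one entry per frame.
const dwell = new DwellSelector(1000);
const frames: Array<[string | null, number]> = [["water", 0], ["water", 500], ["water", 1000], ["water", 1500]];
const selections = frames.map(([id, t]) => dwell.update(id, t)); // [null, null, "water", null]
```

Exposing `progress` separately keeps the timing logic independent of rendering, so the same selector can drive a visual countdown and the actual selection.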

How we built it

We engineered a robust full-stack architecture optimized for low-latency computer vision and real-time AI generation.

Frontend (The Eye Engine): We built the client with Next.js 14 and React. The core eye-tracking engine uses MediaPipe Face Mesh, which estimates 468 3D facial landmarks locally in the browser at 60 FPS. Because video frames are processed on the device and never uploaded, this approach protects privacy and avoids network latency.
Backend (The Logic): We used Express.js for our server-side logic. It handles API requests, manages the connection to our PostgreSQL database (via Drizzle ORM), and acts as the secure gateway for our AI interactions.
Artificial Intelligence: We integrated Google Gemini (1.5 Flash and 2.5 Flash) to power the dynamic menu system. The Express server builds a prompt context from the user's previous selections and sends it to Gemini, which returns structured JSON used to render the next set of UI buttons.
Authentication: We implemented Clerk for secure, passwordless authentication, allowing users to save their calibration profiles and settings across devices.
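To show the shape of the server-side exchange, here is a hedged sketch of two helpers an Express route might call around the Gemini request: one turns the user's selection path into a prompt, and the other validates the model's JSON before it reaches the UI. The `MenuOption` fields, the prompt wording, and the fallback behaviour are illustrative assumptions, and the Gemini API call itself is omitted.

```typescript
// Hypothetical shapes for the Express <-> Gemini menu exchange; field names are illustrative assumptions.
interface MenuOption {
  label: string;       // text shown on the button and spoken aloud
  hasSubmenu: boolean; // true if selecting it should request a further menu
}

/** Builds a prompt from the user's selection path, e.g. ["Food", "Dinner"]. */
function buildMenuPrompt(path: string[], count = 6): string {
  const context = path.length > 0 ? path.join(" > ") : "(top level)";
  return [
    "You generate options for an eye-controlled communication board.",
    `The user has navigated: ${context}.`,
    `Return ONLY a JSON array of ${count} objects of the form {"label": string, "hasSubmenu": boolean}.`,
    "Labels must be short (at most 4 words) and distinct.",
  ].join("\n");
}

/** Parses the model's reply defensively; falls back to static options if it is unusable. */
function parseMenuOptions(raw: string, fallback: MenuOption[]): MenuOption[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallback; // not valid JSON at all
  }
  if (!Array.isArray(data)) return fallback;
  // Keep only well-formed entries; drop anything missing a string label or boolean flag.
  const options = data.filter(
    (o): o is MenuOption =>
      typeof o === "object" &&
      o !== null &&
      typeof (o as MenuOption).label === "string" &&
      typeof (o as MenuOption).hasSubmenu === "boolean",
  );
  return options.length > 0 ? options : fallback;
}

// Usage: the reply string stands in for the text returned by Gemini.
const fallback: MenuOption[] = [{ label: "Water", hasSubmenu: false }];
const reply = '[{"label":"Sushi","hasSubmenu":false},{"label":"Tacos","hasSubmenu":false},{"oops":1}]';
const menu = parseMenuOptions(reply, fallback); // Sushi and Tacos kept; the malformed entry is dropped
```

Validating on the server means a malformed or truncated model reply degrades to a static menu instead of a broken screen.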

Challenges we ran into

The "Webcam Jitter" Problem: Standard webcams are noisy. Even when a user's eyes are still, the estimated gaze point fluctuates from frame to frame, making the cursor vibrate uncontrollably. We solved this with a custom two-stage exponential moving average (EMA) smoothing filter paired with a hysteresis dead zone. The result is a cursor that moves fluidly during gaze shifts but "locks" in place when the user focuses on a button.
Corner Reachability: We discovered that users physically struggled to look at the extreme corners of their screens while being tracked. We developed a linear coordinate expansion (a sensitivity gain) that scales the gaze mapping outward from the screen center. This lets users reach 100% of the screen width while moving their eyes only 80% of the way (a gain of 1/0.8 = 1.25), reducing eye strain.
AI Latency: Generating options in real time can be slow. We constrained our Gemini prompts to return strict JSON and used the faster "Flash" model variants to keep the interface responsive.
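The jitter and corner fixes can be combined into one per-frame pipeline. The sketch below is one plausible reading of the description, not AssistMe's implementation: it assumes normalized gaze coordinates in [0, 1], interprets "two-stage EMA" as two cascaded EMA filters, and implements the hysteresis dead zone with two thresholds (the cursor unlocks when the smoothed gaze moves more than `unlockRadius` away from it, and re-locks when frame-to-frame motion drops below the smaller `lockStep`). The gain of 1.25 follows from the 80% figure, since 0.5 + 0.4 × 1.25 = 1.0; all other parameter values are hypothetical.

```typescript
// Sketch of a gaze post-processing chain: linear gain -> two cascaded EMAs -> hysteresis dead zone.
// Parameter values are illustrative assumptions, not tuned values from AssistMe.
type Point = { x: number; y: number };

const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

/** Linear coordinate expansion around the screen centre; gain 1.25 maps 80% of the way to an edge onto the edge. */
function expandGaze(p: Point, gain: number): Point {
  return { x: clamp01(0.5 + (p.x - 0.5) * gain), y: clamp01(0.5 + (p.y - 0.5) * gain) };
}

class GazeFilter {
  private s1: Point | null = null;     // output of the first EMA stage
  private s2: Point | null = null;     // output of the second EMA stage (smooths the first)
  private cursor: Point | null = null; // position actually drawn on screen
  private locked = false;
  private readonly alpha: number;
  private readonly gain: number;
  private readonly unlockRadius: number;
  private readonly lockStep: number;

  constructor(alpha = 0.3, gain = 1.25, unlockRadius = 0.03, lockStep = 0.005) {
    this.alpha = alpha;
    this.gain = gain;
    this.unlockRadius = unlockRadius;
    this.lockStep = lockStep;
  }

  private static ema(prev: Point | null, next: Point, alpha: number): Point {
    if (prev === null) return { ...next }; // first sample initializes the filter
    return { x: prev.x + alpha * (next.x - prev.x), y: prev.y + alpha * (next.y - prev.y) };
  }

  /** Feed one normalized raw gaze sample; returns the cursor position to draw. */
  update(raw: Point): Point {
    const s1 = GazeFilter.ema(this.s1, expandGaze(raw, this.gain), this.alpha);
    this.s1 = s1;
    const prevS2 = this.s2;
    const s = GazeFilter.ema(prevS2, s1, this.alpha);
    this.s2 = s;

    if (this.cursor === null) {
      this.cursor = { ...s };
      return { ...s };
    }
    let cur: Point = this.cursor;
    if (this.locked) {
      // Outer threshold: stay frozen until the smoothed gaze leaves the dead zone.
      if (Math.hypot(s.x - cur.x, s.y - cur.y) > this.unlockRadius) {
        this.locked = false;
        cur = { ...s };
      }
    } else {
      cur = { ...s };
      // Inner threshold: once the smoothed gaze has nearly stopped moving, freeze the cursor.
      if (prevS2 !== null && Math.hypot(s.x - prevS2.x, s.y - prevS2.y) < this.lockStep) {
        this.locked = true;
      }
    }
    this.cursor = cur;
    return { ...cur };
  }
}

// Usage: call once per tracking frame with the raw normalized gaze estimate.
const filter = new GazeFilter();
const cursor = filter.update({ x: 0.52, y: 0.47 });
```

Cascading two EMAs attenuates high-frequency noise more strongly than a single EMA with the same `alpha`, at the cost of extra lag; the dead zone then hides whatever residual wobble remains while the user fixates.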

Accomplishments that we're proud of

Hardware-Free Accessibility: We built a high-precision eye tracker that runs entirely in a web browser using only the built-in webcam, with no external sensors.
Intelligent Context: We moved beyond static "Yes/No" boards. Our system adapts to context: it offers "Pain medication" when the user selects "Medical," or specific dinner options when they select "Food."
Engineering Robustness: We are particularly proud of the math we used to turn noisy webcam data into a reliable input method: inverse distance weighting (IDW) interpolation and exponential moving average (EMA) smoothing.
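Inverse distance weighting is named here without detail. Assuming it is used to map a raw eye feature (for example, a normalized iris offset) to screen coordinates from a set of calibration samples, a minimal sketch could look like this; the `CalibrationSample` shape, the feature definition, and the default power of 2 are assumptions. Each calibration target contributes its screen position with weight 1/d^p, where d is the distance in feature space, so nearby calibration points dominate.

```typescript
// Hypothetical calibration sample: a raw eye feature recorded while the user looked at a known target.
interface CalibrationSample {
  feature: { x: number; y: number }; // e.g. normalized iris offset (assumed feature)
  screen: { x: number; y: number };  // on-screen target position in pixels
}

/** Inverse distance weighting (Shepard's method): weights are 1 / d^power in feature space. */
function idwMap(
  feature: { x: number; y: number },
  samples: CalibrationSample[],
  power = 2,
): { x: number; y: number } {
  if (samples.length === 0) throw new Error("no calibration samples");
  let wSum = 0;
  let x = 0;
  let y = 0;
  for (const s of samples) {
    const d = Math.hypot(feature.x - s.feature.x, feature.y - s.feature.y);
    if (d < 1e-9) return { ...s.screen }; // exactly on a calibration point: avoid dividing by zero
    const w = 1 / Math.pow(d, power);
    wSum += w;
    x += w * s.screen.x;
    y += w * s.screen.y;
  }
  return { x: x / wSum, y: y / wSum };
}

// Usage: four corner targets on a 1920x1080 screen, features normalized to [-1, 1].
const corners: CalibrationSample[] = [
  { feature: { x: -1, y: -1 }, screen: { x: 0, y: 0 } },
  { feature: { x: 1, y: -1 }, screen: { x: 1920, y: 0 } },
  { feature: { x: -1, y: 1 }, screen: { x: 0, y: 1080 } },
  { feature: { x: 1, y: 1 }, screen: { x: 1920, y: 1080 } },
];
const mid = idwMap({ x: 0, y: 0 }, corners); // equidistant from all corners, so approximately (960, 540)
```

Unlike a single global fit, IDW needs no training step and reproduces every calibration point exactly, although in practice more than four targets would be needed to capture non-linear distortion across the screen.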

What we learned

Accessibility is Math: We learned that making a tool "accessible" isn't just about big buttons; it's about the subtle math behind cursor smoothing and calibration that makes the experience frustration-free.
Local-First is Powerful: Processing computer vision on the client (React) while keeping business logic on the server (Express) gave us a good balance of privacy and performance.

What's next for AssistMe

IoT Integration: We plan to connect AssistMe to Home Assistant, allowing users to turn on lights, adjust thermostats, or lock doors using only their eyes.
Predictive Sentence Building: We want to use Gemini to autocomplete full sentences based on the user's historical usage patterns, speeding up communication.
Emotion Detection: We plan to use MediaPipe to detect user frustration or fatigue and then automatically suggest "I need a break" or adjust the interface sensitivity.