Inspiration

Someone with ALS can think, feel, and understand everything happening around them. But they can't move. They can't speak. The only thing many of them can still control is where they look. Dedicated eye-gaze communication devices exist, but they cost $5,000 to $15,000, require months of insurance approval, and are bulky enough that they don't leave the wheelchair. We wanted to know: can a phone do this?

Microsoft Research's GazeSpeak project (CHI 2017) proved that 4-directional eye gestures combined with word prediction could outperform traditional communication boards. Google's SpeakFaster project (Nature Communications 2024) showed that intelligent text prediction could cut AAC typing time by up to 60%. We built on both of these ideas with one key difference: a dedicated gaze estimation neural network running entirely on the Snapdragon NPU.

What it does

GazeBoard turns a Samsung Galaxy S25 Ultra into a free, portable communication device. The app has two modes:

Quick Phrases: Four large quadrants display common phrases (Yes, No, Help, More). The user looks at one for a second and the phone speaks it aloud. This covers the majority of daily communication.

Spell Mode: Four quadrants display letter groups (A-G, H-M, N-S, T-Z). The user looks at the quadrant containing their letter. After 2-4 gestures, the app predicts what word they're spelling from a 5,000-word dictionary and displays the candidates. The user confirms with a blink and the phone speaks the word. It works like T9 predictive text, but controlled entirely by eye movement.

Everything runs on-device. No internet. No cloud. No account. Just a phone, a stand, and your eyes.

How we built it

The inference pipeline runs in two stages. First, the front camera captures frames at 15 fps via CameraX. We detect the face and extract eye crops using Android's built-in FaceDetector. The cropped eye images are resized, converted to the model's expected input format, and fed into a gaze estimation neural network loaded through Google LiteRT's CompiledModel API with Qualcomm NPU acceleration. The model outputs gaze coordinates, which we smooth with an exponential moving average and map to one of four screen quadrants after a short calibration step.
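A minimal Kotlin sketch of that post-inference step. The class name, the smoothing factor, and the 0..1 coordinate convention are illustrative stand-ins, not our exact tuning:

```kotlin
// Sketch: exponential moving average over raw gaze output, then quadrant mapping.
enum class Quadrant { TOP_LEFT, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_RIGHT }

class GazeSmoother(private val alpha: Float = 0.3f) {
    private var smoothedX: Float? = null
    private var smoothedY: Float? = null

    /** Blend the new model output with the previous estimate to damp jitter. */
    fun update(rawX: Float, rawY: Float): Pair<Float, Float> {
        smoothedX = smoothedX?.let { alpha * rawX + (1 - alpha) * it } ?: rawX
        smoothedY = smoothedY?.let { alpha * rawY + (1 - alpha) * it } ?: rawY
        return smoothedX!! to smoothedY!!
    }
}

/** Map calibrated gaze coordinates (normalized to 0..1) onto a screen quadrant. */
fun toQuadrant(x: Float, y: Float): Quadrant = when {
    x < 0.5f && y < 0.5f -> Quadrant.TOP_LEFT
    x >= 0.5f && y < 0.5f -> Quadrant.TOP_RIGHT
    x < 0.5f -> Quadrant.BOTTOM_LEFT
    else -> Quadrant.BOTTOM_RIGHT
}
```

A lower alpha damps jitter more aggressively at the cost of responsiveness, which is the trade-off behind the smoothing-parameter tuning described in the challenges below.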

The word prediction engine converts each word in a 5,000-word dictionary into a "gesture code" based on which letter group each letter falls in. When the user selects groups, we filter the dictionary in real-time and surface the top candidates when the list narrows to three or fewer matches. The entire lookup runs in microseconds with zero external dependencies.
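As a worked example with the A-G / H-M / N-S / T-Z grouping, "hello" encodes to the gesture code 21223 and "help" to 2123, so the gesture prefix 2-1-2 keeps both while dropping a word like "water" (41413). Here is a minimal Kotlin sketch of that lookup, with a toy dictionary standing in for the 5,000-word list and lowercase letters assumed:

```kotlin
/** Map a letter to its quadrant group: A-G=1, H-M=2, N-S=3, T-Z=4. */
fun groupOf(c: Char): Char = when (c.lowercaseChar()) {
    in 'a'..'g' -> '1'
    in 'h'..'m' -> '2'
    in 'n'..'s' -> '3'
    else -> '4'
}

/** Encode a word as its gesture code, e.g. "hello" -> "21223". */
fun gestureCode(word: String): String = word.map(::groupOf).joinToString("")

class WordPredictor(dictionary: List<String>) {
    // Precompute code -> words so each new gesture only narrows a prefix match.
    private val byCode: Map<String, List<String>> = dictionary.groupBy(::gestureCode)

    /** Words whose gesture code starts with the groups selected so far. */
    fun candidates(selectedGroups: String): List<String> =
        byCode.filterKeys { it.startsWith(selectedGroups) }.values.flatten()
}

fun main() {
    val predictor = WordPredictor(listOf("hello", "help", "water", "yes", "no"))
    println(predictor.candidates("212")) // [hello, help]
}
```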

The UI is built in Jetpack Compose with a glassmorphic design language: frosted glass panels, subtle depth through translucency, and a floating center island that shows the current prediction and the count of remaining candidate words. Every interaction is gaze-dwell activated, since the user cannot touch the screen.
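A hypothetical sketch of the dwell mechanic in Compose. The GazeDwellTarget name, the 50 ms polling step, and the one-second default are illustrative; the isGazed flag would come from the quadrant mapping above:

```kotlin
import androidx.compose.runtime.Composable
import androidx.compose.runtime.LaunchedEffect
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue
import kotlinx.coroutines.delay

@Composable
fun GazeDwellTarget(
    isGazed: Boolean,
    dwellMillis: Long = 1000L,
    onActivate: () -> Unit,
    content: @Composable (progress: Float) -> Unit
) {
    var progress by remember { mutableStateOf(0f) }

    // Restart the dwell timer whenever the gaze enters or leaves this target.
    LaunchedEffect(isGazed) {
        if (!isGazed) {
            progress = 0f
            return@LaunchedEffect
        }
        val stepMillis = 50L
        var elapsed = 0L
        while (elapsed < dwellMillis) {
            delay(stepMillis)
            elapsed += stepMillis
            progress = (elapsed.toFloat() / dwellMillis).coerceAtMost(1f)
        }
        onActivate() // dwell complete: speak the phrase or select the letter group
    }

    content(progress) // e.g. a frosted panel drawing a dwell-progress ring
}
```

Exposing the dwell progress to the content lambda lets each panel render a fill or ring, so the user can see a selection charging up before it fires.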

Challenges we ran into

Gaze accuracy was our biggest challenge. Early models required exaggerated eye movements to register a direction change, making the app feel forced and fatiguing. We experimented with multiple gaze estimation models and crop preprocessing strategies before finding the right combination of model, eye crop sizing, and smoothing parameters that made natural eye movements sufficient for reliable quadrant selection.

Calibration was harder than expected. Everyone's eye movement range is different, and the mapping from gaze coordinates to screen position drifts when the user shifts their head even slightly. We had to build a calibration system that adapts to each user's personal range of motion and remains robust to small head movements.
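One way to frame that system, as a hedged sketch: record the user's personal gaze extremes while they follow the calibration targets, normalize live samples into that range, and let a slow-moving center estimate absorb small head shifts. Names and constants below are illustrative:

```kotlin
class GazeCalibration {
    private var minX = Float.MAX_VALUE
    private var maxX = -Float.MAX_VALUE
    private var minY = Float.MAX_VALUE
    private var maxY = -Float.MAX_VALUE
    private var centerX = 0f
    private var centerY = 0f
    private val driftAlpha = 0.01f // how fast the center follows small head shifts

    /** Feed samples while the user follows the calibration targets. */
    fun addSample(x: Float, y: Float) {
        minX = minOf(minX, x); maxX = maxOf(maxX, x)
        minY = minOf(minY, y); maxY = maxOf(maxY, y)
        centerX = (minX + maxX) / 2f
        centerY = (minY + maxY) / 2f
    }

    /** Map a live gaze sample into the user's personal 0..1 range. */
    fun normalize(x: Float, y: Float): Pair<Float, Float> {
        val halfW = (maxX - minX) / 2f
        val halfH = (maxY - minY) / 2f
        if (halfW <= 0f || halfH <= 0f) return 0.5f to 0.5f // not calibrated yet

        // Let the center drift slowly so small head movements do not skew the mapping.
        centerX += driftAlpha * (x - centerX)
        centerY += driftAlpha * (y - centerY)

        val nx = ((x - centerX) / halfW / 2f + 0.5f).coerceIn(0f, 1f)
        val ny = ((y - centerY) / halfH / 2f + 0.5f).coerceIn(0f, 1f)
        return nx to ny
    }
}
```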

Making every interaction work without touch required rethinking standard Android UI patterns. Buttons, toggles, and navigation all had to be redesigned around gaze dwell and blink detection.

Accomplishments that we're proud of

The app works. A person can sit in front of the phone, calibrate in under 15 seconds, and start communicating. Quick phrases are spoken in about a second. Spelling a common word takes 3-4 eye gestures and under 10 seconds. Gaze model inference runs in under a millisecond on the NPU.

We replaced a dedicated device costing up to $15,000 with a free app on a consumer phone. The entire system runs offline, which means it works in a car, outdoors, in a hospital room, anywhere. No WiFi required.

The glassmorphic UI feels like a real product, not a hackathon prototype. It's calm, legible from arm's length, and designed for people who will look at this screen all day.

What we learned

On-device ML is ready for real accessibility applications. The Snapdragon NPU combined with LiteRT's CompiledModel API made it possible to run continuous neural inference at frame rate without draining the battery or heating up the phone.

We also learned that the gap between "model works in a notebook" and "model works in a live demo" is enormous. Preprocessing, smoothing, calibration, and edge case handling took more time than the model integration itself.

Most importantly, we learned how much the AAC community needs affordable, portable solutions. The technology exists. The hardware is in people's pockets. The software just needed to be written.

What's next for GazeBoard

Better models: Fine-tuning the gaze estimation model on real ALS patient data would dramatically improve accuracy for the target user population.

LLM-powered prediction: Replacing the dictionary trie with an on-device Gemma model via LiteRT-LM would enable context-aware word prediction, similar to Google's SpeakFaster research. Instead of just matching gesture codes, the LLM could understand conversational context and predict entire phrases.

Phrase customization: Letting caregivers configure the Quick Phrases for each patient's specific needs (medications, family names, daily routines).

Multi-language support: The architecture is language-agnostic. Swapping the dictionary and TTS language would enable GazeBoard to work in any language.

Clinical validation: Partnering with ALS clinics to test GazeBoard with real patients and iterate on the UX based on their feedback.

Open source release: Publishing GazeBoard as an open-source project so anyone can install it, improve it, and adapt it for their needs.
