What inspired it: Staying locked in is not easy. One glance at a second monitor or a phone, and you're "locked out" for the next 10 minutes. Most tools only block sites or log time; they don't reflect where attention really is. GooseFocus was inspired by the idea that your eyes are an honest signal of focus, and that a small, immediate cue (in our case, a loud goose) could pull you back before a short drift turns into a long one.

What we learned:

- Browser-based CV is viable for prototypes: running face/eye models locally can support near-real-time feedback, but frame rate, lighting, and camera quality matter a lot.
- "Focus" is a design problem, not only an ML problem: you need clear states (on-task, notes, mild distraction, away), sensible thresholds, and calibration so the same model behaves well for different people and setups.
- Product UX around the camera is sensitive: explaining on-device processing, offering calibration, and keeping controls obvious builds trust.
- Audio in the browser has rules: autoplay policies and user-gesture requirements mean honks need to be wired to interaction and state carefully.
- Monorepos pay off when scoring logic, types, and API boundaries are shared between the web app and other packages.
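To make the "wired to interaction and state" lesson concrete, here is a minimal sketch of how honk playback can be gated. All names (`shouldHonk`, `AlertContext`, the cooldown value) are illustrative assumptions, not GooseFocus's actual API; the point is that autoplay policy, session phase, focus state, and a cooldown all gate the same decision.

```typescript
// Hypothetical gating logic for honk playback. Browsers block audio until a
// user gesture has "unlocked" it, so that flag must come from a real
// click/keydown handler (which would also call AudioContext.resume()).

type FocusStatus = "on-task" | "notes" | "mild" | "distracted" | "away";

interface AlertContext {
  audioUnlocked: boolean;   // set true inside a user-gesture handler
  inWorkPhase: boolean;     // Pomodoro "work" phase, not a break
  status: FocusStatus;
  msSinceLastHonk: number;  // cooldown to avoid constant honking
}

const HONK_COOLDOWN_MS = 15_000; // made-up tuning value

const HONKABLE: ReadonlySet<FocusStatus> = new Set(["mild", "distracted", "away"]);

function shouldHonk(ctx: AlertContext): boolean {
  return (
    ctx.audioUnlocked &&                      // autoplay policy: no gesture, no sound
    ctx.inWorkPhase &&                        // never honk during breaks
    HONKABLE.has(ctx.status) &&               // "notes" is intentional looking-away
    ctx.msSinceLastHonk >= HONK_COOLDOWN_MS
  );
}
```

Keeping this as one pure function makes the alert rules easy to unit-test without a browser, which helps when tuning thresholds later.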

How we built it:

- The project is organized as a workspace with a Next.js front end, shared types and domain logic, and a CV/scoring pipeline that turns camera frames into a focus score and status each tick.
- A Pomodoro-style session ties "work" phases to when alerts are allowed.
- Calibration captures a quick baseline so "looking at notes" vs "looking away" isn't one-size-fits-all.
- Analytics summarize sessions (timelines, distributions, streaks) from stored session stats.
- Voice personalities swap different honk clips and playback settings for the same underlying alert flow.
- Styling uses CSS variables / Tailwind so light and dark themes stay readable end-to-end.
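The calibration step above can be sketched as a small pure module: capture a short baseline of gaze offsets while the user looks at the screen, then classify later frames by deviation from that baseline. The sample format, multipliers, and zone names here are assumptions for illustration, not the project's real values.

```typescript
// Hypothetical per-user calibration. Gaze offsets are normalized numbers
// (0,0 = calibrated "looking at screen" center).

interface GazeSample { x: number; y: number }

interface Baseline { meanX: number; meanY: number; spread: number }

function calibrate(samples: GazeSample[]): Baseline {
  const n = samples.length;
  const meanX = samples.reduce((s, p) => s + p.x, 0) / n;
  const meanY = samples.reduce((s, p) => s + p.y, 0) / n;
  // Mean absolute deviation: a simple, outlier-tolerant spread estimate.
  const spread =
    samples.reduce((s, p) => s + Math.hypot(p.x - meanX, p.y - meanY), 0) / n;
  return { meanX, meanY, spread: Math.max(spread, 0.01) }; // floor avoids zero spread
}

type GazeZone = "on-screen" | "notes" | "away";

function classifyGaze(sample: GazeSample, base: Baseline): GazeZone {
  const d = Math.hypot(sample.x - base.meanX, sample.y - base.meanY);
  if (d < 3 * base.spread) return "on-screen";
  // Moderate downward glances (e.g. at a notebook) count as "notes", not "away".
  if (sample.y > base.meanY && d < 8 * base.spread) return "notes";
  return "away";
}
```

Because thresholds are expressed relative to each user's own baseline spread, the same classifier adapts to different people, cameras, and seating positions instead of relying on one fixed cutoff.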

Challenges we faced:

- Getting stable gaze and head cues across devices, angles, and lighting, without a lab setup.
- Tuning thresholds so the app is neither noisy (constant honks) nor useless (never fires); mild vs distracted tiers need different timing and escalation.
- Distinguishing "good" looking away (e.g. note-taking mode) from distraction, combining CV with simple signals like keyboard/mouse activity where helpful.
- Performance: balancing model cost, frame rate, and UI smoothness on ordinary laptops.
- Polish: keeping charts and text accessible in light and dark themes, and keeping the experience fun (goose branding, audio) without undermining the seriousness of privacy and control.
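The mild-vs-distracted timing problem above can be expressed as a tiny escalation state machine: a short drift earns a soft cue, a sustained one escalates. The tick counts and volumes are made-up tuning knobs, not GooseFocus's actual settings; the shape of the logic is what matters.

```typescript
// Hypothetical tiered escalation driven by the per-tick focus pipeline.

type Tier = "none" | "mild" | "distracted";

interface EscalationState { offTaskTicks: number }

const MILD_AFTER_TICKS = 3;        // brief drift: soft cue (illustrative value)
const DISTRACTED_AFTER_TICKS = 10; // sustained drift: escalate (illustrative value)

function step(
  state: EscalationState,
  onTask: boolean,
): { state: EscalationState; tier: Tier } {
  // Returning to task resets escalation immediately, so brief glances never
  // accumulate into a loud honk.
  const offTaskTicks = onTask ? 0 : state.offTaskTicks + 1;
  let tier: Tier = "none";
  if (offTaskTicks >= DISTRACTED_AFTER_TICKS) tier = "distracted";
  else if (offTaskTicks >= MILD_AFTER_TICKS) tier = "mild";
  return { state: { offTaskTicks }, tier };
}

// Map the tier to playback settings for the honk clip.
function honkVolume(tier: Tier): number {
  return tier === "distracted" ? 1.0 : tier === "mild" ? 0.4 : 0;
}
```

Separating "how long off-task" from "what to play" keeps tuning cheap: adjusting the two tick constants changes noisiness without touching the audio path.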
