ReadingMind

A Chrome extension that feels you struggling — and responds in the moment.


Inspiration

We kept noticing the same thing in ourselves and in friends with dyslexia or ADHD: you read a paragraph, reach the end, and realize you absorbed nothing. You re-read it, maybe twice more, and still bounce off. The page itself gave you no acknowledgment that this was happening — no help, no marker, nothing.

Meanwhile, your eyes had been screaming the whole time. Long fixations, backward regressions, scattered saccades: the eye-tracking literature has documented these reading-stress signals for decades. They just lived in $400,000 lab setups.

We wanted to ask: what if the page could feel you struggling, and respond? Not after the fact, not in a dashboard — in the moment, on the paragraph you're on, in a way a student on a Chromebook could actually use.


What we learned

  • Webcam gaze tracking is genuinely usable in 2026. MediaPipe Face Mesh + iris landmarks + a head-pose baseline gets you to ~25–60 px median error in a browser. That's good enough for paragraph-level intent — which turns out to be the granularity that matters for reading interventions.
  • The hard problem isn't detection, it's gating. An intervention that fires on false positives is worse than no intervention — it trains the user to distrust the system. We spent as much time on readingScore, degradationScore, and hysteresis as on the stress signal itself.
  • Longitudinal context changes the voice of an LLM coach. Passing the last 5 sessions into a tool-calling agent turned generic feedback ("you seemed stressed") into personal observation ("third time this week dense technical text hit you hardest — mornings have been going smoother"). A sketch of that context assembly follows this list.
  • Manifest V3's offscreen-document API is the hidden unlock. Without it, there's no way to keep a camera open from a Chrome extension at all: content scripts can't hold a persistent media stream, and service workers can't touch the DOM. The offscreen bridge is what made this possible.
  • A 10-second dwell is an underrated signal. Simpler than any ML classifier, and it produced the single feature testers wanted to keep most — long-dwell auto-summarize.
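
The context-assembly sketch promised above. The session field names and buildCoachContext are ours for illustration, not the actual spectrum-server code.

// Illustrative only: fold the last 5 session summaries into the coach's system prompt.
function buildCoachContext(sessions) {
  const history = sessions.slice(-5)
    .map((s, i) => `Session ${i + 1} (${s.date}): avg stress ${s.avgStress.toFixed(2)}, ` +
                   `peak tier ${s.topTier}, hardest content: ${s.hardTopics.join(', ')}`)
    .join('\n');
  return [
    'You are a reading coach. Ground every observation in the history below.',
    'Prefer specific patterns ("third time this week...") over generic feedback.',
    '--- Recent sessions ---',
    history,
  ].join('\n');
}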

How we built it

Architecture: Chrome Extension (Manifest V3) + local Node companion server.

An offscreen document holds getUserMedia and runs MediaPipe Face Mesh at ~30 fps. A custom iris-tracker.js + head-pose-layer.js pair refines the eye-socket geometry into a stable gaze point, which is streamed to the active tab's content script.
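
A minimal sketch of that offscreen bridge, assuming the standard chrome.offscreen API; the file names and message shape are ours:

// background.js (service worker): spin up the offscreen document that owns the camera.
async function ensureOffscreen() {
  if (await chrome.offscreen.hasDocument()) return;
  await chrome.offscreen.createDocument({
    url: 'offscreen.html',
    reasons: ['USER_MEDIA'],
    justification: 'Webcam gaze tracking for reading-stress detection',
  });
}

// offscreen.js: hold the stream; each tracked frame relays a gaze point onward.
async function startCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: { frameRate: 30 } });
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();
  return video;   // fed into the Face Mesh loop
}

function relayGaze(point) {   // { x, y, confidence }
  chrome.runtime.sendMessage({ type: 'GAZE_POINT', ...point });
}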

content.js maps that gaze onto real <p> elements via a DOM-level DwellGrid, and extracts three reading-stress signals — s_fix, s_sac, s_reg ∈ [0, 1] — for fixation duration, saccade disorganization, and backward-regression frequency over a rolling window.
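
The core of that mapping, sketched; the real DwellGrid does more bookkeeping:

// Sketch of the DwellGrid idea: cache paragraph rects, hit-test each gaze point.
class DwellGrid {
  constructor() {
    this.cells = [...document.querySelectorAll('p')].map((el) => ({
      el,
      rect: el.getBoundingClientRect(),
    }));
  }
  paragraphAt(x, y) {   // gaze point in viewport coordinates
    const hit = this.cells.find(({ rect }) =>
      x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom);
    return hit ? hit.el : null;
  }
}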

Those three signals collapse into one stress score and a three-tier bucket:

s = 0.45 · s_fix  +  0.35 · s_sac  +  0.20 · s_reg

tier(s):
  CALM      →  s < 0.30
  ELEVATED  →  0.30 ≤ s < 0.60
  OVERLOAD  →  s ≥ 0.60
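
In code, the collapse and the bucketing are a few lines, with weights and cut-points exactly as above:

// Stress collapse and tier bucket, straight from the formulas above.
function stressScore({ sFix, sSac, sReg }) {
  return 0.45 * sFix + 0.35 * sSac + 0.20 * sReg;
}

function tier(s) {
  if (s < 0.30) return 'CALM';
  if (s < 0.60) return 'ELEVATED';
  return 'OVERLOAD';
}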

applyInterventionStable applies hysteresis and a gating predicate — interventions only fire when the reading score clears a threshold and the tracker isn't degraded:

fire = (s_read ≥ 0.42) ∧ (¬degraded) ∧ cooldownExpired
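
A sketch of that gate; the 0.42 threshold is the real one, while the exit level and cooldown length are our illustrative numbers:

// Sketch of applyInterventionStable's gate. Hysteresis: a higher score is
// needed to start firing than to keep firing, so the UI doesn't flicker.
const ENTER = 0.42;               // real threshold from the predicate above
const EXIT = 0.35;                // assumed exit level
const COOLDOWN_MS = 8000;         // assumed cooldown
let active = false;
let lastFired = 0;

function shouldFire(sRead, degraded, now = Date.now()) {
  const cleared = active ? sRead >= EXIT : sRead >= ENTER;
  active = cleared;
  if (cleared && !degraded && now - lastFired >= COOLDOWN_MS) {
    lastFired = now;
    return true;
  }
  return false;
}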

Each tier engages a different set of interventions: per-paragraph dynamic typography, screen-border halo, sentence highlight, dimming surround, and, on sustained OVERLOAD, an AI paragraph rewrite via Dedalus (Claude Haiku 4.5).
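
As a dispatch table it might look like the following; which intervention belongs to which tier is our illustration, not something pinned down above:

// Illustrative tier-to-intervention dispatch; the grouping is our sketch.
const INTERVENTIONS = {
  CALM: [],
  ELEVATED: ['dynamicTypography', 'sentenceHighlight'],
  OVERLOAD: ['borderHalo', 'dimSurround', 'aiRewrite'],
};

function engage(tierName, paragraph, registry) {
  for (const name of INTERVENTIONS[tierName]) registry[name](paragraph);
}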

A 10-second long-dwell detector fires a bullet summary tooltip if gaze rests on the same paragraph past the threshold. At session end, the whole telemetry blob — stress timeline, tier transitions, paragraphs read, reading pace — is POSTed to a local spectrum-server (Express + spectrum-ts), which runs a tool-calling coach agent with 6 tools and delivers the result as an iMessage via AppleScript.
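
The dwell rule is small enough to sketch nearly whole; the callback wiring is ours:

// The 10-second long-dwell rule as a factory: reset on paragraph change,
// fire once per visit. onLongDwell would open the summary tooltip.
function makeDwellDetector(onLongDwell, dwellMs = 10_000) {
  let el = null, start = 0, fired = false;
  return (paragraph, now = Date.now()) => {
    if (paragraph !== el) {          // gaze moved: reset the clock
      el = paragraph; start = now; fired = false;
      return;
    }
    if (!fired && now - start >= dwellMs) {
      fired = true;
      onLongDwell(paragraph);
    }
  };
}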

Stack: Chrome MV3, MediaPipe Face Mesh, TFJS BlazeFace/FaceMesh, WebGazer (fallback), Dedalus Labs (LLM gateway → Claude Haiku 4.5), Photon Spectrum (iMessage), Express 4, Node, Cursor.


Challenges we ran into

False triggers in bad lighting. Our first demo had the UI lighting up every time someone blinked into a backlit window. Fix: a degradationScore that penalizes low measurement ratio and high jitter, and a hard gate that refuses to fire interventions until tracker quality recovers.
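
Roughly, with the two penalty terms from above and weights of our own choosing:

// Sketch of degradationScore: penalize dropped measurements and jitter.
// Weights and normalization constants are assumptions, not the real tuning.
function degradationScore({ measuredFrames, totalFrames, jitterPx }) {
  const missRate = 1 - measuredFrames / Math.max(totalFrames, 1);
  const jitter = Math.min(jitterPx / 40, 1);   // 40 px full scale: assumed
  return 0.6 * missRate + 0.4 * jitter;
}

const trackerDegraded = (score) => score > 0.5; // hard-gate cutoff: assumed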

MV3's three-separate-debuggers problem. Content script, popup, and service worker each have their own DevTools window, and none of them share logs. We lost real hours tracking a bug that turned out to be a silent message-port error in the service worker.

Calibration drift. A user who calibrated leaning forward and then sat back would see gaze estimates drift ~80 px high. We added a versioned head-pose baseline (iris_pose_v1) separate from the 9-point calibration so pose can be re-baselined without redoing the whole thing.
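
A sketch of that split, with the storage shape and pixels-per-degree gain as assumptions:

// Versioned pose baseline, stored separately from the 9-point calibration
// so it can be redone on its own. Gain and field names are illustrative.
const POSE_KEY = 'iris_pose_v1';
const PX_PER_DEG = 12;   // assumed gaze shift per degree of head pose

function rebaselinePose(pose) {            // pose: { pitch, yaw } in degrees
  localStorage.setItem(POSE_KEY, JSON.stringify(pose));
}

function poseCorrected(gaze, pose) {
  const base = JSON.parse(localStorage.getItem(POSE_KEY) ?? '{"pitch":0,"yaw":0}');
  return {
    x: gaze.x - (pose.yaw - base.yaw) * PX_PER_DEG,
    y: gaze.y - (pose.pitch - base.pitch) * PX_PER_DEG,
  };
}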

Tuning the OVERLOAD classifier. Our first "hard paragraph" classifier required 3+ OVERLOAD samples on a paragraph to label it hard — and real users mostly hit ELEVATED, never OVERLOAD. We rebuilt the labeler to treat ELEVATED as half-weight hard samples:

w_hard = n_OVERLOAD + 0.5 · n_ELEVATED
label  = HARD  ⟺  w_hard ≥ 2.0

Then we removed the classifier entirely in favor of the simpler 10-second dwell rule, because it performed better in user testing with a fraction of the code.

Not double-flashing feedback. When the screen-border halo and the AI summary card fought for attention at once, the effect felt panicky. We made the halo self-dismiss the instant an AI intervention takes over: one feedback channel at a time.

iMessage in 2026. Figuring out the least-friction delivery path (AppleScript → imessage-kit → Photon Cloud) was a rabbit hole. AppleScript won because it just works on any Mac with Messages signed in — no Full Disk Access, no allowlist, no cloud.
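
The winning path, sketched from Node; the recipient is a placeholder and it needs a Mac with Messages signed in:

// Sketch: send the coach's message through Messages via osascript.
const { execFile } = require('node:child_process');

function sendIMessage(recipient, text) {
  const script = `
    tell application "Messages"
      set svc to 1st account whose service type = iMessage
      send "${text.replace(/"/g, '\\"')}" to participant "${recipient}" of svc
    end tell`;
  execFile('osascript', ['-e', script], (err) => {
    if (err) console.error('iMessage send failed:', err);
  });
}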

The file that ate our sprint. content.js grew to ~160 KB / 3600+ lines. At that size, every feature touches five other features. We survived by leaning hard on Cursor to do surgical multi-file edits and by keeping all mutable state in a single LearningState object.
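
The pattern, for reference; the fields shown are a sample, the real object carries far more:

// All mutable state behind one object; every feature reads and writes here.
const LearningState = {
  tier: 'CALM',
  stress: 0,
  degraded: false,
  currentParagraph: null,
  tierTransitions: [],
};

function setTier(next) {
  if (next === LearningState.tier) return;
  LearningState.tierTransitions.push({
    at: Date.now(), from: LearningState.tier, to: next,
  });
  LearningState.tier = next;
}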
