Inspiration

Post-operative delirium affects up to 50% of elderly surgical patients and is one of the most under-detected, under-treated complications in modern medicine. It doubles hospital stays, accelerates long-term cognitive decline, and increases 30-day mortality — yet in most hospitals, screening consists of a nurse asking a patient to count backwards. There are no objective measurements, no longitudinal tracking, and no early warning system before symptoms become obvious.

The clinical research on this problem is extensive and largely ignored at the bedside. Studies published in journals like Frontiers in Aging Neuroscience and PLOS ONE have demonstrated that saccadic eye movements are statistically significant biomarkers of neurological risk, with AUCs of 0.76–0.78 for predicting post-operative delirium in elderly arthroplasty patients. Prosaccade latency in healthy adults averages 180–230ms; patients who develop delirium show latencies exceeding 250ms in the hours before clinical onset. Smooth pursuit gain in healthy adults sits at 0.82 ± 0.07; it drops measurably with cerebrovascular compromise. Fixation stability, pupil variability, and saccade accuracy each carry independent predictive weight.

The research existed. The bedside tools to act on it didn't, until we realized that every smartphone already has a camera capable of tracking eye movements at 30Hz.

AURA was built to close that gap: turn peer-reviewed ocular biomarker research into a 3-minute, camera-based screening tool that any patient can use in a recovery room, at home, or in a clinic — no specialist, no wearable, no referral required.


What It Does

AURA is a longitudinal cognitive screening platform that tracks six clinically validated eye-movement biomarkers across multiple sessions, comparing each post-operative result against the patient's own pre-operative baseline.

The eye assessment has three phases:

  1. Fixation phase — the patient holds their gaze on a stationary dot. AURA measures fixation stability ($1 - \min(\sigma_{gaze} \cdot k, 1)$, where $\sigma_{gaze}$ is the standard deviation of iris position in normalized frame coordinates) and baseline pupil variability (normalized iris-width variance across frames).

  2. Saccade phase — the dot jumps between two positions. AURA measures:

    • Saccadic peak velocity — the maximum iris displacement speed over a 33ms window
    • Prosaccade latency — time from stimulus onset to first detectable eye movement, using a position-shift method against a pre-jump iris baseline
    • Saccade accuracy — whether the eye lands on the correct side of the target

  3. Smooth pursuit phase — the dot oscillates sinusoidally across the screen. AURA measures smooth pursuit gain:

$$gain = \frac{v_{gaze}}{v_{target}}$$

sampled at 30Hz using MediaPipe Face Mesh iris landmarks in real time.
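The fixation and pursuit metrics above can be sketched in a few lines of Python (a simplified illustration — the production pipeline runs in TypeScript in the browser, and the scale factor `K` here is a placeholder, not the app's calibrated value):

```python
import statistics

FPS = 30    # camera frame rate (Hz)
K = 15.0    # illustrative scale factor mapping gaze sigma into [0, 1]

def fixation_stability(xs, ys, k=K):
    """1 - min(sigma_gaze * k, 1): 1.0 means a perfectly steady gaze."""
    sigma = (statistics.pstdev(xs) ** 2 + statistics.pstdev(ys) ** 2) ** 0.5
    return 1.0 - min(sigma * k, 1.0)

def pursuit_gain(gaze_x, target_x):
    """gain = v_gaze / v_target from mean absolute frame-to-frame velocity."""
    v_gaze = statistics.mean(abs(b - a) for a, b in zip(gaze_x, gaze_x[1:])) * FPS
    v_target = statistics.mean(abs(b - a) for a, b in zip(target_x, target_x[1:])) * FPS
    return v_gaze / v_target if v_target else 0.0
```

A gaze track that matches the target frame-for-frame yields a gain of 1.0; a lagging, under-shooting pursuit yields a gain below 1.0.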

All six biomarker readings are sent to a FastAPI backend where Google Gemini 2.5 Flash analyzes the longitudinal trend against the patient's stored pre-operative baseline. Gemini is grounded exclusively in committed peer-reviewed research — it cannot reason beyond what the papers explicitly support. The output is a risk stratification (low / moderate / high), a confidence score $\in [0, 1]$, and a per-metric breakdown with evidence sentences traceable to specific papers.


How We Built It

Frontend — React + TypeScript with Vite, styled with Tailwind CSS and shadcn/ui. The entire eye-tracking engine runs in-browser using MediaPipe Face Mesh via the @mediapipe/tasks-vision WASM runtime — no server round-trip during the exam. Iris landmarks 468 and 473 give sub-pixel gaze position as normalized frame coordinates at 30Hz. Voice guidance is powered by the ElevenLabs TTS API. Authentication supports both email/password and Solana wallet sign-in via @solana/wallet-adapter-react.
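Extracting a single gaze point from the landmark list is straightforward. A minimal sketch (in Python for illustration — the app does this in TypeScript): `landmarks` is assumed to be the 478-point Face Mesh landmark list with iris refinement enabled, where indices 468 and 473 are the left and right iris centers in normalized coordinates.

```python
def gaze_position(landmarks):
    """Average the two iris-center landmarks into one normalized gaze point."""
    (lx, ly), (rx, ry) = landmarks[468], landmarks[473]
    return ((lx + rx) / 2, (ly + ry) / 2)
```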

Backend — Python FastAPI with Motor (async MongoDB driver) connected to MongoDB Atlas. Three primary services:

  • auth_service.py — bcrypt-hashed registration and login, auto-creating a linked patient document on signup
  • mongodb_atlas_service.py — patient and session CRUD, atomic baseline locking, longitudinal trend builder, and per-metric delta computation for all six biomarkers
  • gemini_service.py — loads peer-reviewed research from backend/research/, constructs a system prompt that grounds all analysis strictly in that corpus, calls Gemini 2.5 Flash with structured JSON output, and validates/repairs the response through a retry loop with schema enforcement
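The per-metric delta computation in mongodb_atlas_service.py might look like the following sketch (metric names and dict layout are illustrative, not the service's actual schema):

```python
BIOMARKERS = [
    "fixation_stability", "pupil_variability", "saccade_velocity",
    "prosaccade_latency", "saccade_accuracy", "pursuit_gain",
]

def metric_deltas(baseline, latest):
    """Absolute and percent change for each biomarker vs. the pre-op baseline."""
    deltas = {}
    for m in BIOMARKERS:
        b, l = baseline.get(m), latest.get(m)
        if b is None or l is None:
            continue  # metric missing from one of the sessions
        deltas[m] = {
            "baseline": b,
            "latest": l,
            "delta": l - b,
            "pct_change": (l - b) / b * 100 if b else None,
        }
    return deltas
```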

AI Layer — Gemini is configured with temperature=0.1 and response_mime_type="application/json". A strict system prompt prohibits reasoning beyond the provided research documents. Every explanation entry must follow the format metric:baseline:latest:evidence and every medical claim must be traceable to a specific paper. The output schema is validated field-by-field, with automatic repair on the final retry before falling back to an inconclusive result.
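The validate-repair-retry cycle can be sketched as follows. This is a simplified stand-in, not the actual gemini_service.py code: `call_model` abstracts the Gemini call, and the field names are illustrative.

```python
import json

RISK_LEVELS = {"low", "moderate", "high"}

def validate(payload):
    """Field-by-field schema check; returns a list of violations."""
    errors = []
    if payload.get("risk") not in RISK_LEVELS:
        errors.append(f"risk must be one of {sorted(RISK_LEVELS)}")
    conf = payload.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be a number in [0, 1]")
    for i, entry in enumerate(payload.get("explanations", [])):
        if not isinstance(entry, str) or entry.count(":") < 3:
            errors.append(f"explanations[{i}] must follow metric:baseline:latest:evidence")
    return errors

def analyze_with_retries(call_model, prompt, max_retries=3):
    """Re-inject validation errors into the follow-up prompt; fall back to inconclusive."""
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as exc:
            prompt += f"\nPrevious reply was not valid JSON: {exc}"
            continue
        errors = validate(payload)
        if not errors:
            return payload
        prompt += "\nFix these schema violations: " + "; ".join(errors)
    return {"risk": "inconclusive", "confidence": 0.0, "explanations": []}
```

If the model never produces a schema-valid response within the retry budget, the caller always receives a safe inconclusive result rather than an unvalidated one.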

Research corpus — We sourced and committed peer-reviewed papers on saccadic biomarkers for post-operative delirium (Kang et al.; Al-Hindawi et al.), smooth pursuit gain in cerebrovascular risk, pupillary response changes in delirium, and longitudinal eye movement analysis for neurological deterioration. Gemini can also auto-fetch and cache new research via Google Search grounding when enabled.


Challenges We Ran Into

Prosaccade latency at 30Hz. Clinical systems measure latency at 500–1000Hz. At 30Hz, a velocity-threshold approach fires on normal slow drift, producing latencies that paradoxically shorten when the subject moves more slowly — the exact opposite of the correct clinical direction. We solved this with a position-shift method: capture the average iris X position from the four frames immediately before the dot jumps (the pre-jump baseline $\bar{x}_{pre}$), then detect the first frame where displacement exceeds a threshold of 0.008 normalized frame units (approximately 2–3 pixels, above the noise floor) after an 80ms physiological gate (true saccades never begin before ~80ms post-stimulus):

$$latency = t_{detect} - t_{jump}, \quad \text{where} \quad |x_{iris} - \bar{x}_{pre}| > 0.008 \ \text{and} \ t_{detect} - t_{jump} \geq 80\,ms$$
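This detection rule fits in a few lines (a simplified sketch of the method, not the app's TypeScript implementation — `iris_x` is the per-frame normalized iris X track and `jump_frame` the frame index of the stimulus jump):

```python
FPS = 30
FRAME_MS = 1000 / FPS            # ~33.3 ms per frame
DISPLACEMENT_THRESHOLD = 0.008   # normalized frame units (~2-3 px)
PHYSIOLOGICAL_GATE_MS = 80       # true saccades never start earlier

def prosaccade_latency_ms(iris_x, jump_frame):
    """Position-shift latency: first frame whose displacement from the
    pre-jump baseline exceeds the threshold, after the 80 ms gate."""
    pre = iris_x[max(0, jump_frame - 4):jump_frame]
    baseline = sum(pre) / len(pre)
    for f in range(jump_frame, len(iris_x)):
        elapsed = (f - jump_frame) * FRAME_MS
        # frames inside the physiological gate are ignored even if they move
        if elapsed >= PHYSIOLOGICAL_GATE_MS and abs(iris_x[f] - baseline) > DISPLACEMENT_THRESHOLD:
            return elapsed
    return None  # no saccade detected in the window
```

Because movement inside the first 80ms is ignored, tracker jitter immediately after the jump cannot produce an impossibly short latency.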

Metric calibration without ground truth. With no reference eye tracker to validate against, all six metrics had to respond in the correct clinical direction when we deliberately performed fast-vs-slow versions of each test pattern. We iterated on scale factors, normalization approaches, and detection thresholds across dozens of test sessions until all six moved in the expected direction simultaneously.

Grounding Gemini in evidence without hallucination. LLMs will confidently generate plausible-sounding clinical thresholds that don't exist in any paper. We addressed this with an explicit prohibition against pretrained clinical knowledge, a requirement that every explanation sentence be traceable to a specific document, and an automatic fallback to inconclusive when the research corpus doesn't contain sufficient evidence. The retry loop re-injects validation errors directly into the follow-up prompt.

Atomic baseline locking. MongoDB's update_one with a {"baseline": None} query guard ensures the baseline is written atomically on the first session and can never be overwritten by a concurrent session completing at the same moment.
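The guard reduces to a compare-and-set. An in-memory analogue of the pattern (the real service issues a single atomic server-side call; the call shape in the docstring is illustrative):

```python
def lock_baseline(patient, metrics):
    """Compare-and-set: the baseline is written once and never overwritten.

    Server-side equivalent (shape illustrative):

        result = await db.patients.update_one(
            {"_id": patient_id, "baseline": None},   # query guard
            {"$set": {"baseline": metrics}},
        )
        locked = result.modified_count == 1
    """
    if patient.get("baseline") is None:
        patient["baseline"] = metrics
        return True   # this session won the race and became the baseline
    return False      # a baseline already exists; store as a follow-up session
```

Because the query guard and the write happen in one server-side operation, two sessions finishing simultaneously cannot both become the baseline.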


Accomplishments That We're Proud Of

  • A fully in-browser, real-time, 30Hz eye-tracking pipeline with zero server dependency during the exam — standard webcam only
  • A Gemini integration that genuinely refuses to output a risk assessment if the research corpus doesn't support one
  • Six clinically meaningful metrics that all respond in the correct direction, with prosaccade latency correctly measured using a position-shift method adapted for 30Hz consumer hardware
  • A complete longitudinal data model: permanent pre-op baseline, per-session deltas, sliding time-series window, trajectory classification, and a build_longitudinal_summary function that packages everything into a single Gemini prompt
  • A full auth system with MongoDB-backed accounts, bcrypt passwords, and Solana wallet sign-in — wired end-to-end from registration through session storage

What We Learned

The gap between "a metric exists in research literature" and "a metric works correctly on a consumer webcam at 30Hz" is enormous. Smooth pursuit gain required understanding that $gain = v_{eye} / v_{target}$ must be computed in normalized frame coordinates with careful handling of pursuit phase boundaries. Fixation stability required a non-linear scale factor to produce meaningful $[0, 1]$ values from raw gaze standard deviation. Prosaccade latency required understanding both the physiological constraints and the noise characteristics of MediaPipe iris tracking at rest.

We also learned that constraining an LLM to be less confident is genuinely harder than making it more helpful. The schema validation, retry loop, injected error feedback, and fallback to inconclusive were as much engineering work as the analysis pipeline itself.


What's Next for AURA

  • Validated clinical thresholds — partner with a hospital or research group to collect labeled data (confirmed delirium cases, pre/post-op cohorts) and calibrate the six metrics against known outcomes
  • Antisaccade task — add a deliberate inhibition task where the patient looks away from the appearing dot; antisaccade error rate is among the strongest single predictors of delirium in the literature
  • Physician portal — the backend already supports physician accounts and patient assignment; build the longitudinal trend dashboard for clinical review
  • Solana audit trail — hash every session result to the Solana devnet as an immutable, tamper-proof audit record for clinical accountability
  • FDA De Novo pathway — AURA is positioned as a risk-stratification support tool, not a diagnostic device; the conservative Gemini grounding and evidence-only reasoning are intentional regulatory design choices

Built With

  • React, TypeScript, Vite, Tailwind CSS, shadcn/ui
  • MediaPipe Face Mesh (@mediapipe/tasks-vision)
  • ElevenLabs TTS
  • Python, FastAPI, Motor, MongoDB Atlas
  • Google Gemini 2.5 Flash
  • Solana (@solana/wallet-adapter-react)