Inspiration

Genetic testing is supposed to give answers — but for millions of people, the result is a variant of uncertain significance (VUS): a DNA change that might matter, might not, and leaves patients and clinicians stuck between “we found something” and “we don’t know what to do.”

That gap is especially painful for patients heading into CRISPR or gene therapy, where the whole plan depends on knowing whether a specific change is actually the right target. Today, resolving a VUS means a genetic counselor or researcher manually pulling from ClinVar, gnomAD, mouse databases, literature, and ACMG criteria — slow, opaque, and inaccessible to the person who needs the answer most.

We built Surface to turn that opaque research process into something a patient can actually follow: upload your sequencing file, watch a live agent investigate your variant across real databases and species, and walk away with a plain-language explanation plus a printable brief for your doctor.


What it does

Surface is a live VUS investigation platform. A user uploads a VCF sequencing file; the app parses it in the browser (nothing stored), annotates every variant via live Ensembl VEP, and lets the user pick one to investigate along with their clinical context (e.g. cancer predisposition, long-QT syndrome, hypercholesterolemia).
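
As a rough illustration of the in-browser parsing step, here is a minimal sketch that turns VCF text into one record per alternate allele. The `ParsedVariant` shape and the `parseVcf` name are our own placeholders for this write-up, not the app's actual types, and the sketch ignores INFO and genotype columns.

```typescript
// Illustrative record shape (assumed, not Surface's real type).
interface ParsedVariant {
  chrom: string;
  pos: number; // 1-based position, as in the VCF format
  ref: string;
  alt: string;
}

// Parse VCF text into variant records.
// Header lines (starting with '#'), blank lines, and malformed rows are skipped.
function parseVcf(text: string): ParsedVariant[] {
  const variants: ParsedVariant[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;

    // Fixed VCF columns: CHROM, POS, ID, REF, ALT, ...
    const cols = line.split("\t");
    if (cols.length < 5) continue;
    const [chrom, posText, , ref, altField] = cols;

    const pos = Number.parseInt(posText, 10);
    if (!Number.isFinite(pos)) continue;

    // A row may list several comma-separated ALT alleles; emit one record each.
    for (const alt of altField.split(",")) {
      if (alt === "." || alt === "*") continue; // no usable alternate allele
      variants.push({ chrom, pos, ref, alt });
    }
  }
  return variants;
}
```

In the browser, the input text can come from `await file.text()` on the uploaded `File`, so the raw file never has to leave the page.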

From there, a durable evidence pipeline runs in the background and streams results in real time:

  1. DNA decode animation — a cinematic sequence scan that resolves onto the actual substitution from the user’s file
  2. The change we found — a visual of the real DNA letter change
  3. What the agent is thinking — a live thought trail as Grok narrates each step (“I’m reading mouse disease research now…”), with an optional Grok voice that speaks those thoughts aloud
  4. What this means — a plain-language summary written by Grok, grounded in the run’s real evidence and published literature, including a CRISPR/gene-therapy one-liner
  5. Doctor Brief — a printable, ACMG-framed clinical document
  6. Watch — automatic re-checks so if ClinVar or other evidence changes, the user is notified

The core innovation is the Mechanism-Compatibility Gate: a 0–1 valve that decides whether animal-model evidence actually applies to the human disease mechanism. A dramatic mouse knockout can be correctly suppressed when the disease is gain-of-function (e.g. Timothy syndrome in CACNA1C) — preventing the classic trap of over-calling pathogenicity from irrelevant cross-species data.
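
To make the gate idea concrete, here is a minimal sketch of a 0–1 gate that compares the human disease mechanism with what the animal model actually perturbs. The type names, the three-way mechanism enum, and the specific gate values (1, 0.5, 0) are assumptions for illustration; the real gate draws on a curated mechanism table and Grok reasoning that this sketch does not attempt to reproduce.

```typescript
// Hypothetical mechanism categories, for illustration only.
type Mechanism = "loss_of_function" | "gain_of_function" | "unknown";

interface GateInput {
  diseaseMechanism: Mechanism; // how the human disease arises
  modelPerturbation: Mechanism; // what the animal model does (a knockout is loss of function)
}

// Returns a gate value in [0, 1]. The numbers are placeholder assumptions.
function mechanismGate(input: GateInput): number {
  const { diseaseMechanism, modelPerturbation } = input;
  if (diseaseMechanism === "unknown" || modelPerturbation === "unknown") {
    return 0.5; // not enough information to open or close the gate
  }
  return diseaseMechanism === modelPerturbation ? 1 : 0;
}

// The gate multiplies the cross-species signal, so a closed gate
// suppresses even a very strong animal-model result.
function gatedCrossSpecies(rawSignal: number, gate: number): number {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));
  return clamp(rawSignal) * clamp(gate);
}
```

With a gain-of-function disease and a knockout model, `mechanismGate` returns 0 in this sketch, so the gated signal is 0 regardless of how dramatic the mouse phenotype is.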

Nothing is faked. Every API call is live. Empty results render honestly as “not found,” never swapped for demo data.


How we built it

Frontend: Next.js 16 (App Router), React 19, TypeScript, Tailwind v4, shadcn/ui. The intake page parses VCF files entirely in the browser; only parsed coordinates are sent for gene lookup. The session page subscribes to a per-run Inngest Realtime channel and renders fragments, pipeline updates, and Grok narration as they arrive.

Backend pipeline: A single orchestrator (run-evidence-pipeline.ts) runs as a durable Inngest function with retried steps. It fans out across eight public genomics data sources (Ensembl VEP, gnomAD constraint, MyVariant for ClinVar and predictors, Ensembl conservation, DIOPT orthologs, IMPC mouse phenotypes, Monarch Phenodigm, and Europe PMC literature), then layers four Grok reasoning calls on top:

  • Predictor leadership (AlphaMissense, REVEL, CADD disagreement)
  • Mechanism gate (with xAI Live Search for genes outside our curated mechanism table)
  • Cross-species sanity check + relevance scoring
  • Synthesis (plain-language summary + ACMG rows)
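
The fan-out and the "not found" behavior can be sketched in plain TypeScript. This version deliberately leaves out Inngest and uses `Promise.allSettled` instead of durable steps; the `Connector` and `ConnectorResult` shapes and the `gatherEvidence` name are hypothetical stand-ins, not the orchestrator's real interfaces.

```typescript
// Hypothetical result shape: every source reports honestly whether it found anything.
interface ConnectorResult {
  source: string;
  found: boolean;
  data?: unknown;
  error?: string;
}

// A connector wraps one upstream API call and resolves to null when nothing is found.
interface Connector {
  source: string;
  run: () => Promise<unknown>;
}

// Run all connectors concurrently. A failure or empty result in one source
// never blocks the others and is never replaced with placeholder data.
async function gatherEvidence(connectors: Connector[]): Promise<ConnectorResult[]> {
  const settled = await Promise.allSettled(connectors.map((c) => c.run()));
  return settled.map((outcome, i): ConnectorResult => {
    const source = connectors[i].source;
    if (outcome.status === "rejected") {
      return { source, found: false, error: String(outcome.reason) };
    }
    if (outcome.value === null || outcome.value === undefined) {
      return { source, found: false };
    }
    return { source, found: true, data: outcome.value };
  });
}
```

Keeping `found: false` distinct from an error lets the UI tell "this database has nothing on the variant" apart from "this database could not be reached."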

Confidence scoring is deterministic (a layered model: gene prior → variant effect → mechanism gate × cross-species). Grok writes the prose but never overrides the computed label.
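
One way to picture the layered model is as a weighted sum in which the gate multiplies only the cross-species term. The weights, thresholds, label names, and the choice of a weighted sum are all assumptions made for this sketch; the source only states the layer order and that the gate acts as a multiplier on cross-species evidence.

```typescript
// Hypothetical label set, for illustration only.
type ConfidenceLabel = "leans_pathogenic" | "uncertain" | "leans_benign";

// Each layer is assumed to be normalized to the range 0..1.
interface Layers {
  genePrior: number;
  variantEffect: number;
  mechanismGate: number;
  crossSpecies: number;
}

// Placeholder weights (assumed); they sum to 1 so the score stays in 0..1.
const WEIGHTS = { genePrior: 0.2, variantEffect: 0.5, crossSpecies: 0.3 };

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Deterministic score: the gate scales the cross-species layer and nothing else.
function confidenceScore(l: Layers): number {
  return (
    WEIGHTS.genePrior * clamp01(l.genePrior) +
    WEIGHTS.variantEffect * clamp01(l.variantEffect) +
    WEIGHTS.crossSpecies * clamp01(l.mechanismGate) * clamp01(l.crossSpecies)
  );
}

// Placeholder thresholds (assumed). The label comes from code, never from the model.
function confidenceLabel(score: number): ConfidenceLabel {
  if (score >= 0.7) return "leans_pathogenic";
  if (score <= 0.3) return "leans_benign";
  return "uncertain";
}
```

Under these assumed numbers, the same strong mouse signal moves a variant across the threshold only when the gate is open, which is the behavior the multiplier design is meant to guarantee.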

AI: One xAI client (grok-4.3 via the OpenAI-compatible SDK). Reasoning mode uses the Responses API; Live Search handles out-of-table genes. Agent voice uses xAI Text-to-Speech (grok-voice-think-fast-1.1) proxied server-side so the API key never reaches the browser.

Infrastructure: Inngest v3 with Realtime middleware for live streaming, Vercel for deployment, optional Upstash KV for durable run storage across serverless instances.

Two teammates built in parallel — one owning the research engine and pipeline, one owning the UI and voice — merged via a shared contract in lib/types.ts with zero drift between frontend and backend event shapes.


Challenges we ran into

Making cross-species evidence honest, not misleading. Mouse knockout data is seductive — embryonic lethality looks dramatic. But if the human disease is gain-of-function, that signal is actively misleading. Designing the Mechanism Gate as a multiplier (not another confidence bar) and validating it on CACNA1C (where the gate correctly closes and suppresses a real, strong IMPC signal) was one of the hardest design problems.

Live-only, with no demo safety net. Hackathon demos tempt you toward fixtures and fallbacks. We held the line: every user upload triggers a real pipeline against real APIs. When Grok or a connector fails, the UI says so honestly rather than substituting canned text. That made debugging harder but made the product trustworthy.

Streaming a multi-minute pipeline to the browser. The Inngest Realtime middleware on v3 (not v4 — v4 broke the middleware path we depend on) publishes fragment, narration, pipeline_update, and complete events to a per-run channel. The frontend reducer upserts fragments by ID (IMPC relevance gets updated mid-run) and replaces cumulative pipeline state — all while a DNA animation, voice queue, and patient summary race to stay in sync.
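
The reducer behavior described above can be sketched as follows. The four event names come from our Realtime channel, but the payload fields, the `SessionState` shape, and the `sessionReducer` name are simplified assumptions rather than the actual contents of lib/types.ts.

```typescript
// Simplified, assumed payload shapes.
interface EvidenceFragment {
  id: string;
  source: string;
  summary: string;
  relevance?: number; // may be filled in or revised later in the run
}

interface PipelineState {
  stage: string;
  completedSteps: string[];
}

type RunEvent =
  | { type: "fragment"; fragment: EvidenceFragment }
  | { type: "narration"; text: string }
  | { type: "pipeline_update"; pipeline: PipelineState }
  | { type: "complete" };

interface SessionState {
  fragments: EvidenceFragment[];
  narration: string[];
  pipeline: PipelineState | null;
  done: boolean;
}

const initialSession: SessionState = { fragments: [], narration: [], pipeline: null, done: false };

function sessionReducer(state: SessionState, ev: RunEvent): SessionState {
  switch (ev.type) {
    case "fragment": {
      // Upsert by ID: a repeated ID replaces the earlier fragment in place.
      const idx = state.fragments.findIndex((f) => f.id === ev.fragment.id);
      const fragments =
        idx === -1
          ? [...state.fragments, ev.fragment]
          : state.fragments.map((f, i) => (i === idx ? ev.fragment : f));
      return { ...state, fragments };
    }
    case "narration":
      return { ...state, narration: [...state.narration, ev.text] };
    case "pipeline_update":
      // Pipeline state is cumulative, so the latest snapshot replaces the old one.
      return { ...state, pipeline: ev.pipeline };
    case "complete":
      return { ...state, done: true };
  }
}
```

Because the reducer is a pure function, it can be passed to React's `useReducer` and also exercised against a recorded event sequence without a live subscription.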

Two parallel codebases, one contract. Frontend and backend were built by different people with different assumptions. Merging required a frozen shared contract (lib/types.ts), killing duplicate Grok clients/models, and verifying that fixture replay and live subscription produce identical UI shapes.

Patient-facing language from clinical evidence. Grok must explain ACMG criteria, mouse phenotypes, and predictor disagreement in language a patient preparing for gene therapy can actually use — without inventing findings. Narrow, schema-validated prompts with the deterministic confidence label as a fixed input kept synthesis grounded.
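
A minimal sketch of that guardrail: parse the model's JSON, reject anything that does not match the expected shape, and attach the computed label ourselves rather than reading it from the model. The `Synthesis` shape, the label names, and the hand-rolled checks are assumptions for illustration; a schema library could do the same job.

```typescript
// Hypothetical label set and output shape, for illustration only.
type ConfidenceLabel = "leans_pathogenic" | "uncertain" | "leans_benign";

interface AcmgRow {
  criterion: string;
  rationale: string;
}

interface Synthesis {
  confidenceLabel: ConfidenceLabel;
  summary: string;
  acmgRows: AcmgRow[];
}

// Returns a validated Synthesis, or null if the model output is unusable.
function parseSynthesis(raw: string, computedLabel: ConfidenceLabel): Synthesis | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const obj = value as Record<string, unknown>;

  const summary = obj["summary"];
  if (typeof summary !== "string" || summary.trim() === "") return null;

  const rawRows = obj["acmgRows"];
  if (!Array.isArray(rawRows)) return null;

  const acmgRows: AcmgRow[] = [];
  for (const row of rawRows) {
    if (typeof row !== "object" || row === null) return null;
    const r = row as Record<string, unknown>;
    const criterion = r["criterion"];
    const rationale = r["rationale"];
    if (typeof criterion !== "string" || typeof rationale !== "string") return null;
    acmgRows.push({ criterion, rationale });
  }

  // Any label the model emitted is ignored: the deterministic one always wins.
  return { confidenceLabel: computedLabel, summary, acmgRows };
}
```

When `parseSynthesis` returns null, the caller can surface the failure in the UI instead of showing unvalidated text.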


Accomplishments that we're proud of

  • A full live VUS pipeline from VCF upload → real multi-API evidence gathering → mechanism-gated confidence → Grok-written patient summary and Doctor Brief — no mocks in the user path
  • The Mechanism-Compatibility Gate — a novel layer that prevents the #1 cross-species pitfall in variant interpretation, with a real demo case (CACNA1C) where it correctly suppresses dramatic mouse evidence
  • Transparent agent trace with voice — patients see and hear what the agent is doing step by step, in plain English, powered by live Grok narration + xAI TTS
  • Honest uncertainty — KCNQ1 demonstrates the tool saying “we still don’t know” when predictors disagree and mouse data is absent, instead of forcing a confident call
  • Verified example VCFs — four single-variant sample files (LDLR, CACNA1C, KCNQ1, ATM) plus a 12-variant panel, each mapped to a real clinical context and validated live against ClinVar and Ensembl
  • Clean architecture — one orchestrator, one Grok client, one model, one Realtime contract; frontend and backend merged without forking types
  • Production-deployed on Vercel with Inngest Cloud, end-to-end verified on the live production URL

What we learned

  • Deterministic math + LLM prose is the right split. Let code own the confidence label; let Grok own the explanation. When Grok could override the score, it would; when it can’t, the product stays auditable.
  • Mechanism matters more than magnitude. A p=1e-63 mouse phenotype is worthless if the disease mechanism doesn’t match. Building that as a first-class pipeline layer (not a post-hoc disclaimer) changed how every run is scored.
  • Real-time streaming transforms trust. Showing fragments arrive one by one — with the agent narrating each step — makes a 2–3 minute pipeline feel like an investigation, not a loading spinner.
  • “Found: false” is a feature. Rendering empty API results honestly builds more credibility than filling gaps with synthetic data ever could.
  • Parallel development needs a frozen contract early. lib/types.ts as the single integration seam saved us from merge hell between the research engine and the UI.
  • Voice is additive, not essential. The Grok TTS layer enhances accessibility but fails silently: the on-screen thought trail carries the full experience if audio isn’t available.

What's next for Surface

  • Clinician workflow integration — FHIR export, EHR-friendly brief formats, and counselor review/override before results reach the patient
  • Multi-variant triage — rank all VUS in an uploaded panel by investigability and clinical context fit, not just one-at-a-time selection
  • Watcher alerts — email or push notifications when ClinVar reclassifies a watched variant, with an automatic re-run diff
  • Expanded mechanism table + Live Search — broader gene coverage with cited literature research for out-of-table genes, validated against ClinGen gene-disease validity
  • Somatic and CNV support — extend beyond germline SNV/indels to structural variants and tumor panels
  • Regulatory path exploration — position as clinical decision support (not diagnostic), with audit trails, versioned evidence snapshots, and HIPAA-eligible infrastructure (xAI Voice is already HIPAA-eligible)
  • Longitudinal tracking — let patients re-upload updated sequencing over time and see how their VUS landscape evolves
