GeneGuard

Inspiration

Families can download raw DNA files from 23andMe or Ancestry, but then what? Most dashboards either bury people in jargon or stop at vague trait lists. We wanted to turn dense scientific and medical data into clear, actionable insights a family can use together.

The technical spark came from ADAGIO, a disease–gene prioritization approach based on network biology. In many diseases, genes work in pathways; ADAGIO runs a random-walk propagation on a protein–protein network starting from known disease genes to produce a relevance score for every gene. We asked: what if we precompute those scores for common diseases and make them usable in a friendly web app?
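To make the propagation idea concrete, here is a minimal sketch of a generic random walk with restart on a toy network. It is not ADAGIO's actual implementation: the function name, the restart probability of 0.3, the convergence tolerance, and the gene labels G1 to G4 are all assumptions chosen for illustration.

```python
def random_walk_with_restart(graph, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    graph: dict mapping node -> set of neighbours (undirected, toy-sized).
    seeds: known disease genes; the walker jumps back to them with
           probability `restart` at every step (0.3 is an assumed value).
    """
    nodes = sorted(graph)
    seeds = sorted(set(seeds) & set(graph))
    if not seeds:
        raise ValueError("no seed gene is present in the network")

    # Restart distribution: uniform over the seed genes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Isolated node: its probability mass stays where it is.
                nxt[n] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical four-gene chain: G1 - G2 - G3 - G4, with G1 as the known disease gene.
toy_network = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3"},
}
scores = random_walk_with_restart(toy_network, seeds=["G1"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy chain, genes closer to the seed end up with higher scores, which is the intuition behind using the scores as a relevance ranking.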

What it does

GeneGuard lets a user upload their existing raw genomics file (23andMe/Ancestry TXT or VCF) and then:

Choose a disease (e.g., Alzheimer’s or type 2 diabetes) or auto-rank all diseases by aggregate risk.
Get a ranked list of relevant genes: we map the file’s rsIDs → genes, intersect them with our precomputed ADAGIO tables, and assign High/Medium/Low levels by rank band.
Read five concise, evidence-anchored lifestyle suggestions per hit (WHO/NIH/CDC style wording) that turn insights into action.
Export results as CSV and (optionally) share with family to compare overlaps.
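
The intersect-and-rank step can be sketched roughly as follows. The table shape ({gene: {risk, rank}}) and the rank bands (Top 100, 101–300, 301–500) come from the "How we built it" section; the function names, the gene labels, and the choice to drop genes ranked below 500 are assumptions made for this example.

```python
def risk_level(rank):
    """Map an ADAGIO rank to a level using the rank bands from this writeup."""
    if rank <= 100:
        return "High"
    if rank <= 300:
        return "Medium"
    if rank <= 500:
        return "Low"
    return None  # assumption: genes ranked below 500 are not reported


def rank_user_genes(user_genes, adagio_table):
    """Intersect the user's genes with one disease table and sort by rank."""
    hits = []
    for gene in adagio_table.keys() & set(user_genes):
        entry = adagio_table[gene]
        level = risk_level(entry["rank"])
        if level is None:
            continue
        hits.append(
            {"gene": gene, "rank": entry["rank"], "risk": entry["risk"], "level": level}
        )
    hits.sort(key=lambda hit: hit["rank"])
    return hits


# Hypothetical table and gene list, for illustration only.
example_table = {
    "GENE_A": {"risk": 0.91, "rank": 5},
    "GENE_B": {"risk": 0.40, "rank": 250},
    "GENE_C": {"risk": 0.22, "rank": 450},
    "GENE_D": {"risk": 0.05, "rank": 900},
}
example_hits = rank_user_genes(["GENE_B", "GENE_A", "GENE_D", "GENE_X"], example_table)
```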

Note: Research-grade, not diagnostic; we display a clear disclaimer encouraging follow-up with a genetic counselor.

How we built it

Frontend: React.js with a clean upload flow, “How It Works” page, and a results table (expand/collapse per gene for tips), plus CSV export.
Backend: FastAPI (Python), deployed on Render.
- Endpoints:
  - GET /diseases: available diseases
  - POST /upload-genome: parse TXT/VCF, map rsIDs → genes via MyVariant.info, intersect with ADAGIO tables, return ranked genes + tips
  - POST /auto-rank: score all diseases and return top-N by aggregate ADAGIO score
  - GET /results/{id}/csv: CSV export of a prior result
- Parsing & annotation: TXT (rsIDs) and VCF (streamed via cyvcf2); both resolve to gene symbols.
- ADAGIO integration: precomputed JSON per disease (adagio_{disease}.json) with {gene: {risk, rank}}.
- Risk levels: rank bands (Top 100 = High, 101–300 = Medium, 301–500 = Low).
- Performance: in-memory ADAGIO cache, parallel disease scoring, tip generation only for displayed results, and graceful API timeouts.
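
As a rough sketch of how the cached tables and parallel disease scoring might fit together, the example below loads each adagio_{disease}.json once and scores diseases in a thread pool. Several details are assumptions rather than the project's actual code: the directory name, the function names, and in particular the definition of the aggregate score as the sum of risk values over the user's matched genes.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from pathlib import Path

TABLE_DIR = Path("adagio_tables")  # hypothetical location of the JSON files


@lru_cache(maxsize=None)
def load_table(disease):
    """Read adagio_{disease}.json once; later calls are served from memory."""
    with open(TABLE_DIR / f"adagio_{disease}.json") as handle:
        return json.load(handle)


def aggregate_score(user_genes, table):
    """Assumed aggregate: sum of risk values over the user's matched genes."""
    return sum(table[gene]["risk"] for gene in user_genes if gene in table)


def auto_rank(user_genes, tables, top_n=5, max_workers=4):
    """Score every disease table in parallel and return the top-N (disease, score) pairs."""
    genes = frozenset(user_genes)
    diseases = list(tables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda d: aggregate_score(genes, tables[d]), diseases))
    ranked = sorted(zip(diseases, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical in-memory tables, standing in for the cached JSON files.
demo_tables = {
    "alzheimers": {"G1": {"risk": 0.5, "rank": 1}, "G2": {"risk": 0.2, "rank": 2}},
    "t2d": {"G2": {"risk": 0.9, "rank": 1}},
}
demo_ranking = auto_rank({"G1", "G2"}, demo_tables, top_n=2)
```

Threads are a reasonable fit when scoring involves I/O such as loading tables; for purely CPU-bound scoring, a process pool may be the better choice.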

Challenges we ran into

Bridging biology and CS: Translating network propagation + variant annotation into outputs a non-expert can trust and understand—without overpromising clinical meaning.
Data plumbing in real time: Robustly mapping rsIDs to genes from mixed TXT/VCF formats and handling API rate limits.
Latency vs. depth: Balancing precomputation (ADAGIO) with just-in-time annotation; we added caching and parallelism to keep the UI snappy.
First hackathon for half of us: New tools, new workflows, shipping under a tight clock while aligning design, backend, and demo.
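
One common way to cope with the rate limits mentioned under "Data plumbing" is to send rsIDs in batches and retry with exponential backoff. The sketch below illustrates that pattern only: `fetch_batch` is a hypothetical, injected function standing in for whatever client calls the annotation service, and `RateLimited`, the batch size, and the delay values are assumptions for illustration.

```python
import time


class RateLimited(Exception):
    """Raised by the injected fetch function when the service signals a rate limit."""


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def annotate_in_batches(rsids, fetch_batch, batch_size=500, max_retries=4,
                        base_delay=0.5, sleep=time.sleep):
    """Map rsIDs to genes batch by batch, backing off when rate limited.

    fetch_batch: hypothetical callable taking a list of rsIDs and returning
                 a dict {rsid: gene_symbol}; it raises RateLimited on throttling.
    """
    mapping = {}
    unique = sorted(set(rsids))  # de-duplicate so each rsID is requested once
    for batch in chunked(unique, batch_size):
        for attempt in range(max_retries + 1):
            try:
                mapping.update(fetch_batch(batch))
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                # Wait 0.5s, 1s, 2s, ... before retrying the same batch.
                sleep(base_delay * 2 ** attempt)
    return mapping
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.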

Accomplishments that we’re proud of

Shipped end-to-end: polished React UI + robust FastAPI backend, fully deployed and demo-ready.
Under-the-hood quality: clean API boundaries, streaming VCF parsing, cached annotations, and parallel top-N scoring for responsiveness.
Great UX polish: intuitive upload flow, clear risk levels, expandable tips, and easy CSV export.
Scientific framing: we made network biology approachable—precomputing ADAGIO scores so families get insights in seconds.

What we learned

How to operationalize a research method (ADAGIO) into an API with real latency constraints.
Practical genomics I/O: handling different file formats, edge cases, and mapping pipelines.
The impact of precomputation + caching and where parallelism pays off most for perceived speed.
Clear disclaimers and tone matter: users want actionable guidance, not anxiety or false certainty.
Team skills: faster scoping, defining interfaces early, and iterating frontend ↔ backend in lockstep.
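
To illustrate the kind of format handling and edge cases involved in genomics I/O, here is a small standard-library sketch that pulls rsIDs out of either a 23andMe/Ancestry-style TXT export or a VCF. It is a stand-in for illustration, not the project's cyvcf2-based parser; the function name and the sample lines are made up.

```python
import re

RSID = re.compile(r"^rs\d+$")


def iter_rsids(lines):
    """Yield rsIDs from tab-separated TXT exports or VCF lines, one line at a time."""
    is_vcf = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # VCF files announce themselves in the first meta line.
            if line.startswith("##fileformat=VCF"):
                is_vcf = True
            continue
        fields = line.split("\t")
        if is_vcf:
            if len(fields) < 3:
                continue
            # VCF: the ID column is third and may hold several IDs separated by ';'.
            candidates = fields[2].split(";")
        else:
            # TXT exports: the rsID is the first column.
            candidates = [fields[0]]
        for candidate in candidates:
            # Skips column headers, '.' placeholders, and vendor-internal IDs.
            if RSID.match(candidate):
                yield candidate


txt_lines = [
    "# This data file was generated by a consumer genomics service",
    "rs123\t1\t1000\tAG",
    "i500\t1\t2000\tCC",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs9\t2\t5\tA\tG",
]
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\trs42\tA\tG",
    "1\t200\t.\tC\tT",
    "2\t300\trs7;rs8\tG\tA",
]
txt_rsids = list(iter_rsids(txt_lines))
vcf_rsids = list(iter_rsids(vcf_lines))
```

Because the function is a generator over lines, it can be fed an open file handle directly and never needs the whole file in memory.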

What’s next for GeneGuard

More diseases and phenotype panels; expand and version ADAGIO tables.
Scalability: move from an in-memory store to a lightweight DB, run heavy annotations as background jobs, and cache more intelligently.
Richer insights: pathway-level explanations, polygenic-style aggregation, cohort benchmarks (e.g., “siblings share X high-priority genes”).
Clinician-friendly export: one-page summary PDF with clear caveats and references.
Privacy & sharing: invite links with granular scopes; optional encryption at rest.
Accessibility: broader file support, glossary of terms, and more educational “What this means” content.