Genomic Diagnostics

Inspiration

Rare diseases affect over 300 million people worldwide, yet the average patient endures a diagnostic odyssey of 4–6 years before receiving a correct diagnosis. We were inspired by the frustration of clinicians who lack structured tools to systematically narrow down thousands of possible rare conditions from a handful of symptoms. We wanted to build something that could act as a knowledgeable co-pilot: not replacing physicians, but giving them a data-driven starting point rooted in established ontologies like Orphanet and HPO.

What it does

RDDA takes a set of HPO (Human Phenotype Ontology) symptom terms as input and returns a ranked list of candidate rare diseases with probability scores, each paired with recommended genetic tests sourced from Orphanet gene association data. Beyond diagnosis, it actively guides the clinical interview by recommending the next most informative symptoms to ask about, ranked by information gain (the expected reduction in Shannon entropy), so that each follow-up question narrows the differential as much as possible. The result is a fast, interactive diagnostic assistant accessible entirely through a browser.
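As a rough illustration of the next-symptom ranking, here is a minimal sketch of entropy-based information gain. It assumes a simplified model in which each disease either has a symptom or does not, and it uses made-up placeholder IDs (`ORPHA:A`, `HP:X1`, and so on); the function names and data structures are hypothetical and are not taken from the RDDA codebase.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rank_next_symptoms(posterior, disease_symptoms, asked):
    """Rank symptoms not yet asked about by expected information gain.

    posterior:        dict disease -> current probability (sums to 1)
    disease_symptoms: dict disease -> set of symptom IDs (assumed deterministic)
    asked:            set of symptom IDs already covered in the interview
    """
    h_now = entropy(posterior.values())
    candidates = set().union(*disease_symptoms.values()) - asked
    gains = {}
    for s in candidates:
        # Probability that the patient answers "yes" to symptom s.
        p_yes = sum(p for d, p in posterior.items() if s in disease_symptoms[d])
        h_after = 0.0
        for present, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
            if p_answer <= 0:
                continue
            # Renormalised posterior over diseases consistent with this answer.
            conditional = [
                p / p_answer
                for d, p in posterior.items()
                if (s in disease_symptoms[d]) == present
            ]
            h_after += p_answer * entropy(conditional)
        gains[s] = h_now - h_after
    # Highest gain first; ties broken by symptom ID for a stable order.
    return sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy example with placeholder IDs (not real annotations).
posterior = {"ORPHA:A": 0.25, "ORPHA:B": 0.25, "ORPHA:C": 0.25, "ORPHA:D": 0.25}
disease_symptoms = {
    "ORPHA:A": {"HP:X1", "HP:X2"},
    "ORPHA:B": {"HP:X1"},
    "ORPHA:C": {"HP:X2", "HP:X3"},
    "ORPHA:D": set(),
}
ranked = rank_next_symptoms(posterior, disease_symptoms, asked=set())
for symptom, gain in ranked:
    print(f"{symptom}: {gain:.3f} bits")
```

In this toy setup, a symptom that splits the candidates evenly (`HP:X1` or `HP:X2`) yields a full bit of information, while one present in a single disease (`HP:X3`) yields less, which is why an even split makes the better follow-up question.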

How we built it

We trained an SGDClassifier (logistic loss, L2 regularization) for 50 epochs on a binary feature matrix derived from phenotype_to_genes.txt. The matrix encodes ~6,200 HPO symptom flags plus onset and inheritance metadata across ~2,400 ORPHA-coded diseases. The backend is a FastAPI server that loads the serialized model and metadata at startup, builds a disease–symptom matrix for real-time information gain calculation, and exposes REST endpoints for symptom autocomplete and diagnosis. The frontend is a lightweight vanilla HTML/CSS/JS single-page app that communicates with the API, keeping the client side simple and dependency-free.
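To make the feature-matrix step concrete, here is a minimal sketch of multi-hot encoding. It assumes the annotations have already been reduced to (disease ID, HPO ID) pairs; the actual parsing of phenotype_to_genes.txt and the onset and inheritance columns are not shown, and all names and IDs below are hypothetical placeholders.

```python
def build_feature_matrix(pairs):
    """Build a binary disease-by-symptom matrix from (disease_id, hpo_id) pairs.

    Returns (diseases, symptoms, matrix), where matrix[i][j] is 1 when
    disease i is annotated with symptom j and 0 otherwise.
    """
    diseases = sorted({d for d, _ in pairs})
    symptoms = sorted({s for _, s in pairs})
    row = {d: i for i, d in enumerate(diseases)}
    col = {s: j for j, s in enumerate(symptoms)}
    matrix = [[0] * len(symptoms) for _ in diseases]
    for d, s in pairs:
        matrix[row[d]][col[s]] = 1
    return diseases, symptoms, matrix


def encode_query(query_terms, symptoms):
    """Encode a patient's symptom list in the same column order as the matrix."""
    present = set(query_terms)
    return [1 if s in present else 0 for s in symptoms]


# Placeholder annotations (not real Orphanet/HPO associations).
pairs = [
    ("ORPHA:A", "HP:X1"),
    ("ORPHA:A", "HP:X2"),
    ("ORPHA:B", "HP:X2"),
    ("ORPHA:B", "HP:X3"),
]
diseases, symptoms, matrix = build_feature_matrix(pairs)
query = encode_query(["HP:X2", "HP:X3"], symptoms)
print(diseases, symptoms, matrix, query)
```

Rows in this shape, with one label per disease, are the kind of input a linear classifier such as scikit-learn's SGDClassifier accepts. A query vector encoded with the same column order can then be scored against the trained model.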

Challenges we ran into

The biggest challenge was the extreme class imbalance inherent in rare disease data: some conditions have only a handful of associated HPO terms while others have hundreds, making the model prone to ignoring low-prevalence diseases. Mapping heterogeneous Orphanet/HPO data sources into a clean, consistent feature matrix required significant preprocessing to resolve inconsistent ORPHA ID formats and duplicate HPO associations. Implementing real-time information gain ranking efficiently, without precomputing every possible symptom subset, also required careful matrix design to keep API response times acceptable.
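The ID clean-up described above might look something like the following sketch. The ID variants it handles (`ORPHA:123`, `Orphanet_00123`, bare digits, `orpha 456`) and the canonical `ORPHA:<number>` form are assumptions chosen for illustration, not the formats actually encountered in the project; the function names and the sample IDs are likewise hypothetical.

```python
import re

# Optional "ORPHA"/"Orphanet" prefix, optional ":" or "_" separator,
# optional leading zeros, then the numeric identifier.
_ORPHA_RE = re.compile(
    r"^\s*(?:orpha(?:net)?\s*[:_]?\s*)?0*(\d+)\s*$", re.IGNORECASE
)


def normalize_orpha_id(raw):
    """Map an ORPHA ID variant to the assumed canonical form 'ORPHA:<number>'."""
    match = _ORPHA_RE.match(raw)
    if match is None:
        raise ValueError(f"Unrecognised ORPHA ID: {raw!r}")
    return f"ORPHA:{match.group(1)}"


def clean_associations(rows):
    """Normalise IDs and drop duplicate (disease, HPO term) pairs, keeping order."""
    seen = set()
    cleaned = []
    for raw_disease, raw_hpo in rows:
        pair = (normalize_orpha_id(raw_disease), raw_hpo.strip().upper())
        if pair not in seen:
            seen.add(pair)
            cleaned.append(pair)
    return cleaned


# Placeholder rows (not real annotations) with inconsistent formatting.
rows = [
    ("ORPHA:123", "HP:0000001"),
    ("Orphanet_00123", "hp:0000001"),  # duplicate of the first row
    ("123", "HP:0000002"),
    ("orpha 456", "HP:0000001 "),
]
cleaned = clean_associations(rows)
print(cleaned)
```

Normalising before deduplicating matters here: the first two rows only collapse into one association once both IDs are in the same canonical form.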

Accomplishments that we're proud of

We're proud that RDDA covers ~2,400 rare diseases and ~6,200 HPO terms, a clinically meaningful scope, while keeping the entire inference pipeline fast enough to run interactively in a browser. The next-symptom recommendation feature, powered by entropy-based information gain, is something we feel genuinely adds clinical value beyond a simple classifier. We also built a clean, self-contained full-stack application, from raw Orphanet data ingestion all the way to a polished UI, entirely within the hackathon timeframe.

What we learned

We learned how powerful standardized biomedical ontologies like HPO and Orphanet are as a foundation for machine learning when used carefully. For this problem, the quality of the feature encoding matters far more than model complexity. We also gained a deep appreciation for the real-world diagnostic challenges in rare disease medicine, and for how even an imperfect probabilistic model can provide meaningful decision support when paired with transparent uncertainty estimates. Designing the information gain loop taught us how active learning principles can be applied practically in a clinical tool without requiring any online retraining.

What's next for Genomic Diagnostics

We plan to integrate a large language model layer that can accept free-text clinical notes and automatically extract structured HPO terms, removing the current barrier of requiring clinicians to know HPO terminology upfront. On the model side, we want to explore graph neural networks over the HPO ontology graph to better capture hierarchical symptom relationships that a flat binary feature vector cannot represent. Longer term, we aim to validate RDDA against real de-identified patient cohorts and work toward regulatory-compliant deployment as a clinical decision support tool, making the diagnostic odyssey shorter for every rare disease patient.

Updates

Kyu Bin Choi started this project — Mar 01, 2026 02:24 PM EST
