Inspiration

The inspiration for this project came from a real clinical problem. Patients with endometrial cancer in the NSMP (No Specific Molecular Profile) group make up a large proportion of cases, yet their prognosis is often unclear with current risk stratification guidelines. This uncertainty can make treatment decisions more difficult. Our goal was to explore whether data-driven methods could help provide clearer, more personalized risk information in a way that is understandable and useful for clinicians.

What it does

EndoRisk-NSMP is a web-based clinical decision support tool that estimates the risk of recurrence and survival outcomes for patients with NSMP endometrial cancer. By entering basic clinical and pathological variables, clinicians receive:

  • An assignment to a low, intermediate, or high-risk group.
  • Estimated probabilities of disease-free survival (DFS) and overall survival (OS) at 1, 2, and 3 years.
  • Visual explanations showing which variables contribute most to the patient’s risk.
  • Contextual information comparing the patient to the historical cohort used to train the model.
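
As a rough illustration of how the survival estimates and per-variable explanations listed above can be derived from a Cox model, the sketch below uses the standard relation S(t | x) = S0(t) ^ exp(β·x). The variable names, coefficients, and baseline survival values are hypothetical placeholders, not the fitted values behind EndoRisk-NSMP.

```python
import math

# Hypothetical log hazard ratios, for illustration only;
# these are NOT the coefficients fitted in EndoRisk-NSMP.
COEFFICIENTS = {"age_over_70": 0.40, "grade_3": 0.70, "lvsi_present": 0.55}

# Hypothetical baseline survival S0(t) at 1, 2 and 3 years (all covariates = 0).
BASELINE_SURVIVAL = {1: 0.97, 2: 0.93, 3: 0.90}


def contributions(patient):
    """Per-variable contribution beta_j * x_j to the Cox linear predictor."""
    return {name: beta * patient.get(name, 0.0) for name, beta in COEFFICIENTS.items()}


def survival_probabilities(patient):
    """Cox prediction S(t | x) = S0(t) ** exp(beta . x) at each time horizon."""
    linear_predictor = sum(contributions(patient).values())
    relative_hazard = math.exp(linear_predictor)
    return {year: s0 ** relative_hazard for year, s0 in BASELINE_SURVIVAL.items()}


patient = {"age_over_70": 1, "grade_3": 0, "lvsi_present": 1}

# Variables ranked by the size of their contribution to this patient's risk.
for name, value in sorted(contributions(patient).items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")

for year, prob in survival_probabilities(patient).items():
    print(f"{year}-year survival: {prob:.1%}")
```

A patient with every covariate at zero gets exactly the baseline curve, and each positive contribution pushes the predicted survival below it.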

How we built it

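We built the project using a hybrid modeling approach that combines unsupervised and supervised learning. First, we applied K-Means clustering to identify natural patient subgroups within the NSMP population and validated these clusters using Kaplan–Meier survival curves and log-rank tests. In parallel, we trained a Cox Proportional Hazards model to compute an individualized risk score based on clinical variables, following the standard formulation:

$$h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)$$

Here $h_0(t)$ is the baseline hazard, $x_1, \dots, x_p$ are the patient's clinical variables, and $\beta_1, \dots, \beta_p$ are the fitted coefficients. The exponent $\beta_1 x_1 + \dots + \beta_p x_p$ is the usual Cox risk score.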
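
The cluster validation step mentioned above can be sketched in plain Python: a Kaplan–Meier curve is computed separately for each cluster so the curves can be compared. This is a minimal sketch on invented toy data; the function names are hypothetical, and in practice a survival library such as lifelines would typically provide both the estimator and the log-rank test.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event was observed, 0 if the patient was censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored patients leave the risk set without a drop
    return curve


def curves_by_cluster(durations, events, clusters):
    """Kaplan-Meier curve for each cluster label (e.g. from K-Means)."""
    result = {}
    for label in sorted(set(clusters)):
        idx = [i for i, c in enumerate(clusters) if c == label]
        result[label] = kaplan_meier(
            [durations[i] for i in idx], [events[i] for i in idx]
        )
    return result


# Invented toy data: follow-up in months, event flag, cluster label.
durations = [6, 12, 18, 24, 30, 36, 9, 15]
events = [1, 0, 1, 1, 0, 0, 1, 1]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]

for label, curve in curves_by_cluster(durations, events, clusters).items():
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Clusters whose curves separate clearly, with a log-rank test supporting the difference, are the ones worth carrying forward as risk-relevant subgroups.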

The clustering results and Cox risk scores are combined to assign each patient to a clinically interpretable risk group. The entire workflow is wrapped in a Streamlit application that allows real-time interaction and visualization.
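
The exact rule for combining the two signals is not spelled out here, so the following is only one plausible sketch: the Cox score sets a tier through cut-offs, and membership in a cluster with poorer observed survival raises that tier by one. The cut-off values, the cluster label, and the function name are all hypothetical.

```python
RISK_GROUPS = ["low", "intermediate", "high"]

# Hypothetical cut-offs on the Cox risk score (linear predictor).
SCORE_CUTOFFS = (0.5, 1.2)

# Hypothetical: cluster labels whose Kaplan-Meier curves showed poorer survival.
HIGH_RISK_CLUSTERS = {2}


def assign_risk_group(cox_score, cluster):
    """Map a Cox risk score and a cluster label to a named risk group."""
    # Tier 0, 1 or 2 depending on how many cut-offs the score reaches.
    tier = sum(cox_score >= cutoff for cutoff in SCORE_CUTOFFS)
    # Assumed rule: an unfavourable cluster bumps the tier up by one, capped at "high".
    if cluster in HIGH_RISK_CLUSTERS:
        tier = min(tier + 1, len(RISK_GROUPS) - 1)
    return RISK_GROUPS[tier]


print(assign_risk_group(0.2, cluster=0))
print(assign_risk_group(0.8, cluster=2))
```

Keeping the rule this explicit makes it easy to show clinicians why a patient landed in a given group, and to adjust the thresholds later.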

Challenges we ran into

One of the main challenges was working with a limited and heterogeneous dataset, which required careful preprocessing, imputation of missing values, and conservative model validation. Another challenge was ensuring interpretability: translating statistical concepts like hazard ratios and risk scores into outputs that clinicians can easily understand without losing methodological accuracy.
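
One common way to validate a survival model on censored data is Harrell's concordance index, which measures how often the patient with the higher predicted risk is the one who has the event first. Whether this exact metric was used in the project is an assumption; the sketch below is a plain-Python illustration on invented data.

```python
def concordance_index(durations, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. Pairs with tied times are skipped
    in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable if comparable else float("nan")


# Invented toy data: follow-up in months, event flag, predicted risk score.
durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.4, 0.1]

print(concordance_index(durations, events, risk_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a single number to track across cross-validation folds on a small cohort.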

Accomplishments that we're proud of

We are particularly proud of building a fully functional, end-to-end prototype that goes from raw clinical data to an interactive, clinician-facing tool. Successfully combining clustering, survival analysis, and explainability in a single, coherent interface was a key achievement. We also managed to keep the model transparent and interpretable, which is essential in a clinical setting.

What we learned

Through this project, we learned that in healthcare applications, usability and interpretability are just as important as predictive performance. We deepened our understanding of survival analysis, censored data, and model validation, and gained valuable experience designing machine learning tools with real clinical constraints in mind.

What's next for Hack the Uterus!: EndoRisk-NSMP

Next steps include validating the model on larger and external cohorts, refining the risk thresholds in collaboration with clinicians, and improving the interface based on clinical feedback. In the longer term, the tool could be extended to incorporate additional molecular features and longer-term survival predictions, moving closer to real-world clinical deployment.
