Inspiration
In modern healthcare, time is the most critical variable. Traditional cardiovascular risk assessments often rely on subjective patient surveys or time-consuming laboratory tests that can delay life-saving triage. We were inspired to close this gap by identifying a set of seven objective, verifiable data points that can be gathered quickly to support immediate clinical triage.
What it does
Our project provides a high-sensitivity rapid screening framework. Given seven "Rapid Risk Indicators" (age, sex, BMI, smoking status, alcohol consumption, stroke history, and healthcare access), the model generates a risk profile. Unlike standard models that aim for raw accuracy, our tool is optimized for clinical triage: it prioritizes sensitivity, so that as many at-risk individuals as possible are flagged (73% sensitivity/recall) before they leave the clinical setting.
How we built it
We utilized the CDC's Behavioral Risk Factor Surveillance System (BRFSS) dataset to train a logistic regression model using a strict 60/20/20 train-validate-test split.
- Class Weighting: We implemented balanced class weights to address the 10% heart disease prevalence, specifically optimizing for recall to minimize missed diagnoses.
- Statistical Rigor: Every feature in our subset was statistically significant (p < 0.001) in a logistic regression fitted by maximum likelihood estimation.
- Validation: We moved beyond standard metrics by analyzing calibration curves to check how closely predicted probabilities track observed rates, so the model is used for triage ranking rather than as an absolute diagnosis.
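The split and class-weighting steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the seed, and the synthetic labels are hypothetical, and the weight formula is the one scikit-learn documents for `class_weight="balanced"` (`n_samples / (n_classes * class_count)`).

```python
import random
from collections import Counter


def split_60_20_20(n_rows, seed=0):
    """Shuffle row indices and cut them into 60% train, 20% validation, 20% test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    train_end = n_rows * 6 // 10  # integer arithmetic avoids float rounding
    val_end = n_rows * 8 // 10
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels with roughly 10% prevalence (1 = heart disease, 0 = none).
labels = [1] * 100 + [0] * 900

train_idx, val_idx, test_idx = split_60_20_20(len(labels))
weights = balanced_class_weights(labels)

print(len(train_idx), len(val_idx), len(test_idx))
print(weights)
```

With 10% prevalence, each positive case ends up weighted about nine times as heavily as a negative one, which is what pushes the fitted model toward higher recall. A stratified split (keeping prevalence equal across the three sets) would be a natural refinement, but it is not shown here.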
Challenges we ran into
The primary challenge was the inherent "noise" in survey data. Many variables were subjective "soft" indicators (e.g., self-reported general health). We decided to strip the model down to only objective indicators, so that each input can be verified in under a minute, preserving both clinical reliability and speed.
Accomplishments that we're proud of
- 73% Recall Rate: On held-out test data, our model flagged nearly 3 out of every 4 heart disease cases using only 7 non-invasive questions.
- Strong Significance: All seven indicators were statistically significant at p < 0.001, which supports the "Rapid Risk Indicator" approach.
- High NPV: A 96% Negative Predictive Value, meaning that 96% of the patients the model classified as low-risk in the test data did not have heart disease.
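For readers unfamiliar with these metrics, here is a short sketch of how recall and NPV follow from a confusion matrix. The counts below are hypothetical: they were chosen only to be consistent with 73% recall and 96% NPV at 10% prevalence, and they are not the project's actual confusion matrix.

```python
def recall(tp, fn):
    """Share of true heart disease cases that the model flagged."""
    return tp / (tp + fn)


def negative_predictive_value(tn, fn):
    """Share of low-risk classifications that were truly disease-free."""
    return tn / (tn + fn)


# Hypothetical counts for 1,000 patients, 100 of whom have heart disease.
tp, fn = 73, 27    # flagged vs. missed cases
tn, fp = 648, 252  # correctly cleared vs. falsely flagged non-cases

model_recall = recall(tp, fn)
model_npv = negative_predictive_value(tn, fn)

# Reference point: labelling everyone low-risk gives NPV = 1 - prevalence.
baseline_npv = negative_predictive_value(tn + fp, tp + fn)

print(f"recall = {model_recall:.2f}")
print(f"NPV = {model_npv:.2f}")
print(f"baseline NPV = {baseline_npv:.2f}")
```

Because NPV depends on prevalence, it is most informative when read against that baseline rather than on its own.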
What we learned
We discovered that a prior stroke is a powerful "vascular byte": in our model, it raises the estimated odds of heart disease by over 1.35x. We also learned that although calibration is hard to achieve on imbalanced datasets, analyzing it gives a more honest picture of "real-world" risk than standard accuracy scores ever could.
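To make the odds figure concrete, the sketch below shows how a logistic regression coefficient maps to an odds ratio and what that does to a probability. It assumes the 1.35x in the text is an odds ratio (the exponential of the stroke coefficient), and the 10% baseline probability is purely illustrative, not a model output.

```python
import math


def odds_ratio(coefficient):
    """In logistic regression, exp(coefficient) is the odds ratio for that feature."""
    return math.exp(coefficient)


def apply_odds_ratio(baseline_probability, ratio):
    """Convert a probability to odds, scale by the ratio, and convert back."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1 + new_odds)


# Assumed: a coefficient whose odds ratio is 1.35 (the lower bound quoted above).
stroke_coefficient = math.log(1.35)
stroke_odds_ratio = odds_ratio(stroke_coefficient)

# Illustrative baseline: a patient whose risk without stroke history is 10%.
risk_with_stroke = apply_odds_ratio(0.10, stroke_odds_ratio)

print(f"odds ratio = {stroke_odds_ratio:.2f}")
print(f"risk with stroke history = {risk_with_stroke:.3f}")
```

Note that an odds ratio multiplies odds, not probabilities, so the change in probability depends on the patient's baseline risk.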
What's next for Rapid Screening for Cardiovascular Risk
We plan to expand this into a mobile-first "Triage Calculator" for rural healthcare workers. Additionally, we aim to explore non-linear models to see whether capturing complex interactions between age and BMI can push our sensitivity beyond the current 73%.
Built With
- jupyter
- matplotlib
- numpy
- pandas
- python
- scikit-learn
- seaborn
- statsmodels