- Feature Importance Analysis: Identifying age, weight, and systolic blood pressure as the leading risk factors in our predictive model.
- Model Performance Metrics: Evaluating accuracy (71.60%), precision, and recall to ensure reliable cardiovascular risk assessment.
- Data Cleaning & Outlier Detection: Visualizing physiological anomalies in blood pressure data to ensure high-quality training inputs.
- Biomedical Dataset Snapshot: Overview of the 70,000 anonymized patient records used for training and testing the AI model.
Inspiration
Cardiovascular diseases are a major global health challenge and are often diagnosed too late for effective intervention. My inspiration was to see whether machine learning could identify subtle patterns in routine health data, such as age, weight, and blood pressure, to provide an early-warning system. I wanted to build a tool that could help save lives by flagging risk before it becomes critical.
What it does
The project is an AI-powered diagnostic assistant. It analyzes clinical and lifestyle data from 70,000 anonymized patient records and predicts the likelihood of cardiovascular disease with an accuracy of 71.60%.
How we built it
I used Python and Google Colab for the entire development process.
- Exploratory Data Analysis (EDA): I identified and removed physiological outliers (e.g., impossible blood pressure readings) to ensure data integrity.
- Modeling: I implemented a Random Forest classifier, an ensemble of decision trees known for its robustness to noisy data and its ability to capture non-linear relationships in biological data.
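A minimal sketch of the modeling and evaluation steps, assuming scikit-learn's `RandomForestClassifier` (scikit-learn is listed under Built With). The data below is synthetic, so the printed scores will not reproduce the 71.60% figure. The column names `ap_lo` and `cardio`, the units, the hyperparameters, and the 80/20 split are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the cleaned patient table: synthetic values only,
# NOT the real 70,000-record dataset. Column names other than ap_hi are assumed.
rng = np.random.default_rng(42)
n = 2_000
df = pd.DataFrame({
    "age": rng.uniform(30, 65, n),      # years (assumed unit)
    "height": rng.normal(165, 8, n),    # cm
    "weight": rng.normal(75, 14, n),    # kg
    "ap_hi": rng.normal(128, 15, n),    # systolic blood pressure, mmHg
    "ap_lo": rng.normal(82, 9, n),      # diastolic blood pressure, mmHg (assumed column)
})
# Synthetic binary label loosely tied to age and systolic pressure, for illustration only.
logit = 0.08 * (df["age"] - 50) + 0.05 * (df["ap_hi"] - 128)
df["cardio"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = df.drop(columns="cardio")
y = df["cardio"]
# Hold out 20% of rows for testing, keeping the class balance similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative hyperparameters; max_depth limits how far each tree can overfit noise.
model = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)              # hard 0/1 predictions
risk = model.predict_proba(X_test)[:, 1]    # estimated probability of class 1 per patient

print(f"accuracy:  {accuracy_score(y_test, y_pred):.4f}")
print(f"precision: {precision_score(y_test, y_pred):.4f}")
print(f"recall:    {recall_score(y_test, y_pred):.4f}")
```

`predict_proba` yields a per-patient risk score, which fits the "likelihood" framing better than hard labels; the decision threshold (0.5 by default in `predict`) can be lowered if missing at-risk patients is costlier than raising false alarms, trading precision for recall.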
Challenges we ran into
The biggest challenge was "noisy" data. Some patient records contained erroneous blood pressure values (like 16,000 mmHg). Cleaning this data without losing valuable information required careful statistical filtering.
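One way to implement this kind of statistical filtering is a two-stage filter: hard physiological bounds first, then Tukey's interquartile-range (IQR) fences. This is a hedged sketch, not the project's actual cleaning code. The function name `filter_bp_outliers`, the diastolic column `ap_lo`, and every threshold are hypothetical choices for illustration, not clinical guidance.

```python
import pandas as pd


def filter_bp_outliers(df, sys_col="ap_hi", dia_col="ap_lo",
                       sys_range=(80, 250), dia_range=(40, 150), iqr_k=1.5):
    """Return the rows of df with plausible blood-pressure values.

    Hypothetical helper; all thresholds are illustrative assumptions.
    """
    # Stage 1: hard physiological bounds drop impossible entries (e.g. 16,000 mmHg)
    # and rows where diastolic pressure is not below systolic pressure.
    plausible = (
        df[sys_col].between(*sys_range)
        & df[dia_col].between(*dia_range)
        & (df[dia_col] < df[sys_col])
    )
    kept = df[plausible]
    if iqr_k is None:
        return kept

    # Stage 2: Tukey IQR fences, computed on the already-plausible rows,
    # trim readings that are possible but statistically extreme.
    mask = pd.Series(True, index=kept.index)
    for col in (sys_col, dia_col):
        q1, q3 = kept[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= kept[col].between(q1 - iqr_k * iqr, q3 + iqr_k * iqr)
    return kept[mask]


# Usage with a few made-up records: one impossible systolic value (16,000)
# and one row where diastolic exceeds systolic.
records = pd.DataFrame({
    "ap_hi": [120, 16000, 135, 110, 140, 125, 90],
    "ap_lo": [80, 80, 85, 70, 90, 140, 60],
})
clean = filter_bp_outliers(records)
print(len(records), "->", len(clean))  # 7 -> 5
```

The IQR stage can also discard genuine but extreme hypertensive readings, which is exactly the information-loss risk described above; passing `iqr_k=None` keeps only the hard bounds.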
Accomplishments that we're proud of
I am proud of achieving a solid 71.60% accuracy on a large-scale dataset. More importantly, I made the model interpretable ("Explainable AI") by creating feature-importance visualizations that show which factors drive its risk predictions.
Key Risk Drivers Identified:
- Age: The most significant predictor.
- Weight & Height: Crucial physiological indicators.
- Systolic Blood Pressure (ap_hi): Leading clinical marker.
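A ranking like the one above can be computed along the following lines, assuming scikit-learn. The sketch contrasts the Random Forest's built-in impurity-based `feature_importances_` with permutation importance on held-out data (`sklearn.inspection.permutation_importance`), a common cross-check because impurity scores are computed on training data and can favor features with many distinct values. The data is synthetic, with a label deliberately driven mostly by age, so the output illustrates the method only and says nothing about real patients; column names other than `ap_hi` and all hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): the label depends mostly on age,
# then systolic pressure, then weight; height and ap_lo carry no signal.
rng = np.random.default_rng(0)
n = 3_000
X = pd.DataFrame({
    "age": rng.uniform(30, 65, n),
    "height": rng.normal(165, 8, n),
    "weight": rng.normal(75, 14, n),
    "ap_hi": rng.normal(128, 15, n),
    "ap_lo": rng.normal(82, 9, n),
})
logit = 0.15 * (X["age"] - 47.5) + 0.06 * (X["ap_hi"] - 128) + 0.03 * (X["weight"] - 75)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, min_samples_leaf=5, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: learned during training, normalized to sum to 1.
impurity = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

# Permutation importances: mean drop in held-out accuracy when one column is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
permuted = pd.Series(perm.importances_mean, index=X.columns).sort_values(ascending=False)

print(pd.DataFrame({"impurity": impurity, "permutation": permuted}).round(3))
```

Both scores are global: they rank features across the whole dataset rather than explaining why one specific patient was flagged.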
What we learned
I learned the critical importance of data preprocessing in medical AI. I also deepened my understanding of how ensemble models like Random Forest can be used to solve complex, real-world health problems.
What's next for AI-Driven Early Cardiovascular Risk Detection
The next step is to integrate real-time data from wearable devices to move from static predictions to continuous health monitoring. I also aim to test the model on more diverse global datasets.
Built With
- colab
- matplotlib
- pandas
- python
- scikit-learn
- seaborn