Project Update: Heart Attack Risk Prediction Using Machine Learning
Week 1: Project Kickoff & Data Exploration
What I accomplished this week:
Joined Discord & Reviewed Guidelines
- Confirmed project requirements
- Understood submission deadlines and judging criteria
Dataset Analysis
- Explored the Heart Attack dataset thoroughly
- Performed initial data profiling using df.info(), df.describe(), and df.isnull().sum()
- Identified key features: Age, Cholesterol, Blood Pressure, Maximum Heart Rate, etc.
- Verified target variable distribution (0 = No risk, 1 = At risk)
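A minimal sketch of the profiling steps above. The small inline table is a hypothetical stand-in for the Heart Attack dataset: the column names (such as `resting_bp` and `target`) and the values are assumptions for illustration, not the real data.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for the dataset; names and values are illustrative only.
csv_text = """age,cholesterol,resting_bp,max_heart_rate,target
63,233,145,150,1
37,250,130,187,1
41,204,130,172,0
56,236,120,178,1
57,354,120,163,0
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()                                     # column dtypes and non-null counts
summary = df.describe()                       # count, mean, std, min, quartiles, max
missing = df.isnull().sum()                   # missing values per column
target_counts = df["target"].value_counts()   # class balance (0 = No risk, 1 = At risk)

print(summary)
print(missing)
print(target_counts)
```

On the real dataset, the same four calls give a quick view of data types, value ranges, missing values, and class balance before any modeling.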
Environment Setup
- Created Google Colab notebook
- Imported essential libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Established a reproducible workflow with a fixed random seed
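One common way to fix the seed is sketched below. The value 42 and the helper name `set_seed` are arbitrary choices for this example, not taken from the project notebook.

```python
import random

import numpy as np

SEED = 42  # arbitrary; any fixed integer works


def set_seed(seed: int = SEED) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding reproduces the same random draws.
set_seed()
first = np.random.rand(3)
set_seed()
second = np.random.rand(3)
print(np.allclose(first, second))
```

Scikit-learn estimators and splitters take their own `random_state` argument, so the same seed would typically also be passed to calls such as `train_test_split` and `RandomForestClassifier`.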
Initial Findings
The dataset contains multiple clinical features with no major missing values. Early visualizations suggest that age and cholesterol levels correlate strongly with heart attack risk.
Next Week's Goals
- Complete Exploratory Data Analysis (EDA) with visualizations
- Build baseline Logistic Regression model
- Start implementing Random Forest classifier
Lessons Learned So Far
Understanding medical data requires domain knowledge. I spent time researching what each clinical feature means to ensure proper interpretation.
Follow along for more updates as I build toward the March 29th deadline!
#Hack4Health #Byte2Beat #MachineLearning #HealthcareAI #HeartDiseasePrediction
Week 2 Update: Models Complete!
What I built this week:
Completed Exploratory Data Analysis
- Created correlation heatmaps
- Visualized feature distributions
- Identified key patterns in the data
Implemented Two ML Models
- Logistic Regression (~85% accuracy)
- Random Forest Classifier (~87.5% accuracy)
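A rough sketch of how the two models can be trained and compared with scikit-learn. It uses synthetic data from `make_classification` as a stand-in for the real dataset, and the hyperparameters (`n_estimators=200`, `max_iter=1000`, the 80/20 split) are assumptions, so the printed scores will not match the accuracies reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42

# Synthetic stand-in for the clinical features and the binary target.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=SEED
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED
)

# Logistic Regression benefits from scaled inputs; Random Forest does not need scaling.
log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
forest = RandomForestClassifier(n_estimators=200, random_state=SEED)

scores = {}
for name, model in [("logistic_regression", log_reg), ("random_forest", forest)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split

print(scores)
```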
Evaluation Metrics
- Accuracy, Precision, Recall, F1-score
- ROC Curve and AUC score
- Confusion Matrix analysis
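The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them on ten made-up labels and predicted probabilities, which are illustrative only and not project results, with an assumed decision threshold of 0.5.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up labels and predicted probabilities for the positive class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.45, 0.6, 0.2, 0.3]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 decision threshold

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)    # uses probabilities, not hard labels
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(accuracy, precision, recall, f1, auc)
print(cm)
```

Note that ROC-AUC is computed from the predicted probabilities, while the other metrics depend on the chosen threshold.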
Key Finding
Random Forest outperforms Logistic Regression, especially in recall (0.89), which is crucial in medical applications where a missed at-risk patient is the costliest error.
Next: Explainability & Risk Categorization
Week 3 Update: Making the Model Interpretable
Added this week:
Feature Importance Analysis
- Identified top predictors: Age, Cholesterol, Max Heart Rate
- Created visualization showing feature contributions
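For a Random Forest, one standard source of such a ranking is the fitted model's `feature_importances_` attribute. The sketch below assumes that approach and uses synthetic data with placeholder feature names, so the resulting order says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names; the synthetic columns have no clinical meaning.
feature_names = [
    "age", "cholesterol", "resting_bp", "max_heart_rate", "feature_5", "feature_6",
]
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=42
)

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances; they are non-negative and sum to 1.
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

With Matplotlib installed, `importances.plot.barh()` would turn this Series into a bar plot.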
Risk Categorization
- Added Low/Medium/High risk levels
- Based on probability thresholds
- Makes output more practical for healthcare use
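A small sketch of mapping a predicted probability to a risk level. The cutoffs 0.33 and 0.66 and the function name `risk_category` are hypothetical; the actual thresholds used in the project are not stated here.

```python
LOW_CUTOFF = 0.33   # assumed boundary between Low and Medium
HIGH_CUTOFF = 0.66  # assumed boundary between Medium and High


def risk_category(probability: float) -> str:
    """Map a predicted probability of the at-risk class to a risk level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < LOW_CUTOFF:
        return "Low"
    if probability < HIGH_CUTOFF:
        return "Medium"
    return "High"


probabilities = [0.05, 0.40, 0.66, 0.91]
categories = [risk_category(p) for p in probabilities]
print(categories)
```

In practice the probabilities would come from something like `model.predict_proba(X)[:, 1]`, and the cutoffs would be chosen with clinical input rather than fixed in advance.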
Error Analysis
- Examined False Negatives and False Positives
- Focused on minimizing FN (critical in medical diagnosis)
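One way to trade False Negatives against False Positives is to lower the decision threshold. The sketch below counts both error types at a few thresholds on made-up labels and probabilities; whether the project tuned the threshold this way is an assumption, and `count_errors` is a helper invented for this example.

```python
def count_errors(y_true, y_prob, threshold):
    """Return (false_negatives, false_positives) at the given decision threshold."""
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    return fn, fp


# Made-up labels and predicted probabilities for the at-risk class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.45, 0.6, 0.2, 0.3]

for threshold in (0.5, 0.4, 0.3):
    fn, fp = count_errors(y_true, y_prob, threshold)
    print(f"threshold={threshold}: FN={fn}, FP={fp}")
```

Lowering the threshold catches more at-risk cases at the cost of more false alarms, so the choice depends on how the two error types are weighed.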
New Visualizations Added
- Feature importance bar plot
- Risk category distribution
- Enhanced confusion matrix
Final Week: Documentation & Submission Prep
Final Week: Ready for Submission!
Completed this week:
Professional Report (PDF)
- 4-page comprehensive documentation
- Includes methodology, results, and insights
- Professional cover page and formatting
GitHub Repository
- Clean, well-structured repo
- README with complete project documentation
- requirements.txt for reproducibility
Final Notebook Polish
- All cells run without errors
- Markdown sections with clear explanations
- All visualizations properly labeled
Final Model Performance
- Random Forest Accuracy: 87.5%
- Recall (Heart Attack Class): 0.89
- ROC-AUC Score: Strong discriminatory power
Project Status: COMPLETE
Ready for March 29th submission deadline!
Thank you to the Hack4Health team and mentors for this amazing opportunity!