Our inspiration came to us while we were browsing datasets online with no idea what to build. Fortunately, we found a fairly simple dataset on Kaggle that records 16 attributes for roughly 4,240 subjects in Framingham, Massachusetts. The Framingham dataset is meant to support ML models that predict which individuals are at risk of 10-year coronary heart disease (CHD), so that is exactly what we built.

We had no real reason for choosing it other than that it was the most realistic option, but for propriety’s sake, here is our grand mission: in the United States, heart disease is the leading cause of death for men, women, and most racial and ethnic groups. Early prediction of cardiovascular disease could help high-risk individuals decide on lifestyle changes and thereby prevent complications.

So, after a couple of hours of awkwardly throwing together a machine learning model that almost certainly performs far worse than the one the dataset’s curators came up with, we have a working program!

Our project is hosted at the very creatively named https://willihaveaheartdisease.pythonanywhere.com/. We collect the user’s information directly on the website, then load our trained ML model to predict whether the user is at risk of 10-year coronary heart disease.
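The prediction step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our deployed code: the helper names `load_model` and `predict_risk` are hypothetical, and the feature list is assumed to be the Kaggle `framingham.csv` columns left over after dropping `education`, `totChol`, `glucose`, and the `TenYearCHD` target. It also assumes the website hands over the user’s answers as a dictionary of strings, which is how HTML form fields usually arrive.

```python
import pickle

import pandas as pd

# Hypothetical feature list: the framingham.csv columns we assume remain after
# dropping education, totChol, glucose, and the TenYearCHD target.
FEATURES = [
    "male", "age", "currentSmoker", "cigsPerDay", "BPMeds", "prevalentStroke",
    "prevalentHyp", "diabetes", "sysBP", "diaBP", "BMI", "heartRate",
]


def load_model(path):
    """Load a scikit-learn classifier that was saved with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_risk(model, form):
    """Return True if the model flags the user as at risk of 10-year CHD.

    `form` maps each feature name to the user's answer as a string.
    """
    # Build a one-row DataFrame whose columns match the training columns.
    row = pd.DataFrame([[float(form[name]) for name in FEATURES]], columns=FEATURES)
    return bool(model.predict(row)[0] == 1)
```

In a web app, `load_model` would typically run once at startup and `predict_risk` once per submission. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.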

Keep in mind that the model has an accuracy of about 0.66 and an ROC AUC of about 0.675, so it is only marginally better than random guessing. Whatever the prediction, see a doctor if you feel something is wrong. We take no responsibility if our prediction is wrong.
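To see why ROC AUC is the more telling number here, consider a hypothetical classifier that always answers "no heart disease". The labels below are made up, with an 85/15 split that we assume roughly matches the dataset’s class imbalance:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up labels: 85 subjects without 10-year CHD (0) and 15 with it (1).
y_true = np.array([0] * 85 + [1] * 15)

# A trivial "model" that predicts "no heart disease" for everyone.
always_no = np.zeros_like(y_true)

print(accuracy_score(y_true, always_no))  # 0.85: looks good, but is useless
print(roc_auc_score(y_true, always_no))   # 0.5: no better than a coin flip
```

The trivial classifier beats our model’s ~0.66 accuracy, yet its ROC AUC of 0.5 shows it cannot tell the two groups apart at all, which is why a lower accuracy paired with a higher ROC AUC can still be an improvement.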

For more technical information on the model: the libraries we rely on most are NumPy, pandas, Seaborn, and scikit-learn, and the trained model is saved with pickle. The training data is a .csv file that we load with pandas. We clean it by dropping rows with invalid entries, then drop columns that are either unnecessary (e.g., education level) or hard to determine at home (e.g., cholesterol and glucose levels). Finally, we balance the classes with weights of 15 for no heart disease and 85 for heart disease. This was a critical design choice: it raised our ROC AUC from 0.5 (random guessing) to 0.675, which isn’t great but is better than random.
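The cleaning and balancing steps above can be sketched as follows. This is a hedged reconstruction, not our actual script: the write-up does not name the classifier, so scikit-learn’s `LogisticRegression` stands in as an assumption, and the column names (`education`, `totChol`, `glucose`, `TenYearCHD`), the 80/20 stratified split, and the helper names `clean`, `train`, and `save_model` are also assumptions. "Invalid entries" are treated as missing values (NaN).

```python
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

TARGET = "TenYearCHD"  # assumed target column name
# Unnecessary (education) or hard to measure at home (cholesterol, glucose).
DROP_COLUMNS = ["education", "totChol", "glucose"]
# class_weight scales each sample's contribution to the loss by its class weight.
CLASS_WEIGHTS = {0: 15, 1: 85}  # 0 = no heart disease, 1 = heart disease


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values first, then drop the unwanted columns."""
    return df.dropna().drop(columns=DROP_COLUMNS)


def train(df: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Fit a class-weighted classifier and return (model, held-out ROC AUC)."""
    data = clean(df)
    X = data.drop(columns=[TARGET])
    y = data[TARGET]
    # Stratify so both splits keep the same share of positive cases.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = LogisticRegression(class_weight=CLASS_WEIGHTS, max_iter=1000)
    model.fit(X_train, y_train)
    # ROC AUC is computed from the predicted probability of the positive class.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return model, auc


def save_model(model, path: str) -> None:
    """Serialize the fitted model with pickle so the website can load it later."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

With the Kaggle file in the working directory, usage would look like `model, auc = train(pd.read_csv("framingham.csv"))` followed by `save_model(model, "model.pkl")`. On the weights: if roughly 15% of subjects are positive, a 15:85 weighting is the inverse of the class frequencies, the same ratio scikit-learn’s `class_weight="balanced"` produces (it sets each class’s weight to n_samples / (n_classes * class_count)). Also note that dropping rows before columns discards rows whose only missing value is in a column that gets dropped anyway; swapping the order would keep more data.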

For more technical information on the website, consult our specialized web developers, Jonathan Allen Eubanks and Nisarg Patel, because I know nothing about it.

Thank you very much! Potential investors: please join our waitlist by emailing hanmo@gatech.edu. We are a very popular company, the rising star of Atlanta poised to shake the very foundations of this world, so we apologize in advance if we are slow to respond.
