We started with a cerebral strokes dataset, inspired by one of our members (Keshav), whose grandmother suffered a stroke a few years ago and passed away recently. With the entire left side of her body paralyzed, she struggled to perform many basic functions. We found a dataset of about 40,000 patients, but too few of them had actually had strokes, which led to our neural network predicting "no" for everyone (this technically gave very high accuracy, but only because of the skewed proportion of diagnoses). We tried cutting the data a few times to increase the proportion of positive diagnoses, but that didn't work, so we switched gears to a heart disease dataset.
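A quick sketch of why the stroke model looked accurate while being useless. The numbers below are illustrative stand-ins (the write-up doesn't give the exact positive rate), but they show how an always-"no" classifier scores high accuracy on a skewed dataset:

```python
import numpy as np

# Hypothetical labels mimicking a heavily imbalanced stroke dataset:
# ~2% positive diagnoses out of 40,000 patients (illustrative numbers).
rng = np.random.default_rng(0)
labels = (rng.random(40_000) < 0.02).astype(int)

print(f"positive rate: {labels.mean():.1%}")

# A model that always predicts "no stroke" still scores high accuracy:
always_no = np.zeros_like(labels)
accuracy = (always_no == labels).mean()
print(f"accuracy of predicting 0 for everyone: {accuracy:.1%}")
```

This is why accuracy alone was a misleading metric for the stroke dataset: the baseline of predicting the majority class is already close to 100%.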
What it does
The neural network we built uses the Keras framework to take in a dataset of about 300 patients, with roughly a 50/50 split of positive/negative diagnoses. After training and validation on this dataset (achieved by splitting the data and running it through a 3-layer binary classification neural network), the model outputs a number (1 or 0) indicating whether the patient has heart disease.
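The split-and-threshold flow above can be sketched in a few lines. The feature count (13) and split ratio (80/20) are assumptions for illustration, not values from the write-up:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((300, 13))            # stand-in for ~300 patient records
y = rng.integers(0, 2, 300)          # roughly 50/50 diagnoses

# Shuffle, then hold out 20% of the patients for validation.
order = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, val_idx = order[:split], order[split:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
print(X_train.shape, X_val.shape)    # (240, 13) (60, 13)

# The network's sigmoid output is a probability in [0, 1]; thresholding
# at 0.5 turns it into the 1/0 diagnosis described above.
probabilities = rng.random(len(y_val))   # placeholder for model.predict()
predictions = (probabilities > 0.5).astype(int)
```

The final 1/0 answer is just the sigmoid probability rounded at 0.5; the probability itself could also be reported as a confidence.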
How we built it
We used what we learned in our Machine Learning course at AET this year to build a 3-layer binary classification neural network in the Keras framework (a 16-node input layer, a 16-node hidden layer, and a 1-node sigmoid output layer, making it essentially logistic regression at the output), trained for 135 epochs with a batch size of 128 (these settings determine the length and repetitions of training on the split dataset). We arrived at these values through about 2 hours of trial and error, trying to maximize the accuracy of the model.
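A minimal Keras sketch of the 16-16-1 architecture described above. The input feature count (13), optimizer, and loss function are assumptions, since the write-up doesn't name them; binary cross-entropy is the standard choice for a sigmoid binary classifier:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 3-layer binary classifier: 16 -> 16 -> 1 (sigmoid), as described above.
# The 13-feature input shape is an assumption for illustration.
model = keras.Sequential([
    keras.Input(shape=(13,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Optimizer/loss are assumed defaults, not taken from the write-up.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Training with the hyperparameters from the write-up would look like:
# model.fit(X_train, y_train, epochs=135, batch_size=128,
#           validation_data=(X_val, y_val))
```

With only ~300 patients and a batch size of 128, each epoch is just two or three batches, which is part of why many epochs (135) were needed.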
Challenges we ran into
The biggest challenge we ran into was the initial cerebral strokes dataset. We didn't know how to build a neural network entirely from scratch (we had only worked from skeleton code in our machine learning course this year), so we had to learn as we went. We couldn't really submit what we had: the dataset proportions were so skewed that no matter how much we cut the data or modified the network structure, we couldn't get the accuracy above about 70%. After a few hours of tweaking the code without success, we used what we had learned to switch tracks and find a new dataset to work on. The heart disease dataset wasn't too difficult, since we just had to take what we learned from the first attempt and change some variables/imports to adapt to the new data.
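The "cutting the data" step we tried is a form of undersampling: keeping all positive cases and discarding most negatives so the classes are balanced. A rough sketch, using illustrative numbers rather than the real dataset:

```python
import numpy as np

# Illustrative stand-in for the ~40,000-patient stroke dataset (~2% positive).
rng = np.random.default_rng(0)
labels = (rng.random(40_000) < 0.02).astype(int)
features = rng.random((40_000, 10))   # placeholder feature matrix

pos_idx = np.flatnonzero(labels == 1)
neg_idx = np.flatnonzero(labels == 0)

# Keep every positive case and an equal-sized random sample of negatives.
keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
keep = np.concatenate([pos_idx, keep_neg])
rng.shuffle(keep)

X_balanced, y_balanced = features[keep], labels[keep]
print(f"balanced positive rate: {y_balanced.mean():.0%}")  # 50%
```

The tradeoff, which we ran into, is that aggressive undersampling throws away most of the training data, so the balanced subset can be too small to train well.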
Accomplishments that we're proud of
We are really proud of ourselves, not only because we were able to switch tracks and datasets even after hours of hard work on the cerebral strokes dataset, but also because we didn't know how to build neural networks from scratch going in, so we had to learn on the fly using online tutorials and other resources.
What we learned
We learned how to build neural networks from scratch, and that we should analyze our datasets more carefully in the future to make sure the class proportions are balanced enough to be usable, rather than heavily skewed toward one outcome.
What's next for Heart Disease Detection
We don't know exactly what we would do with more time, but the next step would be to keep improving the accuracy and the model structure in order to get the most out of the network.