Thought Process
- We adapted the provided code to train an AI model that predicts the likelihood of a patient surviving under given conditions.
- We wanted to make the data as reliable as possible by "filtering out" unnecessary data.
- Our plan was to take the most "influential" data columns and use them to train the Keras Sequential model to make educated predictions based on a provided patient's details.
- To do this, we wanted to use PCA to give each column a numeric value for its importance.
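As an illustration only (this is not the project's actual code), the sketch below shows one way PCA can produce a per-column score with just pandas and NumPy. The column names (`age`, `lab_delta`, `gender`), the helper name `pca_column_scores`, and the scoring rule (absolute loadings weighted by explained variance) are all assumptions made for the example. It one-hot encodes string columns, fills nulls with the column median, and standardizes the data, so negative values are expected and cause no problem for PCA itself.

```python
import numpy as np
import pandas as pd


def pca_column_scores(df: pd.DataFrame, n_components: int = 2) -> pd.Series:
    """Score each column by its variance-weighted absolute PCA loadings.

    Hypothetical helper: the scoring rule is an assumption, not a standard API.
    """
    # One-hot encode string columns so every feature is numeric.
    numeric = pd.get_dummies(df, dtype=float)
    # Fill nulls with the column median rather than 0 to avoid artificial zeros.
    numeric = numeric.fillna(numeric.median())
    # Standardize each column; the result contains negative values by design.
    std = numeric.std(ddof=0).replace(0, 1.0)
    scaled = (numeric - numeric.mean()) / std
    # PCA via SVD: rows of `components` are the principal axes.
    _, singular_values, components = np.linalg.svd(
        scaled.to_numpy(), full_matrices=False
    )
    explained = singular_values**2 / np.sum(singular_values**2)
    k = min(n_components, len(explained))
    # Weight each column's absolute loading by the variance its component explains.
    scores = np.abs(components[:k]).T @ explained[:k]
    return pd.Series(scores, index=numeric.columns).sort_values(ascending=False)


# Made-up patient data with nulls, strings, and negative values.
patients = pd.DataFrame({
    "age": [70, 65, None, 40, 35, 80],
    "lab_delta": [-1.5, -0.5, 0.2, 1.0, 1.4, -2.0],
    "gender": ["M", "F", "F", "M", None, "F"],
})
scores = pca_column_scores(patients)
print(scores)
```

Higher scores mean the column contributes more to the leading components. Note that this measures variance in the features, not their relationship to survival, so it is only a rough proxy for "importance".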
Challenges
- Although we wanted to use PCA, we ran into issues with null values (which we handled using pandas), string data, and negative values; the PCA code did not accept data values less than 0, which we did not understand.
- Moving on from that, we decided to try and manually find the most important columns of the data.
- We did this by calculating the median value of a column over the rows where the patient died and comparing it to the median value of that same column over the rows where the patient lived. If the medians were similar, we decided the column was not that important to the patient's chance of survival.
- We used the median because, when there are many values, it's likely that outlier values (such as abnormally high numbers or, more likely, 0 values, because null values were replaced with 0 as well) affected the overall average. However, in hindsight, because there were so many data values, it might have been better to simply take the mean instead.
- Once we found what we believed to be the most important parts of the dataset, we tried to filter out "bad data" that we deemed fine to remove; however, this ended up causing issues when we tried to submit, because we were apparently also dropping rows from the provided patient test data.
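The median comparison and the row-dropping pitfall described above can be sketched as follows. This is a hedged illustration, not the project's actual code: the column names (`died`, `age`, `height_cm`), both helper names, and the choice to divide the median gap by the column's standard deviation are assumptions. The second helper filters out-of-range values only from labeled (training) rows, so unlabeled test patients are never dropped.

```python
import pandas as pd


def median_gap_ranking(df: pd.DataFrame, label: str) -> pd.Series:
    """Rank numeric columns by the gap between group medians (died vs. lived)."""
    features = df.drop(columns=[label]).select_dtypes("number")
    died = features[df[label] == 1]
    lived = features[df[label] == 0]
    # Divide by the column's spread so columns on different scales are comparable.
    spread = features.std(ddof=0).replace(0, 1.0)
    gap = (died.median() - lived.median()).abs() / spread
    return gap.sort_values(ascending=False)


def drop_bad_training_rows(
    df: pd.DataFrame, label: str, column: str, low: float, high: float
) -> pd.DataFrame:
    """Remove out-of-range values, but only from rows that have a label."""
    is_train = df[label].notna()
    in_range = df[column].between(low, high)
    # Unlabeled rows (test patients) are always kept, whatever their values.
    return df[~is_train | in_range]


# Made-up data: the last two rows are unlabeled test patients.
data = pd.DataFrame({
    "died": [1, 1, 1, 0, 0, 0, 0, None, None],
    "age": [80, 75, 82, 40, 45, 50, 999, 300, 60],
    "height_cm": [170, 165, 172, 171, 166, 169, 170, 168, 170],
})

cleaned = drop_bad_training_rows(data, "died", "age", 0, 120)
labeled = cleaned[cleaned["died"].notna()]
ranking = median_gap_ranking(labeled, "died")
print(ranking)
```

In this toy data the `age` medians differ widely between the two groups while the `height_cm` medians are nearly identical, so `age` ranks first. The test patient with an implausible age is still present in `cleaned` and can be predicted on.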
Overall
- It was interesting to work on something we had never done before.
- Researching unfamiliar topics and resolving issues was a headache; however, it came together in a way that we are content with.
- Data science is an interesting topic with many real-world uses, like the one we worked on. Becoming more fluent in the tools we use and how they work is very important, so we are glad this project exposed us to them, although it was quite a challenge for a first datathon.
Additional Info
A zip file of the GitHub repository is included in the Additional Info section; it contains much more of the material we used to complete the project. Our presentation is in the docs folder of the repository.
Built With
- data-science
- google-collaborate
- keras
- machine-learning
- python