Inspiration

For our very first foray into data science, we were given a large dataset covering more than seven thousand patients and their medical attributes.

What it does

The model takes a series of variables for a given patient and calculates that patient's likelihood of death.

How we built it

We split the large number of factors into quantitative and qualitative categories. For the qualitative categories, we identified the importance of each one by looking at how well each unique response within it predicted whether a patient lived or died.

For the quantitative categories, we first plotted a histogram for each category and removed those with little to no variation, since a category that barely varies cannot help separate one patient from another. We ran into many issues when attempting to plot the quantitative data, so we sifted the categories further by variation and by graph shape: we favored approximately normal distributions over skewed ones, because they are easier to reason about with the basic statistical tools we knew. We then compared each remaining quantitative category to its definition in the document of explanations and removed those that logically would not have much of an impact.

This left us with 4 qualitative and 6 quantitative categories to make predictions from. The qualitative categories are disability, primary, extraprimary, and cancer. The quantitative categories are timeknown, comorbidity, heart, breathing, age, and pain.
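The screening steps described above can be illustrated with a minimal sketch. This is not our original code; it uses only the Python standard library, and the toy records, the `died` outcome column, the `flag` column, and all function names are hypothetical stand-ins chosen for illustration. It shows one possible way to compute a death rate per unique response in a qualitative category, to flag quantitative categories with no variation, and to measure how skewed a distribution is.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy records; the values and the "flag"/"died" columns are made up.
patients = [
    {"cancer": "yes", "age": 71, "flag": 1, "died": 1},
    {"cancer": "yes", "age": 64, "flag": 1, "died": 1},
    {"cancer": "no", "age": 52, "flag": 1, "died": 0},
    {"cancer": "no", "age": 47, "flag": 1, "died": 0},
    {"cancer": "yes", "age": 80, "flag": 1, "died": 0},
    {"cancer": "no", "age": 58, "flag": 1, "died": 1},
]


def death_rate_by_response(rows, column, outcome="died"):
    """Fraction of patients who died for each unique response in a column."""
    counts = defaultdict(lambda: [0, 0])  # response -> [deaths, total]
    for row in rows:
        counts[row[column]][0] += row[outcome]
        counts[row[column]][1] += 1
    return {resp: deaths / total for resp, (deaths, total) in counts.items()}


def has_variation(rows, column, min_std=1e-9):
    """True if the column's population standard deviation exceeds min_std."""
    return pstdev([row[column] for row in rows]) > min_std


def skewness(values):
    """Population skewness: roughly 0 for symmetric data, positive for a long right tail."""
    values = list(values)
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # a constant column has no shape to measure
    return mean([(v - mu) ** 3 for v in values]) / sigma ** 3


if __name__ == "__main__":
    print(death_rate_by_response(patients, "cancer"))
    print({col: has_variation(patients, col) for col in ("age", "flag")})
    print(skewness([row["age"] for row in patients]))
```

A qualitative category whose responses show very different death rates is a stronger candidate predictor than one whose responses all show roughly the same rate. The `min_std` threshold and any cutoff applied to the skewness value are judgment calls, not values taken from our project.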

Challenges we ran into

The starter code depended on many modules that were not yet installed in our IDE, and we had trouble managing each of them.

We ran into a series of difficulties when creating the visualizations. In addition, the sheer amount of quantitative data made it troublesome to separate the points onto easily sorted axes, as we had done for the qualitative data. Having so much data also made it somewhat overwhelming to determine which variables would be most useful for building an accurate model.

Accomplishments that we're proud of

Understanding the objective and creating a working model are achievements that we're grateful for in our first datathon. The struggles that we encountered do not undermine our efforts, and each of the many problems that we faced only strengthened our drive to complete the task. We are all extremely proud to have made it to the end of the competition with at least an answer, and we hope that in the future our knowledge will give the answers we arrive at more substance and evidence than we managed to put together this time around.

What we learned

This challenge made us aware of how many steps and how much knowledge go into analyzing a dataset and building a model for it. We faced many challenges and a lot of unfamiliar information, yet we were able to utilize what we had learned in the past and apply our own logic to account for the data.

What's next for Beginner Challenge

Knowing what we have gained from this experience, we hope that the obstacles we came across during these hours will no longer be as troublesome in the coming years. Having these memories should flatten the learning curve for future opportunities and new challenges.
