Inspiration
We were inspired by StatQuest's PCA video, which made PCA look like a good way to simplify a multidimensional data set.
What it does
It uses PCA to reduce the multidimensional data to a smaller, more manageable set of columns, which we then run linear regression on. The regression is what predicts the chance of death from the input values.
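A minimal sketch of this kind of pipeline, written with plain NumPy and not taken from our actual project code. The toy data, the column layout, and the helper names (`impute_column_means`, `standardize`, `pca_scores`, `fit_linear_regression`, `predict`) are all assumptions made for illustration. It fills missing values with column means, standardizes, collapses a group of related columns into one PCA column, and fits a linear regression on the result.

```python
import numpy as np

def impute_column_means(X):
    """Replace NaN entries with the mean of their column."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - X.mean(axis=0)) / std

def pca_scores(X_std, n_components):
    """Project standardized data onto its first n_components principal axes."""
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_std, full_matrices=False)
    return X_std @ vt[:n_components].T

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted intercept and slopes to new feature rows."""
    return coef[0] + X @ coef[1:]

# Hypothetical toy data: three closely related columns plus one unrelated column.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
related = np.column_stack([base + 0.1 * rng.normal(size=200) for _ in range(3)])
other = rng.normal(size=(200, 1))
y = 0.5 * base + 0.2 * other[:, 0] + 0.05 * rng.normal(size=200)
related[::25, 1] = np.nan  # simulate empty spots in the data

related_clean = impute_column_means(related)
pc1 = pca_scores(standardize(related_clean), n_components=1)
features = np.column_stack([pc1, other])  # one PCA column replaces the three originals
coef = fit_linear_regression(features, y)
predictions = predict(coef, features)
```

Here `features` ends up with two columns instead of four: the single PCA column stands in for the three related ones, and the regression is fitted on that smaller table.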
How we built it
- We cleaned the data by removing bad data or replacing empty spots with the mean of their column
- We standardized the data so that we could run PCA on it
- We used PCA to find columns that are relatively similar and combined them into a single column of data
- We added the PCA values to our dataset and removed the columns that we ran PCA on
- We performed linear regression on the simplified dataset
- We trained a model (tried to)
- Model should be able to predict based on x values (maybe)
Challenges we ran into
Accomplishments that we're proud of
We have a simplified data table! The model is runnable!
What we learned
- We learned how to deal with bad data
- PCA is a way that data scientists simplify data based on the relationships between data
- We learned how to create and train a model
What's next for Hackians!
This will not be our last Datathon! See you next year on the podium! LOL. :3