We love challenges and machine learning, and this challenge was a perfect fit for both.
What it does
It predicts the severity of an accident from accident and vehicle characteristics. We encode the categorical variables with a combination of label encoding and target encoding, and model severity with gradient-boosted trees.
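A minimal sketch of the modeling step, assuming scikit-learn's `GradientBoostingClassifier` as the gradient-boosted trees. The toy columns (`road_type`, `speed_limit`), the binary `severity` labels and the hyperparameters are hypothetical, not the challenge data or our tuned settings.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy data: one row per accident, `severity` is the target.
df = pd.DataFrame({
    "road_type": ["urban", "rural", "urban", "highway", "rural", "highway"] * 5,
    "speed_limit": [30, 90, 50, 120, 80, 130] * 5,
    "severity": [0, 1, 0, 1, 1, 1] * 5,
})

# Label encoding: map each category to an integer code.
df["road_type_code"] = df["road_type"].astype("category").cat.codes

X = df[["road_type_code", "speed_limit"]]
y = df["severity"]

# Gradient-boosted trees (illustrative hyperparameters).
model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
model.fit(X, y)
preds = model.predict(X)
```

Tree-based models split on thresholds, so integer codes from label encoding are usable even though they impose an arbitrary order on the categories.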
How we built it
We built it in Python: pandas for preprocessing and feature extraction, scikit-learn and NumPy for modeling and prediction, and Folium for data visualization.
Challenges we ran into
Dealing with categorical variables: most of the variables in this problem were categorical. Our approach was to combine label encoding and target encoding.
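The two encodings can be sketched side by side with pandas. This is an illustration under assumptions, not our exact code: the `vehicle_type` column is hypothetical, and the smoothed mean (which pulls rare categories toward the global mean) is one common form of target encoding, with `smoothing` as an illustrative hyperparameter.

```python
import pandas as pd

def target_encode(train, col, target, smoothing=10.0):
    """Map each category of `col` to a smoothed mean of `target`.

    Returns the mapping and the global mean, used as a fallback
    for categories not seen during fitting.
    """
    prior = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + smoothing * prior) / (
        stats["count"] + smoothing
    )
    return smoothed.to_dict(), prior

# Hypothetical toy data.
df = pd.DataFrame({
    "vehicle_type": ["car", "car", "motorbike", "motorbike", "truck", "car"],
    "severity": [0, 0, 1, 1, 1, 1],
})

# Label encoding: one integer code per category.
df["vehicle_type_label"] = df["vehicle_type"].astype("category").cat.codes

# Target encoding: smoothed mean severity per category.
mapping, prior = target_encode(df, "vehicle_type", "severity", smoothing=2.0)
df["vehicle_type_target"] = df["vehicle_type"].map(mapping).fillna(prior)
```

Here `car` maps to 7/15 ≈ 0.47: its raw mean of 1/3 is pulled toward the global mean of 2/3. To avoid leaking the target, the mapping should be fitted on the training split only (or out of fold).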
Dealing with a variable number of cars per accident: how do we represent sets of cars of different sizes with a fixed number of variables? Our approach: target-encode the car variables, then aggregate them per accident with the max, mean and min functions.
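A minimal pandas sketch of the aggregation step, assuming a hypothetical per-vehicle table where each car already carries a target-encoded value (`vehicle_type_target`). Grouping by a hypothetical `accident_id` collapses any number of cars into exactly three columns per encoded variable.

```python
import pandas as pd

# Hypothetical per-vehicle table: accidents involve 1 to 3 cars.
vehicles = pd.DataFrame({
    "accident_id": [1, 1, 1, 2, 3, 3],
    "vehicle_type_target": [0.47, 0.83, 0.47, 0.78, 0.83, 0.47],
})

# max, mean and min give a fixed-width summary of a variable-size set.
agg = vehicles.groupby("accident_id")["vehicle_type_target"].agg(["max", "mean", "min"])
agg.columns = [f"vehicle_type_target_{fn}" for fn in agg.columns]
agg = agg.reset_index()
```

The result has one row per accident and can be joined to the accident-level features on `accident_id` before training.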
Accomplishments that we're proud of
We were able to cooperate and work together, giving our opinions freely and discussing our plan of action without losing our way. This allowed us to reach a solution that, given the difficulty of the problem, we consider fairly good.
What we learned
The most difficult part of tackling a machine learning problem is not always the algorithm itself, but rather the preprocessing phase: selecting and extracting the variables.
What's next for HackYourCrash Boosting McKinsey Challenge
We are very excited to give our presentation and to share and discuss our work with other hackers.