Predicts accident severity from accident and vehicle characteristics. We combine label encoding and target encoding of the categorical variables with gradient-boosted trees.

Built in Python: pandas for preprocessing and feature extraction, scikit-learn and NumPy for modeling and prediction, and Folium for data visualization.
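As a rough illustration of the modeling step, here is a minimal sketch of gradient-boosted trees in scikit-learn. The choice of `GradientBoostingClassifier`, its hyperparameters, and the synthetic data are all assumptions made for this example; they are not the project's actual model or dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 samples with 3 numeric features,
# as they might look after the categorical variables have been encoded.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
# Synthetic binary "severity" label derived from the first two features.
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)

# Hold out a quarter of the data to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

In a real run, `X` would hold the encoded accident and vehicle features and `y` the recorded severity.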

  1. Dealing with categorical variables: the variables for this problem were predominantly categorical. Our approach was to combine label encoding and target encoding.

  2. Dealing with a variable number of cars per accident: how do we represent sets of cars of differing sizes with the same number of variables? Our approach was to target-encode each car's variables and then summarize them per accident with aggregate functions (max, mean, min).
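The two ideas above can be sketched together in pandas. This is a minimal example under stated assumptions: the column names (`accident_id`, `vehicle_type`, `severe`), the toy data, the helper `target_encode`, and the smoothing scheme are all hypothetical and not taken from the project's code.

```python
import pandas as pd

# Hypothetical toy data: one row per vehicle, several vehicles per accident.
vehicles = pd.DataFrame({
    "accident_id": [1, 1, 2, 3, 3, 3],
    "vehicle_type": ["car", "truck", "car", "motorcycle", "car", "car"],
})
# Hypothetical binary target per accident (1 = severe).
accidents = pd.DataFrame({
    "accident_id": [1, 2, 3],
    "severe": [1, 0, 1],
})

def target_encode(categories, target, smoothing=1.0):
    """Replace each category by a smoothed mean of the target (illustrative helper)."""
    prior = target.mean()
    stats = target.groupby(categories).agg(["mean", "count"])
    # Shrink rare categories toward the global mean.
    encoding = (stats["mean"] * stats["count"] + prior * smoothing) / (
        stats["count"] + smoothing
    )
    return categories.map(encoding)

# Label encoding: map each category to an integer code.
vehicles["vehicle_type_label"] = vehicles["vehicle_type"].astype("category").cat.codes

# Target encoding: attach the accident-level target to each vehicle row first.
rows = vehicles.merge(accidents, on="accident_id")
rows["vehicle_type_te"] = target_encode(rows["vehicle_type"], rows["severe"])

# Aggregate the per-vehicle encodings so every accident gets the same
# three columns, however many vehicles were involved.
per_accident = (
    rows.groupby("accident_id")["vehicle_type_te"]
    .agg(["max", "mean", "min"])
    .add_prefix("vehicle_type_te_")
    .reset_index()
)
features = accidents.merge(per_accident, on="accident_id")
print(features)
```

To avoid leaking the target, the encoding should be computed on training rows only and then applied to validation and test rows.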

We were able to cooperate and work well together, giving our opinions freely and discussing our plan of action without losing our way. This allowed us to reach a solution which, given the difficulty of the problem, we consider fairly good.

The most difficult part of tackling a machine learning problem is not always the algorithm itself, but rather the selection and extraction of variables and the data processing phase.

