Predicting Likelihood of Parcel Delay

Inspiration

Although we did not have much data science experience, we knew enough to decide what we wanted to do. We treated this as a classification problem whose output had to be a value between 0 and 1 (the probability of a delay). Starting from that statement, we explored the given data and tried to find correlations.

What it does

Our code reads all the data, then adds extra features from other datasets, such as the weight of the parcel and the distance between the cities and the destination. Then, a regression model estimates the likelihood that the parcel gets delayed, based on these features and the existing data in the orders dataset.
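
As a rough illustration of the steps described above, here is a minimal sketch and not our exact code. The table contents, the column names (order_id, product_id, route_id, weight_kg, distance_km, delayed) and the choice of logistic regression are all assumptions made for this example; the real datasets have their own schema.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy tables standing in for the challenge datasets.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "product_id": [10, 11, 10, 12, 11, 12],
    "route_id": [100, 101, 101, 100, 102, 102],
    "delayed": [0, 1, 0, 0, 1, 1],  # label: 1 means the parcel was late
})
products = pd.DataFrame({
    "product_id": [10, 11, 12],
    "weight_kg": [0.5, 12.0, 3.0],
})
routes = pd.DataFrame({
    "route_id": [100, 101, 102],
    "distance_km": [50.0, 400.0, 900.0],
})

# Left joins keep every order and attach the extra features to it.
features = (
    orders
    .merge(products, on="product_id", how="left")
    .merge(routes, on="route_id", how="left")
)

X = features[["weight_kg", "distance_km"]]
y = features["delayed"]

# Logistic regression is assumed here because it outputs values in [0, 1].
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Column 1 of predict_proba is the estimated probability of class 1 (delayed).
delay_probability = model.predict_proba(X)[:, 1]
print(features.assign(delay_probability=delay_probability))
```

In a real run the probabilities would be computed on held-out orders rather than on the training rows used here.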

How we built it

We built our solution in Python, using the standard data science libraries pandas, NumPy and scikit-learn (sklearn).

Challenges we ran into

We joined Accenture's challenge

Accomplishments that we're proud of

We are proud of everything we were able to do, and especially of the data cleaning step, which was more difficult than expected. We are also proud of the insights we were able to get from the data, even though the model underfit and could not properly predict our label variable.