## Inspiration

There's so much data in the world, and we want to make some sense of it.

I want to start participating in Kaggle competitions, and this was a perfect learning experience :D

## What it does

We built classification models that predict the range into which the number of accidents falls, given the hour, the month and the amount of traffic.

See the attached images of the input data and the prediction scores.

We are predicting AccidentCountRangeOf5. If AccidentCountRangeOf5 is an integer x, the number of accidents for the given hour, month and number of vehicles on the road will be 5x, 5x+1, 5x+2, 5x+3 or 5x+4.

Why classification: we aren't predicting the exact number of accidents but the range of 5 that it falls into, so each range is a class. For example, if our model predicts 10, there will be between 10 × 5 = 50 and 10 × 5 + 4 = 54 accidents for that set of inputs, i.e. 50, 51, 52, 53 or 54.
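The labelling rule above amounts to integer division by 5. Below is a minimal sketch, assuming the label was derived from an exact accident count; the function names are hypothetical and not taken from the project's notebook:

```python
def to_range_of_5(accident_count: int) -> int:
    """Map an exact accident count to its range-of-5 class label (hypothetical helper)."""
    if accident_count < 0:
        raise ValueError("accident_count must be non-negative")
    # Integer division groups 0-4 -> 0, 5-9 -> 1, ..., 50-54 -> 10
    return accident_count // 5


def range_of_5_members(label: int) -> list[int]:
    """List the accident counts covered by a range-of-5 class label (hypothetical helper)."""
    start = label * 5
    return list(range(start, start + 5))


print(to_range_of_5(52))
print(range_of_5_members(10))
```

Going from a predicted label back to a range is just the inverse: multiply by 5 for the lower bound and add 4 for the upper bound.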

Note for future hackers: we have thoroughly documented our entire data processing pipeline and code in this Jupyter Notebook to make it a bit easier for you to process government data: https://github.com/DeeptanshuM/HackHarvard2018/blob/master/DataCleaningandProcessing.ipynb

## How we built it

We used Pandas to clean and process the data, and Microsoft Azure Machine Learning Studio to build the classification models and evaluate their accuracy.
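The actual Pandas code lives in the notebook linked above; the sketch below is not that code. It only illustrates the general shape of turning raw records into training rows, assuming one raw record per reported accident and a separate traffic table keyed by (month, hour). All field names are hypothetical, and it uses the standard library's `Counter` so it runs without Pandas (in Pandas, the counting step would typically be a `groupby([...]).size()`):

```python
from collections import Counter


def build_training_rows(accidents, traffic):
    """Aggregate per-accident records into one labelled row per (month, hour).

    accidents: iterable of dicts with hypothetical "month" and "hour" keys,
               one dict per reported accident.
    traffic:   dict mapping (month, hour) -> number of vehicles on the road
               (hypothetical format).
    """
    # Count accidents per (month, hour) bucket
    counts = Counter((a["month"], a["hour"]) for a in accidents)
    rows = []
    for (month, hour), vehicles in sorted(traffic.items()):
        count = counts[(month, hour)]  # Counter returns 0 for unseen keys
        rows.append({
            "month": month,
            "hour": hour,
            "vehicles": vehicles,
            "AccidentCountRangeOf5": count // 5,
        })
    return rows


# Toy data: 7 accidents at 8am and 3 at 5pm in January
accidents = [{"month": 1, "hour": 8}] * 7 + [{"month": 1, "hour": 17}] * 3
traffic = {(1, 8): 1200, (1, 17): 1500, (1, 3): 90}
for row in build_training_rows(accidents, traffic):
    print(row)
```

One design choice worth noting: iterating over the traffic table means hours with traffic but no accidents still become rows labelled 0, while accidents with no matching traffic entry are dropped.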

## What we learned

• Discovered and became comfortable using Microsoft Azure Machine Learning Studio
• Gained experience cleaning, processing and preparing real-world raw government data for ML

## What's next

• Fix the NaN value for the macro-averaged precision metric :D
• Compare different types of ML models
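One common cause of a NaN macro-averaged precision (not confirmed for this project) is a class that the model never predicts: its precision TP / (TP + FP) becomes 0 / 0, and averaging a NaN with anything yields NaN. A minimal sketch of that behaviour, with hypothetical function names and toy labels:

```python
import math


def per_class_precision(y_true, y_pred, labels):
    """Precision for each label; NaN when the label is never predicted (0/0)."""
    result = {}
    for label in labels:
        # True labels of every example the model assigned to this class
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        if not predicted:
            result[label] = float("nan")  # TP + FP == 0, precision undefined
        else:
            tp = sum(1 for t in predicted if t == label)
            result[label] = tp / len(predicted)
    return result


def macro_precision(y_true, y_pred, labels, skip_undefined=False):
    """Unweighted mean of per-class precision.

    With skip_undefined=False, one undefined class makes the mean NaN;
    with skip_undefined=True, undefined classes are left out of the average.
    """
    values = list(per_class_precision(y_true, y_pred, labels).values())
    if skip_undefined:
        values = [v for v in values if not math.isnan(v)]
        if not values:
            return float("nan")
    return sum(values) / len(values)


# Class 2 appears in y_true but is never predicted
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 0, 1]
print(macro_precision(y_true, y_pred, [0, 1, 2]))
print(macro_precision(y_true, y_pred, [0, 1, 2], skip_undefined=True))
```

Skipping undefined classes is only one convention; another is to count them as 0, which is what scikit-learn's `precision_score` does when called with `average="macro"` and `zero_division=0`.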