Inspiration

We wanted to apply machine learning to gain insights from a large data set. The Zurich challenge inspired us to use their data.

What it does

Our application filters and sorts rows according to defined criteria on their values, preparing the data for analysis.

How we built it

We built our application in Python, using data manipulation libraries such as pandas and NumPy, along with the standard math module.

Challenges we ran into

The main challenge was working with the whole dataset. Since we wanted to apply machine learning, we tried to load the entire dataset into pandas from the start, which crashed our laptops. We decided to work with smaller subsamples instead, but when we needed to filter rows we could not do it, because filtering still required loading the full dataset first.

After that, we wrote a script that filtered rows and columns the good ol' way, by parsing everything line by line. This reduced the dataset by a factor of 10, but it was still too large to match against each of the roughly eleven thousand accidents we had.

The Zurich representatives then made us realise that it was better to store the dataset in a database rather than in Python. We tried AWS, which offers a very good service for this, but we were held back by a slow connection and a 6 GB dataset. We could not upload the dataset to AWS without compressing it, and uploading it compressed meant creating a Lambda function to decompress it, which we could not get working. By then the Amazon instructors had already left and it was the last night. We could not figure out any other way to work with the data, and this is where we stayed stuck until the end.
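As a rough illustration of the row-by-row filtering idea described above, here is a minimal sketch using only Python's standard csv module. It is not our original script: the function name `filter_rows`, the column names, and the sample data are all hypothetical, chosen just to show how rows and columns can be filtered while holding a single row in memory at a time.

```python
import csv
import io


def filter_rows(src, dst, keep_columns, predicate):
    """Stream rows from src to dst, keeping only the chosen columns
    of the rows for which predicate(row) is true."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep_columns)
    writer.writeheader()
    kept = 0
    for row in reader:  # only one row is held in memory at a time
        if predicate(row):
            writer.writerow({name: row[name] for name in keep_columns})
            kept += 1
    return kept


# Hypothetical sample data; the real dataset's columns are not shown here.
sample = io.StringIO(
    "id,city,speed,note\n"
    "1,Zurich,42,a\n"
    "2,Bern,17,b\n"
    "3,Zurich,8,c\n"
)
out = io.StringIO()
kept = filter_rows(
    sample,
    out,
    keep_columns=["id", "speed"],
    predicate=lambda r: r["city"] == "Zurich" and float(r["speed"]) > 10,
)
```

With real files, the same function can be passed file objects opened with `newline=""`, so a multi-gigabyte CSV never has to fit in memory at once.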

Accomplishments that we're proud of

Our filtering algorithms work correctly and run fast.

What we learned

We learned how to approach a problem of this magnitude. We also learned how important it is to understand the problem up front in order to avoid complications later on.

What's next for zurich-project

Join different data sets using Amazon Web Services and SQL. Perform data analysis using AI or machine learning. Visually evaluate the data using heatmaps.
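To give an idea of what joining data sets in SQL could look like, here is a small sketch that uses an in-memory SQLite database from Python's standard library as a local stand-in for a hosted database. We did not build this: the table names (`records`, `accidents`), their columns, and the rows are hypothetical and only illustrate the join.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a hosted SQL service.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE records (record_id INTEGER PRIMARY KEY, region TEXT, distance_km REAL);
    CREATE TABLE accidents (accident_id INTEGER PRIMARY KEY, region TEXT);
    -- An index on the join key lets the database avoid a full scan per accident.
    CREATE INDEX idx_records_region ON records(region);
    """
)

# Hypothetical rows, made up for this example.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "north", 12.5), (2, "south", 3.0), (3, "north", 7.5)],
)
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [(100, "north"), (101, "east")],
)

# LEFT JOIN keeps every accident, including those with no matching record.
rows = conn.execute(
    """
    SELECT a.accident_id,
           COUNT(r.record_id) AS n_records,
           COALESCE(SUM(r.distance_km), 0) AS total_km
    FROM accidents AS a
    LEFT JOIN records AS r ON r.region = a.region
    GROUP BY a.accident_id
    ORDER BY a.accident_id
    """
).fetchall()
conn.close()
```

The point of the design is that the database does the matching, so Python only ever receives the aggregated result rather than the full data set.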
