Inspiration
We wanted to apply machine learning to gain insights from a large dataset. The Zurich challenge inspired us to use their data.
What it does
Our application filters and sorts out rows based on defined criteria for their values, preparing the data for analysis.
How we built it
We built our application in Python, using data manipulation libraries such as pandas and NumPy, along with the standard math module.
Challenges we ran into
The main challenge was working with the whole dataset. Since we wanted to apply machine learning, we tried to load the whole dataset into pandas from the start, which crashed our laptops. We decided to work with smaller subsamples, but when we needed to filter rows we couldn't, because filtering required loading the entire dataset first. After that, we wrote a script that filtered rows and columns the good old way, by parsing everything. We reduced the dataset by a factor of 10, but that still wasn't enough to later match it against every one of the eleven thousand accidents we had.
Then the Zurich team made us realise that it was better to store the dataset in a database instead of in Python. We tried AWS, which had a very good service for this, but we were held back by a slow connection and a 6 GB dataset. We couldn't upload the dataset to AWS without compressing it, and uploading it compressed meant creating a Lambda function to decompress it, which we couldn't get working. By then the Amazon instructors had already left and it was the last night. After that we couldn't figure out any other way to work with the data, and this is where we stayed stuck until the end.
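The parse-everything filtering step described above can be sketched roughly as follows. This is not our original script, only a minimal illustration that assumes the data is a CSV file with a header row. The function name `filter_csv`, its parameters, and the column names in the usage note are hypothetical. The point is that only one row is held in memory at a time, so the full file never has to be loaded.

```python
import csv


def filter_csv(src_path, dst_path, keep_columns, keep_row):
    """Stream src_path row by row and write the matching rows to dst_path.

    keep_columns: list of column names to keep (hypothetical names).
    keep_row: function taking a row dict and returning True to keep it.
    Returns the number of rows written.
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops every column not listed in keep_columns.
        writer = csv.DictWriter(dst, fieldnames=keep_columns, extrasaction="ignore")
        writer.writeheader()
        for row in reader:  # one row in memory at a time
            if keep_row(row):
                writer.writerow(row)
                kept += 1
    return kept
```

A call such as `filter_csv("trips.csv", "fast_trips.csv", ["id", "speed"], lambda r: float(r["speed"]) > 50)` would keep only the `id` and `speed` columns of rows whose speed exceeds 50. The file and column names here are made up for the example.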
Accomplishments that we're proud of
Our filter algorithms work correctly and run fast.
What we learned
We learned how to approach a problem of this magnitude. We also learned that it is really important to understand the problem if you don't want to run into complications later on.
What's next for zurich-project
Join different datasets using Amazon Web Services and SQL. Perform data analysis using AI or machine learning. Visually evaluate the data using heatmaps.
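As a rough idea of what the SQL join step could look like, here is a small sketch that uses Python's built-in `sqlite3` module as a local stand-in for a hosted database on AWS. The table names, column names, and the join key `region` are all assumptions made for illustration, since we have not designed the real schema yet.

```python
import sqlite3


def join_accidents_with_trips(accidents, trips):
    """Join two small tables on a shared key inside an in-memory SQLite database.

    accidents: iterable of (accident_id, region) tuples (hypothetical schema).
    trips: iterable of (trip_id, region, speed) tuples (hypothetical schema).
    Returns a list of (accident_id, trip_id, speed) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accidents (accident_id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE trips (trip_id INTEGER, region TEXT, speed REAL)")
    conn.executemany("INSERT INTO accidents VALUES (?, ?)", accidents)
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)
    # An index on the join key lets the database look up matches
    # instead of scanning the whole trips table for every accident.
    conn.execute("CREATE INDEX idx_trips_region ON trips (region)")
    rows = conn.execute(
        "SELECT a.accident_id, t.trip_id, t.speed "
        "FROM accidents AS a JOIN trips AS t ON a.region = t.region "
        "ORDER BY a.accident_id, t.trip_id"
    ).fetchall()
    conn.close()
    return rows
```

With a database doing the matching, the accidents never need to be compared against the full dataset in Python memory, which is the problem we ran into during the hackathon.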
