Data visualisation and insight into UPJS assets challenge

Inspiration

The contents of the dataset were very interesting, so we decided to take a closer look at what it contains.

What it does

We created a report, available on our GitHub page, containing the most important information about the attacks. It gives the locations of the tracked attacks, accompanied by a visualisation on a world map. The information is divided by attack category, ordered from the least dangerous, such as scanning, to the most dangerous, such as sending malware and attempting to exploit systems.

How we built it

As a team, we wrote the report in LaTeX. The data were prepared and analysed in Python using libraries such as json, pandas and numpy. Visualisations were created in matplotlib, with Photoshop used for fine-tuning.

Challenges we ran into

The sheer size of the dataset made it difficult to operate on all of the data at once, since it would not fit in our computers' RAM. The dataset had to be split into smaller parts and analysed in batches. Splitting the dataset also made it harder to compute statistics and build visualisations, since we had to devise ways to aggregate the per-batch results.
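To illustrate the kind of aggregation described above, here is a minimal sketch of counting rows per attack category across batches. It is not our exact code: the function name `count_by_category`, the column name `category` and the sample values are assumptions made for the example. The idea is that counts are additive, so each batch can be counted on its own and the partial results summed.

```python
from collections import Counter
from typing import Iterable

import pandas as pd


def count_by_category(batches: Iterable[pd.DataFrame], column: str = "category") -> pd.Series:
    """Count rows per category across batches without loading all data at once."""
    totals: Counter = Counter()
    for batch in batches:
        # Counts from separate batches can simply be added together.
        totals.update(batch[column].value_counts().to_dict())
    return pd.Series(totals, dtype="int64").sort_values(ascending=False)


# Two small in-memory batches standing in for pieces of a large file
# (hypothetical data, for illustration only).
batch_a = pd.DataFrame({"category": ["scan", "scan", "malware"]})
batch_b = pd.DataFrame({"category": ["exploit", "scan"]})
counts = count_by_category([batch_a, batch_b])
```

With a real file, the batches could come from an iterator such as `pd.read_csv(path, chunksize=...)` or `pd.read_json(path, lines=True, chunksize=...)`, depending on the file format, so that only one batch is held in memory at a time.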

Accomplishments that we're proud of

We were able to combine information from most of the datasets provided, despite their size.

What we learned

How to manipulate huge datasets and adapt common functions so that they work in batches.
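As an example of adapting a common function to batches, the sketch below keeps a running mean and variance that are updated one batch at a time, using the standard formula for merging the statistics of two groups. The class name `RunningStats` and its attributes are hypothetical and chosen for this illustration; it is one possible approach, not necessarily the one used in the project.

```python
import numpy as np


class RunningStats:
    """Running count, mean and variance, updated one batch at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, batch) -> None:
        values = np.asarray(batch, dtype=float)
        if values.size == 0:
            return
        b_n = values.size
        b_mean = values.mean()
        b_m2 = ((values - b_mean) ** 2).sum()
        # Merge the batch statistics into the running statistics.
        delta = b_mean - self.mean
        total = self.n + b_n
        self.mean += delta * b_n / total
        self.m2 += b_m2 + delta ** 2 * self.n * b_n / total
        self.n = total

    @property
    def variance(self) -> float:
        """Sample variance (ddof=1), matching the pandas default."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Feed the data in two batches and compare with the all-at-once result.
data = np.arange(10.0)
stats = RunningStats()
stats.update(data[:4])
stats.update(data[4:])
```

After both updates, `stats.mean` and `stats.variance` should agree with `data.mean()` and `data.var(ddof=1)` up to floating-point error, even though the full array was never passed in at once.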

What's next for Data visualisation and insight into UPJS assets challenge

If we had more time, we would build more sophisticated models of the underlying data, using as much of the available information as possible. Due to time constraints, we decided to focus on data preparation and visualisation, as they are the first steps of data analysis.

Built With

latex
numpy
pandas
photoshop
powerbi
python
sklearn-api
statsmodels-api

Submitted to

Hack Kosice 2020
- Winner Assets Challenge

Created by

I worked with the Python libraries pandas and numpy to prepare the dataset for analysis. I also created basic graphs in matplotlib and models with sklearn.

Kamil Iwanowski
I worked with PowerBI, Excel, Photoshop and LaTeX to create the report. At the beginning, I also did some data engineering in Python.

Mateusz Mazurkiewicz
Rocket science student

Updates

Kamil Iwanowski started this project — Sep 06, 2020 04:17 AM EDT
