The contents of the dataset were intriguing, so we decided to take a closer look at what it contains.
What it does
We created a report, available on the GitHub page, containing the most important information about the attacks. It shows the locations of tracked attacks accompanied by a visualisation on a world map. The information is divided by attack category, starting from the least significant and dangerous, such as scanning, and ending with the most dangerous, such as sending malware and attempts at exploiting systems.
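Ordering categories from least to most dangerous can be sketched with a simple severity mapping; the category names and severity ranks below are hypothetical stand-ins, not the dataset's actual labels:

```python
import pandas as pd

# Hypothetical mapping from attack category to a severity rank
# (the actual category names in the dataset may differ).
SEVERITY = {
    "scan": 0,        # least dangerous: port/host scanning
    "bruteforce": 1,  # credential guessing
    "exploit": 2,     # attempts at exploiting systems
    "malware": 3,     # most dangerous: sending malware
}

# A handful of illustrative records.
attacks = pd.DataFrame({
    "category": ["scan", "malware", "scan", "exploit"],
    "src_country": ["DE", "CN", "US", "RU"],
})

# Rank each attack, then count occurrences per category,
# listed from least to most dangerous.
attacks["severity"] = attacks["category"].map(SEVERITY)
counts = attacks.sort_values("severity").groupby("category", sort=False).size()
```

Sorting by the severity rank before grouping keeps the report's least-to-most-dangerous ordering.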
How we built it
As a team, we wrote the report in LaTeX. The data were prepared and analysed using Python and libraries such as json, pandas, and numpy. Visualisations were created in matplotlib, with Photoshop used for fine-tuning.
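A minimal sketch of the preparation step, assuming the dataset is stored as JSON lines with per-attack coordinates (the field names here are assumptions, not the dataset's real schema):

```python
import io
import pandas as pd

# Two records standing in for the (assumed) JSON-lines layout of the data.
raw = io.StringIO(
    '{"category": "scan", "lat": 48.7, "lon": 21.3}\n'
    '{"category": "exploit", "lat": 52.5, "lon": 13.4}\n'
)

# Load the records into a DataFrame for analysis.
df = pd.read_json(raw, lines=True)

# Per-category counts feed the report's tables; the lat/lon columns are
# what one would scatter onto a world map with matplotlib.
counts = df["category"].value_counts()
```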
Challenges we ran into
The sheer size of the dataset made it difficult to operate on all of the data at once, since it wouldn't fit in our computers' RAM. The dataset had to be split into smaller parts and analysed in batches. Splitting the dataset also made it harder to compute statistics and visualisations, since we had to devise ways to aggregate the per-batch results.
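The batching approach can be sketched with pandas' chunked reader, which keeps only one chunk in memory at a time; the records and field names here are illustrative:

```python
import io
import pandas as pd

# Six records standing in for a JSON-lines file far too large to load at once.
raw = io.StringIO("\n".join(
    '{"category": "%s", "bytes": %d}' % (c, b)
    for c, b in [("scan", 10), ("scan", 20), ("exploit", 5),
                 ("malware", 7), ("scan", 30), ("exploit", 9)]
))

# Read the file in small chunks and fold each chunk's counts into a
# running total, so only one chunk is ever resident in memory.
total = pd.Series(dtype="int64")
for chunk in pd.read_json(raw, lines=True, chunksize=2):
    total = total.add(chunk["category"].value_counts(), fill_value=0)
```

The running total is updated per chunk, so the peak memory use is bounded by the chunk size rather than the file size.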
Accomplishments that we're proud of
We were able to combine information from most of the datasets provided, despite their size.
What we learned
How to manipulate huge datasets and adapt common operations so that they work in batches.
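One pitfall of batched statistics is worth illustrating: a mean must be combined from per-batch sums and counts, because averaging per-batch means gives the wrong answer when batch sizes differ. A minimal sketch with made-up numbers:

```python
# Three batches of unequal size, standing in for chunks of a large file.
batches = [[4, 8], [6], [2, 10, 12]]

# Correct: accumulate sums and counts per batch, then divide once.
total_sum = sum(sum(b) for b in batches)
total_count = sum(len(b) for b in batches)
global_mean = total_sum / total_count  # matches the mean over all values

# Incorrect: a plain mean of per-batch means weights small batches too heavily.
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)
```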
What's next for Data visualisation and insight into UPJS assets challenge
With more time, we would build more sophisticated models of the underlying data, using as much of the available information as possible. Due to time constraints, we decided to focus on data preparation and visualisation, as they are the first steps of data analysis.