Inspiration

In all modern organizations there is an inevitable dependency on some form of digital infrastructure. Such infrastructure is typically used to host mission-critical services and sensitive information relating to the organization in question. Services generate an abundance of data in the form of log files that can tell a story of the activities of your company: "Susan logged in at 10 am", "Karen sent Susan a picture of a cat at 10.30 am" and, in the worst-case scenario, "Sneaky Steve opened a back door to your payroll service and gave himself a raise". Such clarity is obviously a pipe dream. As systems engineers, we typically encounter log files that are 10,000+ lines long, which makes them difficult to analyze.

So - can we build something that makes such logs more useful? Can we get an overview of what is going on in our systems? How frequently do certain events occur, and are there any outliers? Surely, with some form of machine learning, this could be achieved?

What it does

Our project strives to provide an automated solution to collect log files from designated nodes in an arbitrary network. The collected log files are parsed by a clustering algorithm that returns N clusters, which are then plotted on a dashboard to visually display the size of each cluster (i.e. how many log messages have been bundled into that cluster) and how the clusters (log messages) are related.
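To make the idea of clustering log messages and counting cluster sizes concrete, here is a minimal sketch in Python. It is not the clustering engine used in the project (which is not named here). It swaps in a much simpler technique: masking variable tokens such as numbers and IP addresses so that messages sharing a template fall into the same group. The masking rules, function names, and sample log lines are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules: variable parts of a message are replaced with
# placeholders so that messages with the same structure share one template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template_of(line: str) -> str:
    """Reduce a log line to its template by masking variable tokens."""
    out = line.strip()
    for pattern, placeholder in _MASKS:
        out = pattern.sub(placeholder, out)
    return out


def cluster_sizes(lines):
    """Group non-empty log lines by template and count each group."""
    return Counter(template_of(line) for line in lines if line.strip())


# Invented sample log lines, for illustration only.
sample = [
    "Accepted login for user 1001 from 10.0.0.5",
    "Accepted login for user 1002 from 10.0.0.7",
    "Accepted login for user 1003 from 10.0.0.9",
    "Payroll record 42 modified by user 1002",
]

sizes = cluster_sizes(sample)
# Templates seen only once are candidate outliers worth a closer look.
rare = [template for template, count in sizes.items() if count == 1]

for template, count in sizes.most_common():
    print(count, template)
```

In this toy input the three login messages collapse into one cluster, while the single payroll modification stands alone, which is the kind of outlier a dashboard of cluster sizes would make visible.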

This can be set up in a matter of minutes on a single machine and can help surface suspicious activity in a wide range of scenarios, be it build logs or TCP communication.

How we built it

The backbone of our project is a Flask application running in a Docker container, with a number of methods that interface between the host system, the clustering engine, and a Neo4j database.
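As a rough illustration of what the glue between a clustering step and Neo4j might look like, the sketch below turns a mapping of cluster templates to sizes into parameterized Cypher statements. This is an assumption about one possible shape of such an interface, not the project's actual code: the `Cluster` label, the `template` and `size` properties, and the function name are all hypothetical, and no database connection is made.

```python
# Hypothetical Cypher statement: create or update one node per cluster.
# The label "Cluster" and the property names are illustrative assumptions.
CLUSTER_QUERY = (
    "MERGE (c:Cluster {template: $template}) "
    "SET c.size = $size"
)


def to_cypher_statements(cluster_sizes):
    """Build (query, parameters) pairs, one per cluster, in a stable order."""
    return [
        (CLUSTER_QUERY, {"template": template, "size": size})
        for template, size in sorted(cluster_sizes.items())
    ]


# Invented cluster summary, for illustration only.
statements = to_cypher_statements({
    "Accepted login for user <NUM> from <IP>": 3,
    "Payroll record <NUM> modified by user <NUM>": 1,
})

for query, params in statements:
    print(query, params)
```

Keeping the query text fixed and passing values as parameters avoids building Cypher by string concatenation. Each pair could then be handed to a database client from inside a Flask route; with the official Neo4j Python driver that would typically be a call along the lines of `session.run(query, params)`.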

Challenges we ran into

Since the clustering engine that we chose for this project does not have native support for Neo4j, we had to be creative to come up with a solution for the interface between the clustering engine and our database. Fitting the pieces of the puzzle together also required some mental gymnastics.

Accomplishments that we're proud of

Finding the right use of each component and inventing a rudimentary interface between them.

What's next for DeepSec

Expanding our demonstration sequence to deploy to a network of hosts and trying new AI models to categorize and cluster the input data. We would also like to demonstrate how malicious activity such as data leaks or intrusion could be caught in real time.
