Mining Malware

Inspiration

As a team of early-career computer science and cybersecurity professionals who are passionate about cybersecurity, we always try to stay up to date on the latest vulnerabilities and attacks. With several new attacks identified every day, it is hard to know which ones actually affect us. Our inspiration for this hack was to create a program that lets everyone, from complete beginners to cybersecurity experts, see the top vulnerabilities without having to search for them themselves.

What it does

We created a web app that gives users immediate access to articles about the most critical cyber vulnerabilities and attacks. Mining Malware determines the risk level of the cyber threats reported in articles from multiple verified cybersecurity news services and showcases the articles covering the highest-risk threats. This lets users skip reading through hundreds of reports on low-risk malware themselves.

How we built it

Front-end: We used CSS and HTML to create and design our webpage for an optimal user interface sporting Technica colors.

Back-end: Built in Python using the newspaper library. We scan, download, and parse articles, then use natural language processing (NLP) to analyze the risk level of the cyber threats described in each one. An article's risk level is determined by analyzing the attack vector (network, adjacent, local, or physical), the attack complexity (high or low), and the remediation level (undefined, workaround, temporary, and physical). Each article is assigned a numeric score: articles with a high overall vulnerability score are considered high-risk, and articles with a low score are considered low-risk. We also added a tagging interface so the user can see which keywords the program used to determine an article's risk level.
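
To make the scoring idea concrete, here is a minimal sketch of keyword-based risk scoring. It is not our actual code: the `RULES` table, its weights, its trigger keywords, the `score_article` and `risk_label` names, and the threshold of 6 are all hypothetical, chosen only for illustration. The `text` argument stands in for the article body extracted during parsing.

```python
import re

# Hypothetical rules: category -> level -> (weight, trigger keywords).
# Neither the weights nor the keywords come from the project itself.
RULES = {
    "attack_vector": {
        "network": (4, {"network", "remote", "internet"}),
        "adjacent": (3, {"adjacent", "bluetooth", "wifi"}),
        "local": (2, {"local", "login"}),
        "physical": (1, {"physical", "usb"}),
    },
    "attack_complexity": {
        "low": (2, {"trivial", "unauthenticated"}),
        "high": (1, {"complex", "sophisticated"}),
    },
    "remediation_level": {
        "undefined": (3, {"unpatched", "zero-day"}),
        "workaround": (2, {"workaround", "mitigation"}),
        "temporary": (1, {"hotfix", "temporary"}),
    },
}


def score_article(text):
    """Return (score, tags) for an article body.

    For each category, the highest-weight level with at least one
    matching keyword is kept, so ambiguous articles lean toward the
    riskier reading. `tags` records which keywords drove each choice.
    """
    words = set(re.findall(r"[a-z0-9-]+", text.lower()))
    score = 0
    tags = {}
    for category, levels in RULES.items():
        best = None
        for level, (weight, keywords) in levels.items():
            hits = sorted(words & keywords)
            if hits and (best is None or weight > best[0]):
                best = (weight, level, hits)
        if best is not None:
            score += best[0]
            tags[category] = {"level": best[1], "keywords": best[2]}
    return score, tags


def risk_label(score, threshold=6):
    """Map a numeric score to a label; the threshold is an assumption."""
    return "high-risk" if score >= threshold else "low-risk"
```

Returning the matched keywords alongside the score is what would feed a tagging view: the user sees not only the label but also the words that produced it.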

Challenges we ran into

Front-end: With only the basics of HTML as experience, we had to learn on the clock how to code advanced CSS animations and designs within an HTML file. Then we had to successfully connect the back-end code with the front-end code without prior experience, which was a fantastic learning opportunity.

Back-end: Our main problem was determining what libraries and tools to use. Although no one on the team had ever used NLP before, we knew that it would pair nicely with our idea. We went down several different paths and implementations until we finally determined what library, language, and type of application would best represent our vision. Working through these problems took a lot of trial and error, and support from every teammate.

Accomplishments that we're proud of

Not only are we proud of ourselves for successfully designing and implementing a program that will help people of all skill levels understand and analyze cybersecurity vulnerabilities and exposures, but we are also proud of challenging ourselves with concepts, libraries, tools, and languages that we had no previous knowledge of. We are thrilled that our hard work and dedication during this hackathon allowed us to create an application that will add security to the lives of billions.

What we learned

We all challenged ourselves by purposefully choosing a route in which we had no prior experience. We each took on a part that involved languages, libraries, NLP, hosting services, or HTML/CSS web development that we didn't know beforehand. Our background in other languages gave us a fundamental understanding that allowed us to learn Python, Django, web development, and NLP in the time given. We also learned about different types of attacks and what makes some attacks more dangerous than others. This gave us a better understanding of network protocols, malware analysis, and operating system vulnerabilities.

What's next for Mining Malware

We would like to implement machine learning to teach our program keywords, so we won’t have to tell it manually what to search for. With more time to dive deeper into deep learning (pun 100% intended), we would like to restructure our project to use a fully recurrent neural network for reinforcement learning. This way, the network will learn from both present and past data to determine how best to categorize new threat articles. YAY AI!!

We would also like to analyze the threat level on a more complete scale by including metrics such as scope, vulnerable component, and impacted component, and to make each category more comprehensive. For example, instead of just searching for keywords such as "network" to set the attack vector to Network, we would perform a more extensive analysis by teaching our program to ask itself questions such as:

Does the attacker exploit the vulnerable component via the network stack?
- Yes: Can the vulnerability be exploited from across a routed network?
  a: Network - Vulnerability is exploitable across the internet; absent more information, assume the worst case.
  b: Adjacent - Vulnerability is exploitable across a limited physical or logical network distance.
- No: Does the attacker require physical access to the target?
  a: Local - Attack is committed through a local application vulnerability, or the attacker can log in locally.
  b: Physical - Attacker requires physical access to the vulnerable component.
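
The questions above map naturally onto a small decision function. The sketch below is only an illustration of that mapping: the function name `attack_vector` and its boolean parameters are hypothetical, and it assumes some earlier analysis step has already answered each question for a given article. An unanswered routed-network question (`None`) falls back to the worst case, Network.

```python
from typing import Optional


def attack_vector(uses_network_stack: bool,
                  across_routed_network: Optional[bool] = None,
                  needs_physical_access: bool = False) -> str:
    """Classify the attack vector from answers to the questions above.

    All parameter names are hypothetical; how the answers are extracted
    from an article is outside the scope of this sketch.
    """
    if uses_network_stack:
        # Unknown (None) is treated like "yes": assume the worst case.
        if across_routed_network is None or across_routed_network:
            return "Network"
        return "Adjacent"
    # No network stack involved: decide between Local and Physical.
    return "Physical" if needs_physical_access else "Local"
```

For example, `attack_vector(True, False)` returns "Adjacent", while `attack_vector(True)` with no further information returns "Network".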

We would also like our program to analyze articles in multiple languages so that we can reach more people around the world in a more meaningful way. The more people who are aware of these threats, the more people who can spread awareness and show others how to protect themselves from falling victim to these exploits.

Built With

Submitted to

Technica 2017
- Winner Best Cybersecurity Hack | Mantech
- Winner Amazon Web Services - Best Use of AWS

Created by

back-end (python, newspaper module, NLP)
front-end (Django, html, css)

Marina Moskowitz
I worked on the front-end and created the animations and designs to make the information look presentable. As stated above, I had only basic experience before the hackathon, so I had to learn quickly on the fly.

Chanise Taylor
Kelly Ervin
Computer Science student at Virginia Tech

Updates

Marina Moskowitz started this project — Nov 05, 2017 07:42 AM EST
