Inspiration
What inspired this project was a malware infection that disrupted our workflow. Several months ago, while I was working on an important task, my system began showing signs of a malware infection, such as a significant drop in performance and frequent crashes. I turned to my friends, now my teammates for this hackathon, for a solution. Unfortunately, we were forced to fully reset the system to undo all the damage that had been caused. Experiencing this first-hand made us realize how damaging such software is and motivated us to build something that could detect it efficiently.
What it does
Our project is titled "ML-driven Malware Classification System" (ML-MCS). It analyzes the behavior of a file and classifies it into one of six categories: Ransomware, Spyware, Adware, Worm, or Trojan (the most common malware types), or Benign. The classification is based on attributes of the file's behavior, such as CPU usage, memory usage, and network activity. The system is designed to use ML models to provide fast, accurate, real-time malware detection.
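The writeup does not describe the exact input format. As a minimal sketch, assuming each file's behavior is summarized by the three attributes named above (CPU usage, memory usage, network activity), a sample can be turned into a fixed-order feature vector, and the six classes can be mapped to integer ids, which is the label form scikit-learn and XGBoost classifiers are typically trained on. `BehaviorSample`, its field names, and its units are hypothetical.

```python
from dataclasses import dataclass

# The six classes, in a fixed order so each one maps to a stable integer id (0..5).
LABELS = ("Ransomware", "Spyware", "Adware", "Worm", "Trojan", "Benign")
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


@dataclass
class BehaviorSample:
    """Hypothetical record of one file's observed behavior (field names and units are assumptions)."""
    cpu_usage: float         # percent of CPU used while the file runs
    memory_usage: float      # megabytes of memory used
    network_activity: float  # kilobytes sent/received per second

    def to_features(self) -> list[float]:
        # Fixed column order: prediction must use the same order the model was trained on.
        return [self.cpu_usage, self.memory_usage, self.network_activity]


def encode_label(name: str) -> int:
    """Class name -> integer id used as the training target."""
    return LABEL_TO_ID[name]


def decode_label(class_id: int) -> str:
    """Integer id predicted by a model -> human-readable class name."""
    return LABELS[class_id]


sample = BehaviorSample(cpu_usage=85.0, memory_usage=512.0, network_activity=1200.0)
print(sample.to_features())                     # [85.0, 512.0, 1200.0]
print(encode_label("Worm"), decode_label(5))    # 3 Benign
```

Keeping the class order in one tuple means the web interface can translate a model's numeric prediction back into a name without a second, possibly inconsistent, mapping.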
How we built it
The system is built with Python as the main programming language. The popular Python library 'pandas' was used for data manipulation, and the machine learning models were built and trained with the open-source Python libraries 'scikit-learn' and 'xgboost'. The front end was developed with HTML/CSS and JavaScript, backed by Flask, which gave users a simple web interface. RandomForest, GradientBoosting, and XGBoost models were trained on a custom malware dataset, and we aimed to improve prediction accuracy by combining the results of multiple models.
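The writeup does not say how the models' results were combined. One common option is soft voting: average each model's per-class probabilities (for example, the `predict_proba` output of each fitted classifier) and pick the class with the highest average. scikit-learn offers this as `VotingClassifier(voting="soft")`; the dependency-free sketch below only shows the arithmetic. The function name `soft_vote` and the probability values are illustrative assumptions, not results from the project.

```python
from typing import Optional, Sequence


def soft_vote(
    prob_rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> tuple[int, list[float]]:
    """Combine several models' class probabilities for ONE sample by (weighted) averaging.

    prob_rows[m][c] is model m's predicted probability for class id c.
    Returns the winning class id and the averaged probability vector.
    """
    if not prob_rows:
        raise ValueError("need probabilities from at least one model")
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("every model must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(prob_rows)  # every model gets an equal say
    if len(weights) != len(prob_rows) or sum(weights) <= 0:
        raise ValueError("need one weight per model with a positive total")
    total = sum(weights)
    averaged = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    # Highest averaged probability wins; ties go to the lower class id.
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged


# Made-up probabilities for one file over the six classes (ids 0..5).
rf = [0.10, 0.05, 0.05, 0.60, 0.10, 0.10]
gb = [0.05, 0.05, 0.10, 0.40, 0.30, 0.10]
xgb = [0.05, 0.05, 0.05, 0.30, 0.50, 0.05]
best, averaged = soft_vote([rf, gb, xgb])
print(best)  # 3
```

With equal weights the ensemble picks class 3 (Worm) even though the third model alone favors class 4 (Trojan); passing `weights` lets a model that validates better count for more.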
Challenges we ran into
Saying that the project was built easily would be a lie. As novice programmers, we ran into many errors early on because of our limited experience and knowledge. One of the most challenging and time-consuming tasks was cleaning the datasets by removing irrelevant and incomplete data. We also struggled to fine-tune the models to improve their performance.
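The writeup does not describe the dataset's schema or the exact cleaning rules. A minimal sketch of this kind of cleanup, assuming a CSV with three behavioral columns and a `label` column (all column names hypothetical), keeps only the columns the models use and drops rows whose label is missing or whose features are empty or non-numeric. In pandas, which the team used, the same idea is roughly `pd.to_numeric(..., errors="coerce")` on each feature column followed by `dropna()`; the version below uses only the standard library so each rule is explicit.

```python
import csv
import io
import math

# Hypothetical column names; a real dataset's headers will differ.
FEATURES = ("cpu_usage", "memory_usage", "network_activity")
VALID_LABELS = {"Ransomware", "Spyware", "Adware", "Worm", "Trojan", "Benign"}


def clean_rows(reader):
    """Keep only the feature and label columns, dropping incomplete or unusable rows.

    `reader` yields one dict per row, e.g. a csv.DictReader.
    Returns a list of (feature_values, label) pairs.
    """
    cleaned = []
    for row in reader:
        label = (row.get("label") or "").strip()
        if label not in VALID_LABELS:
            continue  # label missing or not one of the six classes
        try:
            values = [float(row[col]) for col in FEATURES]
        except (KeyError, TypeError, ValueError):
            continue  # feature column absent, empty, or non-numeric
        if any(math.isnan(v) for v in values):
            continue  # explicit "nan" entries count as missing too
        cleaned.append((values, label))  # other columns (e.g. file_id) are discarded
    return cleaned


raw = """file_id,cpu_usage,memory_usage,network_activity,label
a1,85.0,512,1200,Worm
a2,,300,10,Benign
a3,12.5,128,0,Benign
a4,50,256,400,
"""
print(clean_rows(csv.DictReader(io.StringIO(raw))))
# [([85.0, 512.0, 1200.0], 'Worm'), ([12.5, 128.0, 0.0], 'Benign')]
```

Row `a2` is dropped for its empty CPU value and row `a4` for its missing label.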
Accomplishments that we're proud of
After spending much time and effort on this project, we are proud of its ability to process data and deliver a classification through a user-friendly interface. We are also proud of the challenges we overcame and the experience we gained.
What we learned
We learned a lot about malware and about machine learning in general, which is the biggest reward we gained from taking part in this hackathon. We also learned how difficult it is to build and train different ML models, and how complex malware detection is.
What's next for ML-driven Malware Classification System
We plan to expand the system to support a wider variety of file types. We also hope to improve its scalability so that it can process even larger datasets and help solve real-world problems.