Inspiration
The increasing complexity and volume of cyberattacks, especially malware, have made traditional detection methods less effective. Machine learning (ML) offers a proactive approach: it analyzes large datasets and detects patterns that manual inspection and conventional tools miss. Our inspiration comes from the need for a scalable, intelligent system that evolves with emerging threats.
What it does
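The system classifies files as malicious or benign using two kinds of features:
Static: Features are extracted from PE file headers (mainly the Optional Header), YARA rules, and the file's digital signature.
Dynamic: Features are the API calls traced while the sample runs in Cuckoo Sandbox.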
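As a rough illustration of the two feature types, the sketch below reads the standard Optional Header fields with Python's `struct` module and counts API calls from a sandbox report. It is not the project's actual extractor: the function names are hypothetical, YARA and signature features are left out, and the `behavior` / `processes` / `calls` / `api` layout is an assumption about the Cuckoo JSON report that should be checked against real output.

```python
import struct
from collections import Counter

# Standard Optional Header fields shared by PE32 and PE32+ (first 24 bytes).
STANDARD_FIELDS = (
    "Magic", "MajorLinkerVersion", "MinorLinkerVersion", "SizeOfCode",
    "SizeOfInitializedData", "SizeOfUninitializedData",
    "AddressOfEntryPoint", "BaseOfCode",
)


def optional_header_features(data: bytes) -> dict:
    """Return the standard Optional Header fields of a PE image as a dict."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_offset = pe_offset + 4 + 20
    values = struct.unpack_from("<HBBIIIII", data, optional_offset)
    return dict(zip(STANDARD_FIELDS, values))


def api_call_counts(report: dict) -> Counter:
    """Count API names in a sandbox report (assumed Cuckoo-style layout)."""
    counts = Counter()
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            counts[call["api"]] += 1
    return counts
```

The header dict and the API counter could then be merged into a single feature row per file before training.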
How we built it
Algorithm used: We compared multiple algorithms using 10-fold stratified cross-validation and settled on the Extreme Gradient Boosting (XGBoost) classifier because it had the highest F1 score.
Technologies: Python, Streamlit, and Cuckoo Sandbox.
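The model comparison can be sketched in plain Python as follows. This is an illustration of stratified 10-fold cross-validation scored by F1, not the project's training code: `majority_class` and `threshold_rule` are toy stand-ins for real candidates such as XGBoost, and the data is synthetic. In practice, scikit-learn's `StratifiedKFold` and xgboost's `XGBClassifier` would typically take the place of the hand-written helpers.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def cross_val_f1(fit, X, y, k=10, seed=0):
    """Mean F1 over k folds; fit(X_train, y_train) returns a predict(x)."""
    scores = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        predictions = [predict(X[i]) for i in held_out]
        scores.append(f1_score([y[i] for i in held_out], predictions))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train):
    """Toy baseline: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def threshold_rule(X_train, y_train):
    """Toy model: cut halfway between the two classes on feature 0."""
    top_benign = max(x[0] for x, lab in zip(X_train, y_train) if lab == 0)
    low_malware = min(x[0] for x, lab in zip(X_train, y_train) if lab == 1)
    cutoff = (top_benign + low_malware) / 2
    return lambda x: int(x[0] > cutoff)


# Synthetic, imbalanced data: 70 benign (label 0), 30 malicious (label 1).
X = [[v] for v in range(70)] + [[v] for v in range(1000, 1030)]
y = [0] * 70 + [1] * 30

results = {
    "majority_class": cross_val_f1(majority_class, X, y),
    "threshold_rule": cross_val_f1(threshold_rule, X, y),
}
best = max(results, key=results.get)  # keep the candidate with the top F1
print(results, "->", best)
```

Stratifying matters here because a plain random split of an imbalanced dataset can leave some folds with very few malware samples, which makes per-fold F1 unstable.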
Challenges we ran into
Data Imbalance: Malware samples were far fewer than benign ones, which left the dataset imbalanced.
False Positives: Minimizing false positives without missing actual threats was challenging.
Feature Engineering: Extracting relevant features from static and dynamic analysis of files required a deep understanding of malware behavior.
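One common way to approach the first two challenges, shown here as a sketch rather than a record of what this project did, is to weight the minority class during training and then choose a decision threshold on held-out scores that caps the false positive rate. The helper names are hypothetical; the weight is the negatives-to-positives ratio commonly used for XGBoost's `scale_pos_weight` parameter.

```python
def positive_class_weight(labels):
    """Ratio of benign (0) to malware (1) samples, for class weighting."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples")
    return negatives / positives


def threshold_for_max_fpr(scores, labels, max_fpr=0.01):
    """Lowest threshold whose false positive rate stays within max_fpr.

    A sample is flagged as malware when its score is strictly greater
    than the returned threshold.
    """
    benign = sorted(
        (s for s, label in zip(scores, labels) if label == 0), reverse=True
    )
    if not benign:
        raise ValueError("no benign samples")
    allowed = int(max_fpr * len(benign))  # benign samples we may flag
    if allowed >= len(benign):
        return float("-inf")
    # At most `allowed` benign scores lie strictly above this value.
    return benign[allowed]


def false_positive_rate(scores, labels, threshold):
    """Share of benign samples whose score exceeds the threshold."""
    benign = [s for s, label in zip(scores, labels) if label == 0]
    return sum(s > threshold for s in benign) / len(benign)
```

Raising the threshold lowers false positives but also lets more malware through, so the cap should be tuned on validation data against the detection rate it costs.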
Accomplishments that we're proud of
We successfully implemented a machine learning-based malware detection system that achieves high accuracy. It can detect malware faster than conventional systems and adapt to new types of threats with minimal retraining. Reducing the false positive rate while maintaining detection accuracy was a significant milestone for us.
What we learned
We learned how crucial data quality is for machine learning applications. Additionally, building robust, scalable solutions in cybersecurity requires constant iteration, testing, and adaptation. Collaboration between data science and cybersecurity teams is essential to ensure the relevance of the model.
What's next for Malware Detection using Machine Learning
In the future, we plan to:
- Integrate the system with real-time network monitoring tools.
- Experiment with deep learning algorithms for better accuracy.
- Enhance our model's ability to detect zero-day attacks by expanding our dataset.
- Implement the system in cloud environments to handle large-scale data and real-time applications.
Built With
- python
- streamlit