Malware Detection using Machine Learning

Inspiration

The increasing complexity and volume of cyberattacks, especially malware, have made traditional detection methods less effective. Machine Learning (ML) offers a proactive approach: it can analyze vast datasets and detect patterns that are invisible to manual inspection or conventional tools. Our inspiration comes from the need for a scalable, intelligent system that evolves with emerging threats.

What it does

The system classifies files as malicious or benign using two kinds of features:

Static: Features are extracted from PE file headers (mainly the Optional Header), YARA rule matches, and digital signatures.
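Optional Header fields can be read with nothing more than Python's standard `struct` module. The sketch below is illustrative: the function name `optional_header_features` and the particular subset of fields are assumptions, not necessarily the project's feature set, and a real pipeline would more likely rely on a dedicated parser such as `pefile`. YARA matching and signature verification are not covered here.

```python
import struct

PE32_MAGIC = 0x10B
PE32_PLUS_MAGIC = 0x20B


def optional_header_features(data: bytes) -> dict:
    """Read a few PE Optional Header fields as numeric features (illustrative subset)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The Optional Header follows the 4-byte signature and the 20-byte COFF file header
    opt = e_lfanew + 4 + 20
    magic, major_linker = struct.unpack_from("<HB", data, opt)
    size_of_code, size_init, size_uninit, entry_point = struct.unpack_from("<4I", data, opt + 4)
    if magic == PE32_MAGIC:
        # PE32: 4-byte BaseOfData at +24, then 4-byte ImageBase at +28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == PE32_PLUS_MAGIC:
        # PE32+: no BaseOfData, 8-byte ImageBase at +24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown Optional Header magic {magic:#x}")
    # From +32 onward both layouts share the same offsets
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    subsystem, dll_characteristics = struct.unpack_from("<HH", data, opt + 68)
    return {
        "Magic": magic,
        "MajorLinkerVersion": major_linker,
        "SizeOfCode": size_of_code,
        "SizeOfInitializedData": size_init,
        "SizeOfUninitializedData": size_uninit,
        "AddressOfEntryPoint": entry_point,
        "ImageBase": image_base,
        "SizeOfImage": size_of_image,
        "Subsystem": subsystem,
        "DllCharacteristics": dll_characteristics,
    }
```

To featurize a file on disk, pass the bytes from `open(path, "rb").read()`; each returned dict becomes one row of the training matrix.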

Dynamic: Features are the API calls traced while the sample runs in Cuckoo Sandbox.
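A minimal sketch of turning a Cuckoo report into API-call features, assuming the Cuckoo 2.x `report.json` layout in which each entry of `behavior.processes` has a `calls` list whose items carry an `api` field. Counting calls (a bag of API calls) over a fixed vocabulary is an illustrative choice; the function names and the example vocabulary are hypothetical, and the project may instead use call sequences or n-grams.

```python
import json
from collections import Counter


def load_report(path: str) -> dict:
    """Load a Cuckoo JSON report from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def api_call_counts(report: dict) -> Counter:
    """Count API calls across every traced process in the report."""
    counts = Counter()
    for proc in report.get("behavior", {}).get("processes", []):
        for call in proc.get("calls", []):
            counts[call["api"]] += 1
    return counts


def to_vector(counts: Counter, vocabulary: list) -> list:
    """Map counts onto a fixed, ordered vocabulary so every sample has the same columns."""
    return [counts.get(api, 0) for api in vocabulary]


# Hypothetical vocabulary; in practice it would be built from the training reports.
VOCABULARY = ["NtCreateFile", "RegSetValueExW", "InternetOpenUrlA"]
```

Fixing the vocabulary up front matters because different samples call different APIs, while the classifier needs the same feature columns for every file.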

How we built it

Algorithm used: We compared multiple algorithms using 10-fold stratified cross-validation and settled on the Extreme Gradient Boosting (XGBoost) classifier, as it achieved the highest F1 score.
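A minimal sketch of this kind of model comparison, assuming scikit-learn is available: each candidate is scored with 10-fold stratified cross-validation using F1 on the malware class (label 1), and the highest mean F1 wins. The synthetic data from `make_classification` stands in for the real feature matrix, and the candidate list is illustrative. `XGBClassifier` is added only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, models, n_splits=10, seed=42):
    """Mean F1 (positive class = 1 = malware) per model under stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="f1")))
        for name, model in models.items()
    }


models = {
    "logreg": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
try:
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(n_estimators=200, random_state=0)
except ImportError:
    pass  # xgboost not installed; compare the remaining models

# Synthetic stand-in: 80% benign (0), 20% malware (1)
X, y = make_classification(n_samples=600, n_features=20, weights=[0.8, 0.2], random_state=0)
scores = compare_models(X, y, models)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Stratification keeps the benign-to-malware ratio roughly constant in every fold, and F1 is a more informative criterion than accuracy on imbalanced data, where a model that labels everything benign can still score high accuracy.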

Technologies: Python, Streamlit, Cuckoo Sandbox

Challenges we ran into

Data Imbalance: There are far fewer malware samples than benign ones, so the dataset was imbalanced.

False Positives: It was challenging to minimize false positives without compromising the detection of actual threats.

Feature Engineering: Extracting relevant features from the static and dynamic analysis of files required a deep understanding of malware behavior.
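The write-up does not say how the imbalance and false-positive problems were handled, so the following is only a sketch of two common mitigations, not the team's method. The first weights the minority class: XGBoost exposes this as the `scale_pos_weight` parameter, commonly set to the benign-to-malware ratio. The second picks the decision threshold on validation scores so the false positive rate stays under a target. All function names are hypothetical.

```python
def scale_pos_weight(labels):
    """Ratio of benign (0) to malware (1) samples, a common value for XGBoost's scale_pos_weight."""
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("no malware samples (label 1) in labels")
    return negatives / positives


def threshold_for_max_fpr(scores, labels, max_fpr):
    """Lowest threshold whose false positive rate on benign samples is <= max_fpr.

    A file is flagged as malware when score >= threshold. Returns inf (flag nothing)
    if no observed score meets the target. O(n^2), acceptable for a sketch.
    """
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not benign:
        raise ValueError("no benign samples (label 0) to measure false positives")
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in benign) / len(benign)
        if fpr <= max_fpr:
            return t
    return float("inf")


def recall_at(scores, labels, threshold):
    """Fraction of malware samples flagged at the given threshold."""
    malware = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in malware) / len(malware)
```

Taking the lowest threshold that satisfies the false-positive target keeps as much malware recall as that target allows; tightening `max_fpr` raises the threshold and trades recall for fewer false alarms.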

Accomplishments that we're proud of

We successfully implemented a machine learning-based malware detection system that achieves high accuracy. It can detect malware faster than conventional systems and adapt to new types of threats with minimal retraining. Reducing the false positive rate while maintaining detection accuracy was a significant milestone for us.

What we learned

We learned how crucial data quality is for machine learning applications. Additionally, building robust, scalable solutions in cybersecurity requires constant iteration, testing, and adaptation. Collaboration between data science and cybersecurity teams is essential to ensure the relevance of the model.

What's next for Malware Detection using Machine Learning

In the future, we plan to:

Integrate the system with real-time network monitoring tools.
Experiment with deep learning algorithms for better accuracy.
Enhance our model's ability to detect zero-day attacks by expanding our dataset.
Implement the system in cloud environments to handle large-scale data and real-time applications.

Built With

python
streamlit

Updates

Chandra Kant Bauri started this project — Sep 21, 2024 12:53 PM EDT
