Inspiration
Our project, MalEnd, was inspired by:
- The need for a robust solution that improves malware detection accuracy for zero-day threats.
- Evolving malware sources and intelligent nation-state threat actors.
- The extensive scope for AI, ML, and DL technologies in malware analysis.
What it does
Our proposed ensemble model:
- Classifies malware and attack patterns using both static and dynamic features drawn from network data and OS-level data.
- Combines multiple ML algorithms to improve detection rates and reduce false positives.
- Provides a privacy-preserving method for detecting malware.
How we built it
Datasets used for training:
- MalRec: This dataset consists of malware samples that have been meticulously analyzed, allowing us to extract static and dynamic features critical for classification.
- CTU-13: This dataset consists of labeled network traffic captures that include various types of malware communications.
Development of the model:
- 1) Extracted features extensively from each dataset, covering both static features (network properties, byte sequences) and dynamic features (OS-level execution patterns, API calls).
- 2) Selected Random Forest as one of our primary base learners.
- 3) Used AdaBoost and Gradient Boosting to improve performance further.
- 4) Combined the predictions from the Random Forest and boosting algorithms using bagging and stacking.
- 5) Trained the ensemble model and evaluated it with k-fold cross-validation.
- 6) Tested the ensemble model on unseen data to simulate real-world scenarios.
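The steps above can be sketched roughly as follows with scikit-learn. This is a minimal illustration and not our exact pipeline: the synthetic data from `make_classification` stands in for the features extracted from MalRec and CTU-13, and the logistic-regression meta-learner, the estimator counts, and the 3-fold setting are assumptions chosen only to keep the example small. Bagging is represented here by the Random Forest itself, which bags decision trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the extracted feature matrix (assumption, not project data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)

# Hold out unseen data for the final test (step 6).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners: Random Forest (step 2) plus two boosting models (step 3).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=30, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=30, random_state=0)),
]

# Stacking (step 4): a meta-learner is trained on the base learners' predictions.
# The logistic-regression meta-learner is an illustrative assumption.
ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# k-fold cross-validation on the training split (step 5).
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
cv_scores = cross_val_score(ensemble, X_train, y_train, cv=cv, scoring="accuracy")

# Fit on the full training split, then score on the held-out data (step 6).
ensemble.fit(X_train, y_train)
test_accuracy = ensemble.score(X_test, y_test)

print("Cross-validation accuracy per fold:", cv_scores)
print("Held-out test accuracy:", test_accuracy)
```

Stratified splits keep the class ratio the same in every fold, which matters when malware and benign samples are not evenly represented.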
Challenges we ran into
- Imbalanced datasets.
- Feature extraction was difficult and time-consuming.
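One common way to soften class imbalance is to weight each class inversely to its frequency. The sketch below illustrates that general technique with scikit-learn and is not necessarily what we used; the 90/10 label split is made-up example data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: 90 benign (0) and 10 malicious (1) samples (assumption).
y = np.array([0] * 90 + [1] * 10)

# "balanced" weights are n_samples / (n_classes * class_count).
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

# The minority class gets the larger weight, so its errors cost more in training.
model = RandomForestClassifier(
    n_estimators=30, class_weight=class_weight, random_state=0
)

print(class_weight)
```

Passing `class_weight="balanced"` directly to `RandomForestClassifier` computes the same weights internally; the explicit dictionary just makes them visible.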
Accomplishments that we're proud of
- We achieved an accuracy of over 99% on the test dataset.
What we learned
- The value of careful feature extraction and model selection.
- Combining different types of models can give a significant advantage.
What's next for Ensemble Model based Malware analysis
- Incorporating more diverse datasets that cover different perspectives.
- Incorporating hardware-level features and perspectives.
Built With
- git
- jupyter
- keras
- numpy
- pandas
- python
- pytorch
- r
- scikit-learn
- sqlite
- tensorflow
- virustotal