Inspiration

Our project, MalEnd, was inspired by:

  • The need for a robust way to detect zero-day malware more accurately.
  • Evolving malware sources and sophisticated nation-state threat actors.
  • The extensive scope for AI, ML, and DL technologies in malware analysis.

What it does

Our proposed ensemble model:

  • Classifies malware and its attack pattern using both static and dynamic features drawn from network data and OS-level data.
  • Improves detection rates and reduces false positives by combining multiple ML algorithms.
  • Detects malware in a privacy-preserving way.
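To illustrate what mixing static and dynamic features can look like, here is a minimal sketch. It is not the project's actual extraction code: the function names, the feature-key prefixes, the bigram size, and the sample bytes and API trace are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List


def byte_ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Static features: how often each n-byte sequence occurs in the sample."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def api_call_counts(trace: List[str]) -> Counter:
    """Dynamic features: how often each API call appears in an execution trace."""
    return Counter(trace)


def combine_features(data: bytes, trace: List[str], n: int = 2) -> Dict[str, int]:
    """Merge both views into one feature dictionary with prefixed keys."""
    features: Dict[str, int] = {}
    for gram, count in byte_ngram_counts(data, n).items():
        features[f"static:bytes:{gram.hex()}"] = count
    for call, count in api_call_counts(trace).items():
        features[f"dynamic:api:{call}"] = count
    return features


# Hypothetical inputs: six raw bytes and a three-call trace.
sample_bytes = b"\x4d\x5a\x90\x00\x4d\x5a"
sample_trace = ["CreateFileW", "WriteFile", "WriteFile"]
features = combine_features(sample_bytes, sample_trace)
```

Prefixing the keys keeps the two feature families distinguishable after they are merged, which helps when inspecting which kind of feature a model relies on.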

How we built it

Datasets used for training:

  • MalRec: This dataset consists of malware samples that have been meticulously analyzed, allowing us to extract the static and dynamic features critical for classification.
  • CTU-13: This dataset consists of labeled network traffic captures that include various types of malware communications.

Development of the model:

  1) Extracted features from each dataset, covering both static features (network properties, byte sequences) and dynamic features (OS-level execution patterns, API calls).
  2) Selected Random Forest as one of our primary base learners.
  3) Added AdaBoost and Gradient Boosting to improve performance further.
  4) Combined the predictions of the Random Forest and boosting algorithms using bagging and stacking.
  5) Trained the ensemble model and evaluated it with k-fold cross-validation.
  6) Tested the ensemble model on unseen data to simulate real-world scenarios.
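The model-building steps above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the synthetic data stands in for features extracted from MalRec and CTU-13, and the logistic-regression meta-learner, k = 5, the estimator counts, and the 75/25 split are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split


def build_ensemble(seed=0):
    """Stack Random Forest, AdaBoost, and Gradient Boosting under a meta-learner."""
    base_learners = [
        # Random Forest is itself a bagging method (bootstrap-aggregated trees).
        ("rf", RandomForestClassifier(n_estimators=20, random_state=seed)),
        ("ada", AdaBoostClassifier(n_estimators=20, random_state=seed)),
        ("gb", GradientBoostingClassifier(n_estimators=20, random_state=seed)),
    ]
    # Assumption: logistic regression as the meta-learner; the write-up does not name one.
    meta_learner = LogisticRegression(max_iter=1000)
    return StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=3)


def evaluate(seed=0, k=5):
    """Run k-fold cross-validation on the training split, then score held-out data."""
    # Synthetic stand-in for the real extracted feature matrix and labels.
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_ensemble(seed)
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    cv_scores = cross_val_score(model, X_train, y_train, cv=folds)
    # Refit on the full training split before scoring the unseen test split.
    model.fit(X_train, y_train)
    return cv_scores, model.score(X_test, y_test)
```

Calling `evaluate()` returns the per-fold accuracies and the accuracy on the held-out split. Keeping the test split out of cross-validation is what lets the final score approximate performance on unseen data.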

Challenges we ran into

  • Imbalanced datasets.
  • Feature extraction was difficult and time-consuming.
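This write-up does not say how the class imbalance was handled. One common option, shown here purely as an illustration, is to weight each class inversely to its frequency, following the same formula scikit-learn uses for `class_weight="balanced"`. The label names and the 90/10 split below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label set: 90 benign samples, 10 malware samples.
labels = ["benign"] * 90 + ["malware"] * 10
weights = balanced_class_weights(labels)
# The minority class receives the larger weight (5.0 versus roughly 0.56 here).
```

Weights like these can be passed to learners that accept per-class or per-sample weights, so errors on the rare class count for more during training.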

Accomplishments that we're proud of

  • We achieved an accuracy of over 99% on the test dataset.

What we learned

  • The value of careful feature extraction and model selection.
  • Combining different types of models can provide a significant advantage.

What's next for Ensemble Model based Malware analysis

  • Incorporating more diverse datasets that cover different perspectives.
  • Incorporating hardware-level features and perspectives.