py-LMDC (Linux Malware Detection & Classification) System

Theme - Open Innovation

Introduction

Malware is intrusive software designed to damage or destroy computer systems. Common types of malware include computer viruses, worms, ransomware, and keyloggers. Such software may destroy crucial data or lock us out of it. Anti-malware is software used to prevent, detect, and remove malware, thereby protecting systems from attack. This project aims to provide an ML-based approach that increases the security of a system by detecting malicious software before it does any damage.

Workflow - Our Approach

  • We have used supervised learning techniques to tackle this problem.
  • Data Preparation
    • We renamed the provided files by their class name and then uncompressed them.
    • We cleaned the data with cleaner.py.
    • We extracted features from the files and saved them in a CSV file, adding a final column named type that holds each file's class name. This column is our target variable.
    • We then cleaned the dataset, for example by filling in missing values.
  • Model Training
    • We used a Random Forest classifier as our model.
    • We generated the training and test data using the train_test_split function.
    • After training, we saved the model as finalized_model.sav.
  • Model Testing
    • We evaluated the model on the test data.
    • The accuracy and F1 score are calculated and printed to the console; they can be viewed by un-commenting the training function.
  • Generating the result
    • The trained model is loaded and used to predict the class name of each file from the data in perfect.csv.
    • The file names and their predicted class names are saved in result.csv.
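
A minimal sketch of the training, testing, and result steps above, under stated assumptions. It is not our actual script: synthetic data stands in for the extracted-features CSV and for perfect.csv, and the feature columns (num_sections, entropy), the file_name column, the class names, the hyperparameters, and the weighted F1 average are all hypothetical choices made for illustration. Only the type column and the file names finalized_model.sav and result.csv come from the workflow described here.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted-features CSV.
# Column names and class names are hypothetical placeholders.
rng = np.random.default_rng(0)
n = 300
classes = ["class_a", "class_b", "class_c"]
class_idx = rng.integers(0, len(classes), size=n)
data = pd.DataFrame({
    "file_name": [f"sample_{i}" for i in range(n)],
    "num_sections": rng.normal(10 + 5 * class_idx, 1.0),
    "entropy": rng.normal(4 + class_idx, 0.3),
    "type": np.array(classes)[class_idx],  # target variable
})

# Dataset cleaning: fill missing values (here with the column median).
feature_cols = ["num_sections", "entropy"]
data.loc[0, "entropy"] = np.nan
data[feature_cols] = data[feature_cols].fillna(data[feature_cols].median())

# Split into training and test data.
X = data[feature_cols]
y = data["type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train the Random Forest classifier.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Test: accuracy and F1 score, printed to the console.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average="weighted")
print(f"accuracy={accuracy:.3f} f1={f1:.3f}")

# Save the trained model with pickle (written to a temp directory here).
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "finalized_model.sav"
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)

# Generate the result: reload the model, predict, and write result.csv.
with open(model_path, "rb") as fh:
    loaded_model = pickle.load(fh)

result = pd.DataFrame({
    "file_name": data["file_name"],
    "type": loaded_model.predict(data[feature_cols]),
})
result_path = out_dir / "result.csv"
result.to_csv(result_path, index=False)
```

In a real run, the synthetic DataFrame would be replaced by the CSV produced during data preparation, and the prediction step would read its features from perfect.csv.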

TECH STACK - *PYTHON*

  • Pandas
  • NumPy
  • Scikit-learn
  • csv
  • Pickle
  • Matplotlib
  • Pyelftools
  • Missingno

Result

Our model successfully processes the malware dataset, allowing us to classify different types of malware and take further steps to prevent them.

Presentation Link

Video Demo Link

TEAM DEBUGGERS
