py-LMDC (Linux Malware Detection & Classification) System

Theme - Open Innovation

Introduction

Malware is intrusive software designed to damage or destroy computer systems. Common types of malware include computer viruses, worms, ransomware, and keyloggers. Such software may destroy crucial data or block our access to it. Anti-malware is software used to prevent, detect, and remove malware, and so helps protect systems against attacks. This project aims to provide an ML-based approach that increases the security of a system by detecting malicious software before it causes any damage.

Workflow - Our Approach

We have used supervised learning techniques to tackle this problem.
Data Preparation
- We uncompressed all the provided files after renaming them by their class name.
- We cleaned the data with cleaner.py.
- We extracted features from the files and saved them in a CSV file. The last column of this CSV, named type, contains the class name of each file and serves as our target variable.
- We then cleaned the dataset, for example by filling in missing values.
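The feature extraction and dataset cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names extract_features and build_dataset, the choice of header fields, and filling missing numeric values with 0 are all assumptions. Only the type target column and the use of pyelftools and pandas come from this README.

```python
import pandas as pd


def extract_features(path):
    """Read a few header-level features from one ELF file (illustrative set)."""
    # Imported here so the rest of the sketch works without pyelftools installed.
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as f:
        elf = ELFFile(f)
        return {
            "file": str(path),
            "machine": elf.header["e_machine"],
            "elf_type": elf.header["e_type"],
            "entry_point": elf.header["e_entry"],
            "num_sections": elf.num_sections(),
            "num_segments": elf.num_segments(),
        }


def build_dataset(rows, labels):
    """Combine per-file feature dicts and class names into one table."""
    df = pd.DataFrame(rows)
    # Assumption: missing numeric features are filled with 0.
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = df[numeric].fillna(0)
    # The class name goes in the last column and becomes the target variable.
    df["type"] = labels
    return df
```

The resulting table can then be written out with df.to_csv(csv_path, index=False), where csv_path is whichever file the training step reads.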
Model Training
- We used a Random Forest classifier to train our model.
- We generated the training and test data using the train_test_split function.
- After training, we saved the model as finalized_model.sav.
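A minimal sketch of the training step, assuming the feature table holds only numeric feature columns plus the type column. The function name train_and_save, the 80/20 split, the stratified sampling, the fixed random seed, and the number of trees are illustrative assumptions; the README only states that a Random Forest classifier, train_test_split, and the file name finalized_model.sav are used.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_save(df, model_path="finalized_model.sav", target="type"):
    """Split the data, fit a random forest, and pickle the fitted model."""
    # Assumption: every column except the target is a numeric feature.
    X = df.drop(columns=[target])
    y = df[target]

    # Hold out 20% of the rows for testing (assumed ratio), keeping class balance.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    return model, X_test, y_test
```

Returning the held-out split alongside the model lets the testing step reuse exactly the rows the model has not seen.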
Model Testing
- We tested the model on the test data.
- The accuracy and F1 score of the model are calculated and printed in the console. They can be viewed by un-commenting the training function.
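The two metrics can be computed with scikit-learn along these lines. This is a sketch: the function name report_scores is hypothetical, and macro averaging of the F1 score is an assumption, since the README does not say which averaging the project uses for its multi-class labels.

```python
from sklearn.metrics import accuracy_score, f1_score


def report_scores(y_true, y_pred):
    """Print and return accuracy and macro-averaged F1 for the predictions."""
    accuracy = accuracy_score(y_true, y_pred)
    # Assumption: macro averaging, which weights every class equally.
    f1 = f1_score(y_true, y_pred, average="macro")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"F1 score (macro): {f1:.4f}")
    return accuracy, f1
```

Here y_true would be the held-out labels and y_pred the output of model.predict on the held-out features.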
Generating the Result
- The trained model is loaded and used to predict the class name of the files from the data in perfect.csv.
- The file names and their predicted class names are saved in result.csv.
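The prediction step could look roughly like the sketch below. It assumes that the input CSV has one column holding the file name (called file here, a hypothetical name) and otherwise the same feature columns the model was trained on. The function name predict_to_csv and the output column names are also illustrative.

```python
import pickle

import pandas as pd


def predict_to_csv(model_path, features_csv, result_csv, name_column="file"):
    """Load a pickled model, predict a class per row, and save the results."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    data = pd.read_csv(features_csv)
    # Assumption: all columns other than the file name are model features.
    features = data.drop(columns=[name_column])

    result = pd.DataFrame(
        {name_column: data[name_column], "type": model.predict(features)}
    )
    result.to_csv(result_csv, index=False)
    return result
```

A call such as predict_to_csv("finalized_model.sav", "perfect.csv", "result.csv") would match the file names mentioned above. Pickle files can execute arbitrary code when loaded, so only load models from a trusted source.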

Tech Stack - Python

- pandas
- NumPy
- scikit-learn
- csv
- pickle
- Matplotlib
- pyelftools
- missingno

Dataset generated for Training & Testing Model

Result Sample

Structure of an ELF File

Result

Our model processes the malware samples in the dataset and classifies them by type, so that further steps can be taken to prevent them.