py-LMDC (Linux Malware Detection & Classification) System
Theme - Open Innovation
Introduction
Malware is intrusive software designed to damage or destroy computer systems. Common types of malware include computer viruses, worms, ransomware, and keyloggers. Such software may destroy crucial data or lock us out of it. Anti-malware is software used to prevent, detect, and remove malware; it helps detect attacks on systems and thereby prevent them. This project aims to provide an ML-based approach to increase the security of a system against such attacks by detecting malicious software before it does any damage.
Workflow - Our Approach
- We have used supervised learning techniques to tackle this problem.
- Data Preparation
- We uncompressed all the files which were provided to us after renaming them by their class-name.
- Next, we cleaned the data with cleaner.py.
- The features were then extracted from the files and saved in a CSV file. We added a last column to this CSV named type, which contains the class-name of the file. This became our target variable.
- Then we cleaned the resulting dataset, for example by filling in missing values.
- Model Training
- We used a Random Forest Classifier to train our model.
- We generated training and test data using the train_test_split function.
- After the model was trained, we saved it as finalized_model.sav.
- Model Testing
- We also tested the model on the test data.
- The accuracy and F1 score of the model are calculated and can be viewed by un-commenting the training function.
- The accuracy and F1 score are printed in the console.
- Generating the result
- The trained model is loaded, and we use the data from perfect.csv to predict the class-name of the files.
- File names with their respective predicted class-names are saved in result.csv.
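The training, testing, and result-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic feature matrix stands in for the extracted feature CSV, the class names and the `filename` column are hypothetical, and the hyperparameters, split ratio, and macro-averaged F1 are assumptions that the write-up does not specify. Only `train_test_split`, the Random Forest Classifier, `finalized_model.sav`, `result.csv`, and the `type` column come from the description above.

```python
import csv
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def train_and_save(X, y, model_path, seed=0):
    """Split the data, fit a random forest, pickle it, and return test metrics."""
    # test_size and stratify are assumed settings, not taken from the project.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)

    # Persist the fitted model so prediction can run without retraining.
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)

    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Macro averaging is an assumption; the F1 variant used is not stated.
        "f1": f1_score(y_test, y_pred, average="macro"),
    }


def predict_to_csv(model_path, names, X_new, out_path):
    """Load the pickled model and write one (file name, predicted class) row per sample."""
    with open(model_path, "rb") as fh:
        model = pickle.load(fh)
    predictions = model.predict(X_new)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "type"])  # "filename" is a hypothetical header
        writer.writerows(zip(names, predictions))
    return list(predictions)


if __name__ == "__main__":
    # Synthetic stand-in for the feature table: three well-separated classes.
    rng = np.random.default_rng(0)
    labels = np.repeat(["virus", "worm", "keylogger"], 40)
    centers = {"virus": 0.0, "worm": 5.0, "keylogger": 10.0}
    X = np.array([rng.normal(centers[c], 1.0, size=4) for c in labels])

    workdir = Path(tempfile.mkdtemp())
    model_file = workdir / "finalized_model.sav"
    print(train_and_save(X, labels, model_file))
    predict_to_csv(model_file, ["sample_a", "sample_b"], X[[0, 119]], workdir / "result.csv")
    print((workdir / "result.csv").read_text())
```

Keeping training and prediction in separate functions mirrors the workflow described above, where the saved model is reloaded later to label new files. Note that a pickled model should only be loaded from a trusted source, since unpickling can execute arbitrary code.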
TECH STACK - *PYTHON*
Pandas, NumPy, Scikit-learn, csv, Pickle, Matplotlib, Pyelftools, Missingno
Result
Our model successfully processes the malware samples provided in the dataset and classifies them by type, so that further steps can be taken to prevent them.
TEAM DEBUGGERS
Built With
- csv
- matplotlib
- missingno
- numpy
- pandas
- pickle
- pyelftools
- python
- scikit-learn
