Inspiration
This project was inspired by the steady rise in malware threats.
What it does
It processes raw data, selects the most important features, trains a Random Forest classifier, and evaluates the classifier's performance in identifying different classes of malware.
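A minimal sketch of that flow, assuming scikit-learn and a synthetic dataset in place of the real malware data (which is not shown here). The sample counts, `k=10`, and the number of trees are illustrative choices, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the malware feature table (assumption).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8,
    n_classes=3, random_state=0,
)

# Stratify so each class keeps its share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature selection and the classifier are fitted together on training data only.
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Wrapping both steps in one `Pipeline` keeps the feature scores from being computed on test rows, which would otherwise leak information into the evaluation.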
How we built it
We built it in Python with several key libraries and techniques:
- Pandas and NumPy for data manipulation
- Scikit-learn for machine learning algorithms and preprocessing
- imbalanced-learn for handling imbalanced datasets
- Feature selection techniques to focus on the most relevant attributes
Challenges we ran into
The biggest challenges were dealing with imbalanced datasets and handling potentially large datasets efficiently.
Accomplishments that we're proud of
- We implemented an end-to-end machine learning pipeline.
- We used advanced techniques such as SMOTE for dataset balancing and SelectKBest for feature selection.
- The code structure is reusable.
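To show the idea behind SMOTE, here is a simplified NumPy sketch of its interpolation step. It is a hand-rolled illustration, not imbalanced-learn's implementation (which the project used); the function name `smote_like_oversample` and the toy data are hypothetical.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                      # random minority rows
    picked = neighbors[base, rng.integers(0, k, size=n_new)]   # one neighbor per row
    gap = rng.random((n_new, 1))                               # factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy data (assumption): 200 majority rows, 20 minority rows, 4 features.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(200, 4))
X_minor = rng.normal(3.0, 1.0, size=(20, 4))

X_new = smote_like_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_minor_balanced = np.vstack([X_minor, X_new])
print(X_major.shape, X_minor_balanced.shape)
```

Each synthetic row lies on the line segment between a minority sample and one of its nearest minority neighbors. Oversampling of this kind is normally applied to the training split only, so the test set keeps the original class balance.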
What we learned
- Advanced data preprocessing techniques
- Handling imbalanced datasets in classification problems
- Evaluating model performance using various metrics
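A small example of why several metrics matter on imbalanced data, assuming scikit-learn's metrics module. The labels and the class names `benign`, `trojan`, and `worm` are made up for illustration and are not results from the project.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
)

# Hypothetical labels: 6 benign, 3 trojan, 1 worm sample.
y_true = ["benign"] * 6 + ["trojan"] * 3 + ["worm"]
# The classifier misses one trojan and the only worm.
y_pred = ["benign"] * 6 + ["trojan", "trojan", "benign"] + ["benign"]

labels = ["benign", "trojan", "worm"]
accuracy = accuracy_score(y_true, y_pred)
balanced = balanced_accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, columns: predicted

print(f"accuracy:          {accuracy:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
print(f"macro F1:          {macro_f1:.3f}")
print(cm)
```

Plain accuracy looks healthy here because the majority class dominates, while balanced accuracy and macro F1 drop because the rare class is never detected. The confusion matrix shows which classes are being confused.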
What's next for Achievers
We plan to experiment with other machine learning algorithms and develop a user interface for easier interaction with the model. Finally, we will expand the pipeline to handle real-time malware detection.
Built With
- csv
- numpy
- pandas
- python
- smote
- visual-studio