Federated-Malware-detection

Inspiration

The "Federated Malware Detection System" was inspired by the increasing sophistication of malware threats and the limitations of traditional centralized detection systems. Privacy concerns, the need for scalability, and the adaptability required to combat emerging threats motivated us to explore federated learning as a decentralized approach to build a robust malware detection system.

What it does

The system classifies Portable Executable (PE) files as either malware or legitimate software using a federated learning approach. Multiple clients train a shared global model on their own local data, so raw files never leave the client while every participant's training still improves the system's detection accuracy.
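To make the classification input more concrete, here is a minimal sketch of how a few numeric features could be read from the raw bytes of a file. It uses only the Python standard library as a stand-in for a full parser such as pefile, and the feature set (`is_pe`, `machine`, `num_sections`, `entropy`) and function names are hypothetical illustrations, not the features this project actually trains on.

```python
import math
import struct
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def pe_header_features(data: bytes) -> dict:
    """Extract a few illustrative features from the raw bytes of a file.

    Hypothetical feature set for illustration only.
    """
    features = {"is_pe": False, "num_sections": 0, "entropy": byte_entropy(data)}
    # A PE file starts with the DOS magic "MZ"; the 4-byte little-endian
    # offset of the PE header (e_lfanew) is stored at byte 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return features
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The 4-byte PE signature is followed by the 20-byte COFF file header.
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return features
    # The COFF header begins with Machine (2 bytes) and NumberOfSections (2 bytes).
    machine, num_sections = struct.unpack_from("<HH", data, pe_offset + 4)
    features.update(is_pe=True, machine=machine, num_sections=num_sections)
    return features
```

A feature dictionary like this would be converted to a fixed-length vector before being fed to a classifier. High byte entropy is often used as a rough signal of packed or encrypted content, though it is not conclusive on its own.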

How we built it

We developed an end-to-end federated learning pipeline comprising:

Client Nodes: Each client trains a local model on its dataset without sharing sensitive data.
Server Node: The server aggregates the local models to create a global model for malware detection.
Dockerized Environment: The system is containerized with Docker, so client and server nodes can be deployed and scaled easily.

We implemented the federated learning architecture in Python, using frameworks such as PyTorch.
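The server-side aggregation step can be sketched as follows. The write-up does not say which aggregation rule the server uses, so this example assumes federated averaging (FedAvg), a common baseline in which each client's parameters are weighted by its number of training samples. The `Weights` type and the `federated_average` function are hypothetical names, and plain Python lists stand in for PyTorch tensors to keep the sketch dependency-free.

```python
from typing import Dict, List, Tuple

# Parameter name -> flattened parameter values (a stand-in for tensors).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count.

    `updates` holds one (weights, num_samples) pair per client.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    first_weights, _ = updates[0]
    global_weights: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            # Each client contributes in proportion to its share of the data.
            for i, v in enumerate(weights[name]):
                acc[i] += v * (n / total)
        global_weights[name] = acc
    return global_weights
```

For example, a client with 300 samples would pull the global model three times as strongly as a client with 100 samples. Only model parameters and sample counts reach the server in this scheme; the PE files themselves stay on the clients.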

Challenges we ran into

Data Distribution: Ensuring an even and realistic distribution of PE files across client nodes.
Model Aggregation: Implementing effective techniques for combining local models into a reliable global model.
Resource Management: Handling the computational and storage constraints during training and deployment.
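The data distribution challenge can be illustrated with two simple ways of splitting a labeled dataset across clients. This is a sketch under assumptions, not the project's actual partitioning code: the `Sample` type and both function names are hypothetical, and the label convention (1 for malware, 0 for legitimate) is chosen for the example.

```python
import random
from typing import List, Sequence, Tuple

# (file identifier, label), with 1 = malware and 0 = legitimate (assumed convention).
Sample = Tuple[str, int]


def partition_iid(samples: Sequence[Sample], num_clients: int, seed: int = 0) -> List[List[Sample]]:
    """Shuffle, then deal samples round-robin so client sizes differ by at most one."""
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]


def partition_by_label(samples: Sequence[Sample], num_clients: int, seed: int = 0) -> List[List[Sample]]:
    """Sort by label, then cut into contiguous shards so each client sees a skewed class mix."""
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    ordered = sorted(shuffled, key=lambda s: s[1])
    shard = -(-len(ordered) // num_clients)  # ceiling division
    return [ordered[i * shard:(i + 1) * shard] for i in range(num_clients)]
```

The first split gives every client roughly the same class balance, which is convenient for testing. The second produces label-skewed clients, which is closer to a setting where some participants see mostly malware and others mostly legitimate software, and is typically harder for aggregation.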

Accomplishments that we're proud of

Successfully deploying a fully functional federated learning-based malware detection system.
Preserving data privacy while achieving competitive accuracy in malware classification.
Creating a scalable and modular architecture using Docker for real-world applicability.

What we learned

The importance of federated learning in preserving data privacy and improving collaboration across distributed systems.
Advanced techniques in model aggregation and optimization for decentralized training.
The complexities of working with cybersecurity datasets and the unique challenges of malware classification.

What's next for Federated-Malware-detection

Enhanced Model Performance: Improving the accuracy and adaptability of the system to handle zero-day malware.
Real-world Deployment: Collaborating with cybersecurity firms to deploy the system in production environments.
Expanded Applications: Extending the federated learning framework to other domains, such as ransomware detection and intrusion prevention systems.

Built With

docker
flower
pefile
python
pytorch

Updates

Karthik D started this project — Jan 03, 2025 01:57 PM EST
