## Inspiration

In today's interconnected world, cyber threats are becoming increasingly complex and frequent. Manual monitoring and rule-based systems are no longer sufficient to detect modern attacks. This inspired us to develop an AI-powered model that can intelligently classify network traffic as either benign or malicious using machine learning techniques. Our goal was to contribute toward building safer networks and improving real-time intrusion detection capabilities.
## What We Learned

- **Data Preprocessing:** We learned the importance of handling missing values, dropping irrelevant columns, and scaling features to improve model performance.
- **Model Training:** We came to understand how Random Forest classifiers work for multi-class classification problems and how hyperparameters affect model accuracy.
- **Exploratory Data Analysis (EDA):** Visualizing network data helped us discover underlying patterns, class imbalances, and anomalies in the dataset.
- **Model Evaluation:** We explored evaluation metrics such as accuracy, precision, recall, and F1-score to assess the classifier's performance.
- **Model Serialization:** We learned to save trained models and scalers with joblib for easy reuse in production or deployment scenarios.
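The preprocessing lessons above can be sketched in a few lines. This is an illustrative example, not the project's actual code: the column names below are placeholders standing in for the dataset's real network-flow features.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame standing in for the Kaggle CSV (column names are assumptions).
df = pd.DataFrame({
    "Flow Duration": [1000.0, 2000.0, None, 4000.0],
    "Tot Fwd Pkts": [10, 20, 30, 40],
    "Timestamp": ["t1", "t2", "t3", "t4"],  # irrelevant column to drop
    "Label": ["Benign", "Attack", "Benign", "Attack"],
})

df = df.drop(columns=["Timestamp"])            # drop irrelevant columns
df = df.dropna()                               # handle missing values
y = LabelEncoder().fit_transform(df["Label"])  # encode string labels as ints
X = df.drop(columns=["Label"])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per feature
```

Fitting the scaler once and reusing it at inference time (rather than re-fitting on new data) is what makes saving `scaler.pkl` alongside the model worthwhile.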
## How We Built It

- **Data Collection:** Used the publicly available IDS Intrusion CSV dataset from Kaggle (`02-14-2018.csv`).
- **Data Cleaning:** Removed unnecessary columns, handled missing data, and applied label encoding.
- **Exploratory Data Analysis (EDA):** Used seaborn and matplotlib to plot distributions and correlations and understand traffic behavior.
- **Model Training:** Implemented a Random Forest Classifier with scikit-learn to distinguish benign from attack traffic.
- **Model Evaluation:** Measured performance using standard classification metrics.
- **Model Saving:** Saved the trained model (`ai_threat_intelligence_model.pkl`) and the scaler (`scaler.pkl`) for future inference.
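The training, evaluation, and saving steps above can be sketched end to end. This is a minimal sketch on synthetic data, not the project's actual training script; only the output filenames come from the write-up:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned feature matrix (assumed shape/labels).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 0 = benign, 1 = attack (toy rule)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only, then train the Random Forest.
scaler = StandardScaler().fit(X_train)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(scaler.transform(X_train), y_train)

# Evaluate with accuracy plus per-class precision/recall/F1.
preds = clf.predict(scaler.transform(X_test))
print(f"accuracy: {accuracy_score(y_test, preds):.3f}")
print(classification_report(y_test, preds))

# Persist both artifacts so inference can reuse the exact same scaling.
joblib.dump(clf, "ai_threat_intelligence_model.pkl")
joblib.dump(scaler, "scaler.pkl")
```

At inference time, `joblib.load` restores both objects, and new traffic records must pass through `scaler.transform` before `clf.predict`.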
## Challenges We Faced

- **Data Imbalance:** The dataset had an unequal distribution of benign and attack records, which impacted model learning.
- **Feature Selection:** Deciding which features to retain was challenging, as some were irrelevant or noisy.
- **Computational Resources:** Training on a large dataset required optimization to avoid excessive memory use and processing time.
- **Understanding Network Features:** Learning about network traffic features such as Flow Duration and Tot Fwd Pkts was necessary to interpret the dataset correctly.
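One common way to address the data imbalance mentioned above is class re-weighting. This sketch (on synthetic data, not the project's dataset) shows scikit-learn's `class_weight="balanced"` option, which weights samples inversely to class frequency so the minority attack class is not drowned out:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: 95% benign (0), 5% attack (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = np.zeros(2000, dtype=int)
attack_idx = rng.choice(2000, size=100, replace=False)
y[attack_idx] = 1
X[attack_idx] += 2.0  # shift attack samples so the classes are learnable

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# "balanced" re-weights each class by n_samples / (n_classes * class_count),
# boosting the rare attack class during tree construction.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_tr, y_tr)
print("attack F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```

Resampling approaches (undersampling the majority class, or oversampling with a library such as imbalanced-learn) are an alternative when re-weighting alone is not enough.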
## Built With
- joblib
- matplotlib
- numpy
- pandas
- python
- scikit-learn
- seaborn