Screenshots: Dashboard, Risk Leaderboard, Per-User Risk, Recent Alerts, All Users

AI-Powered Insider Threat Detection System

Inspiration

Modern organizations face a growing class of cybersecurity risks that traditional perimeter-based security cannot detect: insider threats.

These threats arise from:

Compromised credentials
Malicious employees
Privilege misuse
Abnormal data exfiltration

Unlike external attacks, insider threats operate within trusted boundaries. They appear as subtle statistical deviations from a user's normal behavior rather than as known attack signatures.

This project builds a behavior-driven anomaly detection system that models user activity patterns and flags deviations in real time using machine learning.

Instead of rule-based detection, we use:

Behavioral Baselining + Statistical Anomaly Detection

What It Does

The system:

Generates realistic user activity logs
Builds per-user behavioral baselines
Detects anomalies using Isolation Forest
Assigns dynamic risk levels (Low / Moderate / High)
Stores alerts with feature-level explainability
Provides real-time dashboard visualization
Displays last 30-day user activity trends
Simulates both normal and attack behaviors
Uses a separate ML microservice for inference

Every login event passes through ML inference before being stored.

System Architecture

Technology Stack:

Frontend: React + Vite
Backend: Node.js + Express
ML Service: FastAPI + scikit-learn
Database: MongoDB Atlas

System Flow:

User Activity Event
→ Frontend (React)
→ Backend (Node.js/Express)
→ ML Microservice (FastAPI)
→ Backend (Risk Calibration + Storage)
→ MongoDB Atlas
→ Frontend Dashboard

Separation of concerns:

Backend handles orchestration and persistence
ML service handles statistical inference
Frontend handles visualization and UX
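
To make the flow above concrete, here is a minimal sketch of the orchestration step: score the event, calibrate the risk, then store it. The project's backend is Node.js/Express; this sketch uses Python only for illustration, and `process_event`, `fake_infer`, `fake_calibrate`, and the record fields are all hypothetical names, not the project's actual code.

```python
def process_event(event, infer, calibrate, store):
    """Score an event, attach a risk level, then persist it."""
    result = infer(event)                      # call to the ML microservice
    record = dict(event)                       # copy so the input is not mutated
    record["score"] = result["score"]
    record["prediction"] = result["prediction"]
    record["risk"] = calibrate(result["score"])
    store(record)                              # persistence happens last
    return record


# Stand-ins for the FastAPI service and the database (illustration only)
def fake_infer(event):
    return {"score": -0.12, "prediction": -1}


def fake_calibrate(score):
    return "High" if score < 0 else "Low"


saved = []
record = process_event({"user": "u1"}, fake_infer, fake_calibrate, saved.append)
```

Passing the inference, calibration, and storage steps in as functions keeps the ordering (inference before storage) explicit and easy to test without a running service.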

Feature Engineering

Each login event is transformed into a statistical feature vector.

Baseline Mean

$$ \mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i $$

Baseline Standard Deviation

$$ \sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)^2} $$

Standardized Z-Score

$$ z = \frac{x - \mu}{\sigma} $$

Where:

$x$ = observed value
$\mu$ = baseline mean
$\sigma$ = baseline standard deviation
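
The three formulas above translate directly into code. The following is a minimal sketch; the function names and the sample history (files accessed per session) are assumptions made for illustration.

```python
import math


def baseline_stats(values):
    """Return (mean, population standard deviation) of a user's history."""
    n = len(values)
    if n == 0:
        raise ValueError("baseline needs at least one observation")
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return mu, sigma


def z_score(x, mu, sigma):
    """Standardize x against the baseline; sigma must be non-zero."""
    if sigma == 0:
        raise ValueError("sigma is zero; z-score is undefined")
    return (x - mu) / sigma


# Hypothetical history: files accessed per session for one user
history = [10, 12, 11, 9, 13]
mu, sigma = baseline_stats(history)
z = z_score(25, mu, sigma)  # how unusual is a session with 25 files?
```

Note that the standard deviation divides by n (the population form), matching the formula above rather than the sample form that divides by n - 1.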

Binary Flags

New IP detection:

$$ \mathrm{new\_ip\_flag} = \begin{cases} 1 & \text{if IP not in trusted set} \\ 0 & \text{otherwise} \end{cases} $$

New device detection:

$$ \mathrm{new\_device\_flag} = \begin{cases} 1 & \text{if device not recognized} \\ 0 & \text{otherwise} \end{cases} $$
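
Both flags reduce to set-membership checks. A minimal sketch, assuming each user has a set of trusted IPs and a set of known device IDs (the variable names and sample values are hypothetical):

```python
def new_ip_flag(ip, trusted_ips):
    """Return 1 if the IP is not in the user's trusted set, else 0."""
    return 0 if ip in trusted_ips else 1


def new_device_flag(device_id, known_devices):
    """Return 1 if the device is not recognized for this user, else 0."""
    return 0 if device_id in known_devices else 1


# Hypothetical per-user history (documentation-range IP addresses)
trusted_ips = {"203.0.113.10", "203.0.113.11"}
known_devices = {"laptop-01"}

ip_flag = new_ip_flag("198.51.100.7", trusted_ips)
device_flag = new_device_flag("laptop-01", known_devices)
```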

Final Feature Vector

$$ X = \left[ z_{\text{login}}, z_{\text{files}}, z_{\text{download}}, \mathrm{new\_ip\_flag}, \mathrm{new\_device\_flag}, \mathrm{sensitive\_flag} \right] $$
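
As a sketch of how one event might be turned into this vector, the code below assumes that the login feature is the login hour, that file and download activity are simple counts or volumes, and that the event is a dictionary. The field names, the `Baseline` class, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    mean: float
    std: float


def z(x, baseline):
    """Z-score of x against a per-user baseline (std assumed non-zero)."""
    return (x - baseline.mean) / baseline.std


def build_feature_vector(event, baselines, trusted_ips, known_devices):
    """Return [z_login, z_files, z_download, new_ip, new_device, sensitive]."""
    return [
        z(event["login_hour"], baselines["login_hour"]),
        z(event["files_accessed"], baselines["files_accessed"]),
        z(event["download_mb"], baselines["download_mb"]),
        0 if event["ip"] in trusted_ips else 1,
        0 if event["device"] in known_devices else 1,
        1 if event["sensitive_access"] else 0,
    ]


# Hypothetical baselines and event for one user
baselines = {
    "login_hour": Baseline(mean=9.0, std=1.0),
    "files_accessed": Baseline(mean=20.0, std=5.0),
    "download_mb": Baseline(mean=100.0, std=25.0),
}
event = {
    "login_hour": 3, "files_accessed": 60, "download_mb": 600,
    "ip": "198.51.100.7", "device": "unknown-tablet", "sensitive_access": True,
}
vector = build_feature_vector(event, baselines, {"203.0.113.10"}, {"laptop-01"})
# vector == [-6.0, 8.0, 20.0, 1, 1, 1]
```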

Model

We use Isolation Forest for unsupervised anomaly detection.

Configuration

$$ \text{IsolationForest}(\mathrm{n\_estimators} = 100,\ \mathrm{contamination} = 0.05) $$

Anomaly Score

$$ \text{score}(x) = \mathrm{decision\_function}(x) $$

Prediction Rule

$$ \text{prediction} = \begin{cases} -1 & \text{anomaly} \\ 1 & \text{normal} \end{cases} $$
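
The configuration, score, and prediction rule above map onto scikit-learn as follows. This is a sketch on synthetic data, not the project's training pipeline: the training matrix, the two test events, and `random_state=0` (added for reproducibility) are assumptions. In scikit-learn, `decision_function` returns negative values for anomalies, and `predict` returns -1 exactly when that score is negative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic training data in the 6-feature layout described above:
# three z-scores drawn from N(0, 1) and three flags that are rarely set.
rng = np.random.default_rng(0)
n = 1000
X_train = np.column_stack([
    rng.normal(0.0, 1.0, size=(n, 3)),
    (rng.random((n, 3)) < 0.05).astype(float),
])

model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
model.fit(X_train)

# One typical event and one attack-like event (hypothetical values)
X_new = np.array([
    [0.1, -0.2, 0.3, 0.0, 0.0, 0.0],
    [8.0, 9.0, 10.0, 1.0, 1.0, 1.0],
])
scores = model.decision_function(X_new)  # lower means more anomalous
preds = model.predict(X_new)             # -1 = anomaly, 1 = normal
```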

Risk levels are calibrated from anomaly scores into:

Low
Moderate
High
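
One simple way to map a score to these three levels is a pair of cut-offs. The sketch below assumes scikit-learn's convention that lower `decision_function` scores are more anomalous; the threshold values 0.0 and 0.05 are placeholders, not the project's calibrated values.

```python
def risk_level(score, high_below=0.0, moderate_below=0.05):
    """Map an anomaly score to Low / Moderate / High (lower score = riskier)."""
    if score < high_below:
        return "High"
    if score < moderate_below:
        return "Moderate"
    return "Low"


levels = [risk_level(s) for s in (-0.10, 0.02, 0.12)]
# levels == ["High", "Moderate", "Low"]
```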

Normal vs Attack Simulation

Normal logs are sampled from the training distribution to maintain statistical alignment.

Attack simulations introduce:

Extreme z-score deviations
External IP addresses
Unknown devices
Elevated file and download activity
Sensitive access enabled

Each event is processed in real time by ML inference.
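
A minimal sketch of the two simulation modes, working directly in feature space. It assumes normal z-scores follow a standard normal distribution and that attacks shift every z-score by a fixed offset; the function names, the offset of 8, and the seed are hypothetical.

```python
import random


def simulate_normal(rng):
    """Normal event: z-scores near 0, no flags set."""
    return [rng.gauss(0, 1) for _ in range(3)] + [0, 0, 0]


def simulate_attack(rng, offset=8.0):
    """Attack event: large positive z-scores, all three flags set."""
    return [offset + abs(rng.gauss(0, 1)) for _ in range(3)] + [1, 1, 1]


rng = random.Random(42)  # seeded so runs are repeatable
normal_events = [simulate_normal(rng) for _ in range(5)]
attack_events = [simulate_attack(rng) for _ in range(5)]
```

Each simulated vector uses the same six-feature layout as the final feature vector, so it can be passed straight to inference.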

Challenges

1. Baseline Drift

Normal simulations initially generated anomalies due to distribution mismatch.

Solution:
Aligned the runtime simulation with the distribution of the training dataset.

2. Score Calibration

Isolation Forest anomaly scores are relative and tightly clustered.

Solution:
Recalibrated thresholds to produce meaningful Low / Moderate / High segmentation.
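
Because the scores are relative, one common approach is to derive the cut-offs from percentiles of the training scores instead of fixing them by hand. The sketch below illustrates that idea; it is not necessarily how this project recalibrated, and the 5% and 20% percentiles and the function names are assumptions.

```python
def calibrate_thresholds(train_scores, high_pct=5, moderate_pct=20):
    """Return (high_cutoff, moderate_cutoff) from training-score percentiles."""
    ordered = sorted(train_scores)
    last = len(ordered) - 1

    def percentile(pct):
        # Nearest-rank style index, clamped to the valid range
        return ordered[min(last, pct * len(ordered) // 100)]

    return percentile(high_pct), percentile(moderate_pct)


def classify(score, high_cutoff, moderate_cutoff):
    """Lower scores are more anomalous."""
    if score < high_cutoff:
        return "High"
    if score < moderate_cutoff:
        return "Moderate"
    return "Low"


# Hypothetical training scores: 100 evenly spaced values from -0.05 to 0.94
train_scores = [i / 100 for i in range(-5, 95)]
high_cutoff, moderate_cutoff = calibrate_thresholds(train_scores)
labels = [classify(s, high_cutoff, moderate_cutoff) for s in (-0.02, 0.10, 0.50)]
```

With percentile-based cut-offs, roughly the lowest 5% of training scores land in High and the next 15% in Moderate, regardless of how tightly the raw scores cluster.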

3. Microservice Communication

Handled:

CORS issues
Environment variables
Production URLs
Cold start behavior

4. Statistical Sensitivity

Small standard deviations caused inflated z-scores.

Solution:
Refined variance scaling.
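
The writeup does not spell out the refinement. One common remedy for this problem, shown here purely as an assumption, is to put a floor under the standard deviation and clip the resulting z-score. The names `safe_z_score`, `min_sigma`, and `clip` and their default values are hypothetical.

```python
def safe_z_score(x, mu, sigma, min_sigma=1.0, clip=10.0):
    """Z-score with a floor on sigma and a symmetric clip on the result."""
    z = (x - mu) / max(sigma, min_sigma)
    return max(-clip, min(clip, z))


# A user who always logs in at almost exactly 9:00 has a tiny sigma.
naive = (9.5 - 9.0) / 0.01              # 50.0: a 30-minute shift looks extreme
floored = safe_z_score(9.5, 9.0, 0.01)  # 0.5 once sigma is floored at 1.0
clipped = safe_z_score(100.0, 9.0, 0.01)  # 10.0 after clipping
```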

What’s Next

Real-Time Behavioral Drift

$$ \Delta_{\text{behavior}} = |\mu_{\text{current}} - \mu_{\text{baseline}}| $$
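
Since this is planned work, the following is only a sketch of the formula above: the absolute difference between the mean of a recent window and the stored baseline mean. The function name and sample values are hypothetical.

```python
def behavior_drift(current_values, baseline_mean):
    """Absolute difference between the recent mean and the baseline mean."""
    if not current_values:
        raise ValueError("need at least one recent observation")
    mu_current = sum(current_values) / len(current_values)
    return abs(mu_current - baseline_mean)


# Hypothetical: recent files-per-session against a baseline mean of 20
drift = behavior_drift([30, 34, 32], 20.0)  # 12.0
```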

Planned enhancements:

Email notification system
Role-based risk modeling (IT / Finance / HR)
Rolling 30-day auto-retraining
Geo-location anomaly detection
Sequence modeling (LSTM)
Per-user risk trend graphs

Conclusion

This project demonstrates a statistically principled insider threat detection system built using:

Behavioral modeling
Unsupervised anomaly detection
Microservice ML inference
Real-time risk calibration

It provides a scalable foundation for enterprise-grade behavioral security systems.

Updates

Akash Kumar started this project — Feb 28, 2026 11:25 AM EST