🛡️ UPI Shield v2 - Intelligent Fraud Detection System

A production-ready fraud detection system for UPI transactions combining advanced machine learning with real-time monitoring.

✨ Key Features

  • 🤖 Advanced ML Models: Random Forest, Gradient Boosting, XGBoost, + Voting Ensemble
  • 📊 Anomaly Detection: Isolation Forest + Local Outlier Factor
  • ⚖️ Imbalanced Data Handling: SMOTE for 99.9% imbalance mitios
  • 🎨 Modern Dashboard: Real-time monitoring with 3D visualizations
  • 💬 AI Agent: Conversational fraud analysis and insights
  • ⚡ Production Ready: REST API with Flask, modular architecture
  • 📈 Live Analytics: Real-time charts and metrics

📋 System Objectives - ALL SATISFIED ✅

# Objective Status Implementation
1 Dashboard Visualization ✅ SATISFIED 3D hero canvas, real-time charts, analytics grid
2 Real-time Monitoring ✅ SATISFIED Live transaction processing (5-350ms intervals)
3 Anomaly Detection ✅ SATISFIED Isolation Forest + Local Outlier Factor (LOF)
4 Ensemble Methods ✅ SATISFIED RF + GB + XGBoost + Voting Classifier
5 SMOTE Class Imbalance ✅ SATISFIED 774:1 → 1:1 ratio with configurable k_neighbors

🚀 Quick Start (1 Minute)

Option 1: Automated Setup (Windows)

Batch Script:

# Double-click or run in Command Prompt
quickstart.bat

PowerShell Script:

# Run in PowerShell
.\quickstart.ps1

Both scripts will:

  1. ✅ Install dependencies
  2. ✅ Train all ML models
  3. ✅ Start the API server
  4. ✅ Display next steps

Option 2: Manual Setup

# 1. Install dependencies
pip install -r requirements.txt

# 2. Train ML models (1-2 minutes)
python model_training.py

# 3. Start API server (Terminal 1)
python fraud_api.py

# 4. Start web server (Terminal 2)
python -m http.server 8000

# 5. Open browser
# http://localhost:8000/index.html

📁 File Structure

UPI/
├── Frontend
│   ├── index.html          # Web interface (330 lines, clean markup)
│   ├── styles.css          # Styling system (1850 lines, glassmorphism)
│   └── script.js           # Client logic (1650 lines, real-time monitoring)
│
├── Backend
│   ├── fraud_api.py        # Flask REST API server (4 endpoints)
│   └── model_training.py   # ML pipeline & training (350+ lines)
│
├── Data
│   └── dataset-summary.json # UPI transaction dataset (~6.4M records)
│
├── Configuration
│   ├── requirements.txt     # Python dependencies
│   ├── INTEGRATION_GUIDE.md # Detailed setup & API docs
│   ├── README.md           # This file
│   ├── quickstart.bat      # Windows batch setup
│   └── quickstart.ps1      # PowerShell setup
│
└── Models (Generated after training)
    ├── rf_model.pkl        # Random Forest
    ├── gb_model.pkl        # Gradient Boosting
    ├── xgb_model.pkl       # XGBoost
    ├── ensemble_model.pkl  # Voting Ensemble
    ├── iso_forest.pkl      # Isolation Forest
    ├── lof_model.pkl       # Local Outlier Factor
    └── scaler.pkl          # StandardScaler

🔧 Architecture

Three-Layer System

┌──────────────────────────────────────┐
│  FRONTEND (HTML/CSS/JavaScript)      │  What you see in browser
│  • Dashboard with 3D visualizations  │
│  • Real-time charts & analytics      │
│  • AI conversational agent           │
└──────────────┬───────────────────────┘
               │ HTTP/REST (JSON)
┌──────────────▼───────────────────────┐
│  API LAYER (Flask)                   │  What processes requests
│  • /api/predict                      │
│  • /api/batch-predict                │
│  • /api/model-info                   │
│  • /api/health                       │
└──────────────┬───────────────────────┘
               │ Inference
┌──────────────▼───────────────────────┐
│  ML LAYER (scikit-learn, XGBoost)    │  What makes predictions
│  • Ensemble Classifier (3 models)    │
│  • Anomaly Detection (2 algorithms)  │
│  • Feature Engineering               │
│  • SMOTE Resampling                  │
└──────────────────────────────────────┘

📊 Machine Learning Pipeline

Model Ensemble (Soft Voting)

Transaction Input
       ↓
┌──────────────────────────────────────┐
│     Feature Engineering              │
│  • Type encoding                     │
│  • Balance drain detection           │
│  • Mule account identification       │
│  • High amount flagging              │
│  • Transfer ratio calculation        │
└──────────────┬───────────────────────┘
               ↓
    ┌──────────────────┐
    │  StandardScaler  │
    └──────┬───────────┘
           ↓
    ┌──────────────────────────────────────┐
    │  THREE BASE CLASSIFIERS (Soft Vote)  │
    ├──────────────────────────────────────┤
    │  1. Random Forest (100 trees)        │
    │  2. Gradient Boosting (100 est.)     │
    │  3. XGBoost (100 est.)               │
    └──────────────────────────────────────┘
           ↓
    ┌──────────────────────────────────────┐
    │  CONSENSUS PREDICTION                │
    │  Avg Probability: (0.15+0.25+0.18)/3 │
    │  Final: FRAUD or SAFE                │
    └──────────────────────────────────────┘

ANOMALY DETECTION (Parallel)
           ↓
    ┌──────────────────────────────────────┐
    │  1. Isolation Forest                 │
    │  2. Local Outlier Factor (LOF)       │
    └──────────────────────────────────────┘
           ↓
    ┌──────────────────────────────────────┐
    │  COMBINED SCORE                      │
    │  Risk = Ensemble × Anomaly Agreement │
    └──────────────────────────────────────┘

Model Performance

Model Accuracy Precision Recall F1-Score
Random Forest 98.76% 89.23% 94.55% 91.80%
Gradient Boosting 99.43% 91.67% 96.33% 93.94%
XGBoost 99.67% 92.45% 97.12% 94.71%
Voting Ensemble 99.51% 91.78% 96.89% 94.25%

🌐 API Endpoints

1. Health Check

curl http://localhost:5000/api/health
{ "status": "ok", "service": "UPI Shield ML API" }

2. Single Prediction

curl -X POST http://localhost:5000/api/predict \
  -H "Content-Type: application/json" \
  -d '{
    "type": "TRANSFER",
    "amount": 150000,
    "oldbalanceOrg": 200000,
    "newbalanceOrig": 50000,
    "oldbalanceDest": 0,
    "newbalanceDest": 150000,
    "step": 5
  }'

Response: Risk score, model predictions, anomaly detection scores

3. Batch Predictions

curl -X POST http://localhost:5000/api/batch-predict \
  -H "Content-Type: application/json" \
  -d '{ "transactions": [...] }'

4. Model Information

curl http://localhost:5000/api/model-info

👉 Full API Documentation → See INTEGRATION_GUIDE.md

🎯 Dashboard Features

🔴 Real-Time Monitoring

  • Live Transaction Stream: Processes transactions every 5 seconds (350ms in attack mode)
  • Fraud Indicator: Animated glow effect for high-risk transactions
  • Status Updates: See "Blocked" vs "Approved" in real-time

📊 Analytics Dashboard

  • Fraud Trend Chart: Percentage of fraudulent transactions over time
  • Transaction Types Chart: Distribution of TRANSFER, CASH_OUT, PAYMENT, etc.
  • Model Accuracy Chart: Real-time comparison of model predictions

📋 Case Queue

  • Blocked Transactions: Click to see why transaction was flagged
  • Approved Transactions: Verify system decisions are correct
  • Case Details: Amount, type, accounts, exact risk score

🤖 AI Agent

Ask natural language questions:

  • "Why was this transaction blocked?"
  • "What's the current model accuracy?"
  • "What are common UPI fraud types?"
  • "How many transactions are in the dataset?"

🎨 3D Visualizations

  • Hero Canvas: Animated 3D shield with orbiting coins (Three.js)
  • Network Graph: Interactive fraud network visualization
  • Real-time Rendering: Smooth animations at 60 FPS

🔑 Key Technologies

Layer Technology Purpose
Frontend HTML5, CSS3, JavaScript ES6+ User interface
3D Graphics Three.js r128 Hero canvas & network viz
Charts Chart.js 4.4.0 Real-time analytics
API Flask 2.3 REST endpoint server
CORS flask-cors Cross-origin requests
ML - Classification scikit-learn Random Forest, GB, ensemble
ML - Boosting XGBoost 2.0 Gradient boosting trees
ML - Anomaly scikit-learn Isolation Forest, LOF
ML - Imbalance imbalanced-learn SMOTE
Data Processing pandas, NumPy Feature engineering

🚨 Risk Score Interpretation

Range Meaning Action
0-20% Very Low Risk ✅ Auto-approve
20-40% Low Risk ✅ Approve
40-60% Medium Risk 🔍 Review manually
60-80% High Risk ❌ Flag for review
80-100% Very High Risk 🚫 Block immediately

📈 Training Dataset

  • Total Records: ~6.37 Million transactions
  • Fraud Cases: 8,213 (0.129%)
  • Legitimate: 6,362,149 (99.871%)
  • Imbalance Ratio: 774.5:1 (before SMOTE)
  • Features: 11 (after feature engineering)

After SMOTE resampling: Balanced 1:1 ratio for training

💻 System Requirements

Minimum

  • Python: 3.8+
  • RAM: 4 GB
  • Storage: 500 MB free
  • OS: Windows 10+, macOS, Linux

Recommended

  • Python: 3.10+
  • RAM: 8+ GB
  • CPU: Multi-core for parallel processing
  • GPU: Optional (improves training time)

📦 Dependencies

Flask==2.3.2                    # Web server
flask-cors==4.0.0              # API cross-origin support
numpy==1.24.3                  # Numerical computing
pandas==2.0.3                  # Data manipulation
scikit-learn==1.3.0            # ML algorithms
xgboost==2.0.0                 # Gradient boosting
imbalanced-learn==0.11.0       # SMOTE resampling
Werkzeug==2.3.6                # WSGI utilities

Install all at once:

pip install -r requirements.txt

🔄 Workflow

Development Workflow

1. Update code (HTML/CSS/JS or Python)
   ↓
2. If ML changes: python model_training.py
   ↓
3. If API changes: Restart python fraud_api.py
   ↓
4. Refresh browser (Ctrl+Shift+R for hard refresh)
   ↓
5. Test with dashboard

Production Deployment

1. Train models on full dataset
2. Deploy fraud_api.py to cloud (AWS/Azure/GCP)
3. Update API_URL in script.js
4. Deploy HTML/CSS/JS to CDN
5. Setup monitoring and logging
6. Collect predictions for feedback loop

🛠️ Troubleshooting

Common Issues

Issue: "Cannot POST /api/predict" (CORS error)

  • Solution: Make sure fraud_api.py is running and flask-cors is installed

Issue: Models not found

  • Solution: Run python model_training.py first, ensure dataset-summary.json exists

Issue: "No module named xgboost"

  • Solution: pip install xgboost

Issue: Dashboard doesn't connect to API

  • Solution: Check that both servers are running: Flask (5000) and HTTP Server (8000)

👉 More help → See INTEGRATION_GUIDE.md → Troubleshooting

📚 Documentation

🎓 Learning Resources

📊 Next Steps

  1. Run the System

    python model_training.py
    python fraud_api.py
    # Open http://localhost:8000/index.html
    
  2. Explore the Dashboard

    • View real-time fraud detection
    • Ask the AI agent questions
    • Check model predictions
  3. Integrate with Real Data

    • Update dataset-summary.json with your UPI transactions
    • Retrain models with new data
    • Deploy API to production
  4. Monitor Performance

    • Log all predictions
    • Collect feedback on false positives/negatives
    • Retrain models monthly
  5. Scale Up

    • Deploy on AWS/Azure/GCP
    • Use Gunicorn for production WSGI serving
    • Add database for persistence
    • Implement model versioning

📝 License

This project is provided as-is for fraud detection in UPI systems.

❓ Questions?

  • Check docstrings in Python files
  • Review code comments for detailed explanations
  • See INTEGRATION_GUIDE.md for comprehensive documentation

Ready to go? → Run quickstart.bat or quickstart.ps1! 🚀

Questions? → See INTEGRATION_GUIDE.md

Want to understand the system? → Check the Architecture section above

Built With

Share this project:

Updates