🔐 Inspiration
Generative AI has made fake images nearly indistinguishable from real ones—threatening trust in journalism, digital identity, and art.
Most existing solutions act as black boxes, giving predictions without proof. I wanted to build something different: a system that doesn’t just detect AI-generated images, but proves it.
🧠 What it does
I built VIPER (Visual Intelligence Pipeline for Empirical Recognition)—a forensic AI engine that acts like a lie detector for images.
VIPER combines:
- Deep learning (ConvNeXt) for visual understanding
- Mathematical signal analysis (FFT, PRNU, color entropy) for hard evidence
This allows the system to achieve 96%+ accuracy while also explaining why an image is classified as AI or real.
I frame this as an Art Heist scenario, where users analyze artwork and decide whether it’s authentic or an AI-generated forgery—guided by forensic insights.
⚙️ How I built it
Data Pipeline:
- Processed a large-scale dataset (~67k images), sub-sampled to 10,000 balanced samples
- Enforced strict 70/15/15 train/validation/test splits to prevent data leakage
- Applied data augmentation (flips, color jitter) for robustness
- Optimized with PyTorch DataLoaders (batch size 64)
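The split-and-load step can be sketched as follows; the tiny dummy tensors stand in for the real images, and the seed value is an illustrative choice:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Dummy stand-in for the balanced image dataset (tiny tensors, not real images).
dataset = TensorDataset(torch.randn(1_000, 4), torch.randint(0, 2, (1_000,)))

# Strict 70/15/15 split; a fixed seed keeps the test set untouched across runs.
n = len(dataset)
n_train, n_val = int(0.70 * n), int(0.15 * n)
splits = [n_train, n_val, n - n_train - n_val]
train_ds, val_ds, test_ds = random_split(
    dataset, splits, generator=torch.Generator().manual_seed(42)
)

train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
```

Keeping the split logic in one place (and seeding it) is what prevents test images from leaking into training.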
Forensic Feature Engineering:
Before deep learning, I extracted 27 mathematical features across six domains:
- FFT (frequency analysis) → detects unnatural smoothness
- PRNU noise patterns → identifies missing camera sensor fingerprints
- Color entropy (LAB space) → flags overly “perfect” color distributions
- Texture (GLCM) and edge gradients → measure structural realism
These are standardized and grouped into human-readable signals:
- FFT Irregularity
- PRNU Variation
- LAB Saturation
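As an illustration of the frequency domain, one such signal can be approximated by a high-to-low-frequency energy ratio. This is a simplified heuristic, not VIPER's exact FFT Irregularity formula:

```python
import numpy as np

def fft_irregularity(gray: np.ndarray) -> float:
    """High- vs low-frequency energy ratio of a grayscale image.

    AI-generated images often have unnaturally smooth spectra, so a
    low ratio can flag synthetic smoothness (illustrative heuristic).
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    r = min(h, w) // 8                       # radius separating low/high bands
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    low = spectrum[low_mask].sum()
    high = spectrum[~low_mask].sum()
    return float(high / (low + 1e-8))

# Pure noise carries far more high-frequency energy than a flat image.
noise = np.random.default_rng(0).random((64, 64))
flat = np.ones((64, 64))
print(fft_irregularity(noise) > fft_irregularity(flat))  # True
```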
Baseline Model:
- I trained a Logistic Regression model on these features (~71% accuracy) to establish a statistical baseline and validate feature importance.
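A minimal version of that baseline, with synthetic data standing in for the 27 real forensic features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for the 27 standardized forensic features per image;
# only the first two carry (synthetic) signal here.
X = rng.normal(size=(600, 27))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X[:480], y[:480])
acc = clf.score(X[480:], y[480:])

# The learned coefficients give per-feature importance for the baseline.
coefs = clf.named_steps["logisticregression"].coef_[0]
```

Inspecting `coefs` is what validates feature importance: features with near-zero weight contribute little to the statistical baseline.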
Deep Learning Engine:
- Used ConvNeXt-Tiny (transfer learning) pretrained on ImageNet
- Froze lower layers and added a custom classification head
Two modes:
- Standard: image embeddings only
- Hybrid: embeddings + forensic features
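A hybrid classification head along these lines might look like the sketch below. ConvNeXt-Tiny's pooled embedding is 768-dimensional; the hidden width and dropout rate are assumptions:

```python
import torch
import torch.nn as nn

class HybridHead(nn.Module):
    """Classification head fusing CNN embeddings with forensic features.

    Hypothetical sizes: a 768-d ConvNeXt-Tiny embedding is concatenated
    with the 27 forensic features before the final classifier.
    """
    def __init__(self, embed_dim: int = 768, n_forensic: int = 27):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(embed_dim + n_forensic, 256),
            nn.GELU(),
            nn.Dropout(0.2),
            nn.Linear(256, 2),               # real vs AI-generated
        )

    def forward(self, embedding: torch.Tensor, forensic: torch.Tensor) -> torch.Tensor:
        return self.fc(torch.cat([embedding, forensic], dim=1))

head = HybridHead()
logits = head(torch.randn(4, 768), torch.randn(4, 27))
print(logits.shape)  # torch.Size([4, 2])
```

In standard mode the same head would simply take the embedding alone; concatenation is the simplest fusion strategy, chosen here for clarity.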
Training optimizations:
- Adam optimizer
- Cosine annealing learning rate
- Validation after every epoch
- Automatic checkpointing to prevent overfitting
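The training loop combining these pieces can be sketched as follows, with a tiny linear model and synthetic data standing in for the real network and loaders:

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the real model and data.
model = nn.Linear(8, 2)
X, y = torch.randn(64, 8), torch.randint(0, 2, (64,))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
epochs = 5
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

best_val = float("inf")
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                         # cosine-annealed learning rate

    model.eval()
    with torch.no_grad():                    # validate after every epoch
        val_loss = criterion(model(X), y).item()
    if val_loss < best_val:                  # checkpoint only on improvement
        best_val = val_loss
        torch.save(model.state_dict(), "best.pt")
```

Checkpointing on the best validation loss means the saved weights are never the overfit tail of training.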
Evaluation:
- Achieved 96%+ accuracy and F1 score
- Evaluated on strictly unseen test data
- Generated confusion matrix, precision/recall, and AUC

Interpretability:
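With scikit-learn those metrics come out of a few calls; the tiny label vectors below are purely illustrative:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_recall_fscore_support,
                             roc_auc_score)

# Illustrative ground truth and predicted probabilities for 8 images.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.3, 0.8, 0.9, 0.6, 0.2, 0.4, 0.7])
y_pred = (y_prob >= 0.5).astype(int)

cm = confusion_matrix(y_true, y_pred)        # rows: true class, cols: predicted
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
auc = roc_auc_score(y_true, y_prob)          # threshold-free ranking quality

print(cm.tolist(), round(f1, 2), auc)        # [[3, 1], [1, 3]] 0.75 0.875
```

AUC is computed from the raw probabilities, so it complements the thresholded precision/recall view.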
To eliminate the “black box,” I added:
- Grad-CAM heatmaps → show which pixels influenced predictions
- UMAP projections → visualize separation between AI and real images
This makes VIPER a transparent forensic system, not just a classifier.
Real-World Testing:
- JPEG compression tests → evaluate robustness to degraded images
- Zero-shot testing (WikiArt) → evaluate generalization to unseen data
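The compression test can be sketched as a JPEG round-trip at decreasing quality levels (a hypothetical `jpeg_degrade` helper; the quality sweep values are illustrative):

```python
import io
import numpy as np
from PIL import Image

def jpeg_degrade(img: np.ndarray, quality: int) -> np.ndarray:
    """Round-trip an RGB uint8 image through JPEG at the given quality."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
for q in (90, 50, 10):                       # sweep compression levels
    degraded = jpeg_degrade(img, q)
    # Feed `degraded` through the classifier and compare its prediction
    # against the one for the pristine image.
```

Accuracy that holds up as quality drops indicates the model is not relying on artifacts that compression destroys.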
🎯 Results
- 96%+ F1 score
- Strong generalization across datasets
- Clear clustering between AI and real images
- Interpretable outputs with visual and mathematical justification
🧩 Challenges I ran into
- Scaling forensic feature extraction across thousands of images
- Balancing deep learning with handcrafted signals
- Preventing overfitting while maximizing performance
- Making complex outputs understandable for users
🏆 Accomplishments that I'm proud of
- Turning a black-box model into a transparent forensic system
- Successfully combining signal processing + deep learning
- Achieving high accuracy and being able to explain how and why
- Building a complete, end-to-end pipeline from raw data to insights
🚀 What’s next for VIPER
- Scale to full 67k+ dataset with optimized feature extraction
- Deploy as a real-time API for journalists and investigators
- Improve robustness against adversarial AI-generated content
💡 Final Thoughts
VIPER isn’t just a classifier—it’s a forensic intelligence system that brings trust and transparency back to digital imagery.