Inspiration
A 2024 study found that poisoning as little as 0.1% of a training dataset is enough to corrupt an AI model's behaviour entirely. That number stuck with me. One image in a thousand is all it takes. Images flow into AI pipelines constantly through web scraping, user uploads, and dataset aggregation, and nobody is scanning them for threats. I kept reading about visual prompt injection attacks on vision models and realised there was no accessible tool that treated images as a potential attack vector rather than passive data. That gap was the starting point for ImageShield.
What it does
ImageShield scans any image file for five threat classes: LSB steganography, malicious metadata payloads, adversarial perturbations, visual prompt injection, and polyglot file attacks. Every scan returns a risk score from 0 to 100 with per-module breakdowns, flag explanations, and a threat verdict of Safe, Suspicious, or Malicious. It also supports batch scanning, safe-image ZIP export, and an accuracy calibration tool that lets teams cross-reference scan results against a labeled ground truth dataset to measure detection performance over time.
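To make the calibration idea concrete, here is a minimal sketch of how scan verdicts could be compared against a labeled ground truth set. It is not ImageShield's actual calibration code: the `evaluate` function, the `DetectionMetrics` class, the file names, and the choice to count both Suspicious and Malicious as a detection are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    precision: float
    recall: float
    false_positive_rate: float


def _ratio(numerator: int, denominator: int) -> float:
    # Avoid division by zero when a class is absent from the dataset.
    return numerator / denominator if denominator else 0.0


def evaluate(verdicts: dict[str, str], ground_truth: dict[str, bool]) -> DetectionMetrics:
    """Compare scanner verdicts with labels (True means the image is a real threat).

    Assumption: any verdict other than "Safe" counts as a detection.
    """
    tp = fp = tn = fn = 0
    for filename, is_threat in ground_truth.items():
        flagged = verdicts.get(filename, "Safe") != "Safe"
        if flagged and is_threat:
            tp += 1
        elif flagged:
            fp += 1
        elif is_threat:
            fn += 1
        else:
            tn += 1
    return DetectionMetrics(
        precision=_ratio(tp, tp + fp),
        recall=_ratio(tp, tp + fn),
        false_positive_rate=_ratio(fp, fp + tn),
    )


# Hypothetical scan results and labels, purely for illustration.
verdicts = {"a.png": "Malicious", "b.png": "Safe", "c.png": "Suspicious", "d.png": "Safe"}
labels = {"a.png": True, "b.png": False, "c.png": False, "d.png": True}
metrics = evaluate(verdicts, labels)
print(metrics)
```

Re-running a comparison like this after every change to the detectors is what lets a team track detection performance over time.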
How I built it
The backend is a FastAPI server running four local statistical detection modules built with NumPy and Pillow: steganography detection via LSB spatial correlation analysis, metadata pattern matching, adversarial perturbation detection via FFT Nyquist corner energy analysis, and polyglot file scanning. A fifth module sends each image to Google Gemini 2.5 Flash with a structured prompt to detect visual prompt injection using the same visual understanding a target AI would have. All five scores feed into a weighted ensemble that produces the final risk index. The frontend is React with custom Canvas animations and no UI framework.
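As a rough illustration of the FFT-based idea, the sketch below measures what share of an image's spectral energy sits in the corners of the centred 2D spectrum, which is where the highest (Nyquist) frequencies land after `np.fft.fftshift`. This is one plausible reading of "Nyquist corner energy", not the project's exact module; the function name `nyquist_corner_energy_ratio` and the `corner_frac` parameter are assumptions.

```python
import numpy as np


def nyquist_corner_energy_ratio(gray: np.ndarray, corner_frac: float = 0.125) -> float:
    """Fraction of non-DC spectral energy in the four corners of the shifted spectrum."""
    pixels = gray.astype(np.float64)
    pixels = pixels - pixels.mean()  # remove the DC term so it does not dominate the total
    power = np.abs(np.fft.fftshift(np.fft.fft2(pixels))) ** 2
    h, w = power.shape
    ch = max(1, int(h * corner_frac))
    cw = max(1, int(w * corner_frac))
    corners = (
        power[:ch, :cw].sum()
        + power[:ch, -cw:].sum()
        + power[-ch:, :cw].sum()
        + power[-ch:, -cw:].sum()
    )
    total = power.sum()
    return float(corners / total) if total > 0 else 0.0


# Synthetic demonstration: a smooth ramp versus the same ramp with a faint
# pixel-level checkerboard, which is a pure Nyquist-frequency pattern.
size = 64
ramp = np.linspace(0.0, 255.0, size)
smooth = np.outer(ramp, np.ones(size))
yy, xx = np.indices((size, size))
checkerboard = np.where((xx + yy) % 2 == 0, 2.0, -2.0)
perturbed = smooth + checkerboard

smooth_ratio = nyquist_corner_energy_ratio(smooth)
perturbed_ratio = nyquist_corner_energy_ratio(perturbed)
print(smooth_ratio, perturbed_ratio)
```

A real detector would need a threshold calibrated on labeled images, since sharp natural textures also carry high-frequency energy.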
Challenges I ran into
Steganography detection was the hardest problem. My initial chi-square LSB analysis produced near-identical scores for clean images and stego images because JPEG compression pre-randomises the LSB plane through DCT quantisation, making it statistically indistinguishable from embedded data. I iterated through RS analysis before landing on spatial correlation analysis, which gave me a measurable and reliable signal on lossless PNG sources. Calibrating five independent modules to produce a coherent composite score without systematic false positives took significant iteration: early versions flagged clean PNGs as malicious because natural PNG LSB distributions look suspicious to naive detectors.
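The intuition behind a spatial correlation check can be sketched in a few lines of NumPy. The version below simply measures how often neighbouring pixels share the same least significant bit; random embedded data pushes that rate toward 0.5. It is a simplified stand-in for the module described above, and the function name `lsb_neighbour_agreement` and the synthetic test image are assumptions made for illustration.

```python
import numpy as np


def lsb_neighbour_agreement(channel: np.ndarray) -> float:
    """Fraction of horizontally and vertically adjacent pixels whose LSBs match."""
    lsb = channel.astype(np.uint8) & 1
    horizontal = lsb[:, 1:] == lsb[:, :-1]
    vertical = lsb[1:, :] == lsb[:-1, :]
    matches = int(horizontal.sum()) + int(vertical.sum())
    return matches / (horizontal.size + vertical.size)


rng = np.random.default_rng(0)

# Synthetic "cover": 8x8 flat blocks, so most neighbours share an LSB.
blocks = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
cover = np.repeat(np.repeat(blocks, 8, axis=0), 8, axis=1)

# Synthetic "stego": overwrite every LSB with a random payload bit.
payload = rng.integers(0, 2, size=cover.shape, dtype=np.uint8)
stego = (cover & 0xFE) | payload

cover_agreement = lsb_neighbour_agreement(cover)
stego_agreement = lsb_neighbour_agreement(stego)
print(cover_agreement, stego_agreement)
```

Real photographs are far noisier than this block image, so the gap between clean and stego scores is narrower in practice, which is why threshold calibration matters.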
Accomplishments that I'm proud of
Getting visual prompt injection detection to work reliably, using Gemini as a semantic layer, is the result I am most proud of. It is genuinely difficult to detect instructions hidden at low opacity or in near-invisible colour combinations, and using a VLM to catch what other VLMs would be vulnerable to felt like the right architectural decision. I am also proud of the calibration tool. Because a security team can maintain a labeled threat dataset and continuously benchmark the scanner against it, ImageShield becomes an auditable, improving security control rather than a one-time tool.
What I learned
JPEG and PNG are fundamentally different from a security analysis perspective. Lossy compression does not just reduce file size. It destroys forensic information in ways that make spatial-domain steganalysis unreliable. I also learned that weighted ensemble scoring is much harder to calibrate than it looks: the interaction between module weights, activation thresholds, and the amplification factor for multi-module co-firing creates non-obvious emergent behaviour that requires systematic testing with labeled data to get right. And the 0.1% poisoning threshold reframed how I thought about the problem: the bar for a meaningful attack is far lower than most people assume.
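To show why these interactions are hard to reason about, here is a minimal sketch of a weighted ensemble with a co-firing amplification step. Every number in it is hypothetical: the module names, equal weights, activation threshold, amplification factor, and verdict cut-offs are assumptions for illustration, not ImageShield's tuned values.

```python
def composite_risk(
    scores: dict[str, float],
    weights: dict[str, float],
    activation_threshold: float = 50.0,
    amplification: float = 1.2,
) -> float:
    """Weighted mean of module scores (0-100), boosted when several modules fire together."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    base = sum(scores[name] * weights[name] for name in scores) / total_weight
    firing = sum(1 for value in scores.values() if value >= activation_threshold)
    if firing >= 2:
        # Each additional co-firing module multiplies the score once more.
        base *= amplification ** (firing - 1)
    return min(100.0, base)


def verdict(risk: float, suspicious_at: float = 30.0, malicious_at: float = 70.0) -> str:
    if risk >= malicious_at:
        return "Malicious"
    if risk >= suspicious_at:
        return "Suspicious"
    return "Safe"


modules = ["steganography", "metadata", "adversarial", "prompt_injection", "polyglot"]
weights = {name: 1.0 for name in modules}

two_firing = dict.fromkeys(modules, 0.0) | {"steganography": 80.0, "metadata": 10.0, "adversarial": 60.0}
one_firing = dict.fromkeys(modules, 0.0) | {"steganography": 80.0}

risk_two = composite_risk(two_firing, weights)
risk_one = composite_risk(one_firing, weights)
print(risk_two, verdict(risk_two))
print(risk_one, verdict(risk_one))
```

Even in this toy version, a small change to the threshold or the amplification factor can move an image across a verdict boundary, which is why tuning against labeled data is necessary.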
What's next for ImageShield
DCT-domain steganography detection for JPEG sources is the most important technical gap to close. After that: a lightweight on-device vision model as a fallback for prompt injection detection when the Gemini API is unavailable, a webhook system for embedding the scanner as upload middleware in existing platforms, and a Python SDK that makes integration a two-line change for any FastAPI or Django application.
Built With
- ai-security
- computer-vision
- cybersecurity
- fastapi
- fft
- gemini-2.5-flash
- google-gemini
- html5
- javascript
- lsb-analysis
- machine-learning
- numpy
- pillow
- python
- react
- rest-api
- steganography
- vite