- Snowflake stores forensic fingerprints across cases, enabling cross-case pattern detection via SQL.
- Solana notarization provides immutable proof that cluster evidence hasn't been tampered with.
- Chain's upload dashboard, where investigators submit suspicious evidence for analysis.
- Results view showing cross-case matches, cluster ID, similarity scores, and matching images.
Inspiration
In 2024, a man was wrongfully arrested based on AI-generated "evidence" photos that prosecutors believed were real. As generative AI becomes increasingly sophisticated, the integrity of visual evidence in legal proceedings is under unprecedented threat.
We asked ourselves: What if law enforcement could instantly detect whether crime scene photos, surveillance footage, or alibi images were AI-generated, and trace them back to specific generators across multiple cases?
Chain was born from this question.
What it does
Chain is a forensic image authentication platform that:
- Fingerprints images using perceptual hashing, JPEG quantization tables, and FFT frequency analysis
- Clusters similar images to identify "generator families" - groups of images created by the same AI tool
- Detects cross-case evidence when fake images from the same generator appear in separate investigations
- Notarizes findings on the Solana blockchain for tamper-proof, court-admissible proof
How we built it
1. Image Ingestion Pipeline
When an investigator uploads a suspicious image, our backend:
- Computes SHA-256 hash for unique identification
- Extracts perceptual hash (pHash) using a 16x16 DCT-based algorithm
- Parses JPEG quantization tables and hashes them for encoder fingerprinting
- Performs 2D FFT and calculates low/high frequency energy ratios
- Stores all features in Snowflake with case metadata
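The two hashing steps in this pipeline can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the byte layout used to serialize the quantization tables is an assumption. The `tables` argument follows the shape Pillow exposes as the `quantization` attribute of an opened JPEG (a mapping from table index to 64 coefficients).

```python
import hashlib
import struct
from typing import Mapping, Sequence


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of the raw file bytes, used as the asset's unique ID."""
    return hashlib.sha256(data).hexdigest()


def quant_table_hash(tables: Mapping[int, Sequence[int]]) -> str:
    """Hash JPEG quantization tables into a single encoder fingerprint.

    Assumption: tables are serialized as (index byte, little-endian
    16-bit coefficients) in ascending index order.
    """
    h = hashlib.sha256()
    for table_id in sorted(tables):  # fixed order keeps the hash stable
        coeffs = tables[table_id]
        h.update(struct.pack("<B", table_id))
        h.update(struct.pack(f"<{len(coeffs)}H", *coeffs))
    return h.hexdigest()
```

Two images saved by the same encoder at the same quality setting would share a `quant_table_hash`, which is what makes the value usable as an encoder fingerprint.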
2. Clustering Engine
We implemented Union-Find (disjoint set) clustering:
- Pull all asset features from Snowflake
- Compute pairwise similarity for all image pairs
- Connect images exceeding similarity threshold (0.7)
- Group connected components into "generator families"
- Write cluster assignments back to Snowflake
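The clustering steps above can be sketched as follows. This is an illustrative version under stated assumptions: the `UnionFind` class and `cluster` function are hypothetical names, the similarity function is passed in by the caller, and the Snowflake read and write steps are left out.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence


class UnionFind:
    """Disjoint-set structure over a fixed collection of items."""

    def __init__(self, items: Sequence[Hashable]) -> None:
        self.parent: Dict[Hashable, Hashable] = {x: x for x in items}

    def find(self, x: Hashable) -> Hashable:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(
    ids: Sequence[Hashable],
    similarity: Callable[[Hashable, Hashable], float],
    threshold: float = 0.7,
) -> List[List[Hashable]]:
    """Group ids into connected components of the 'similarity > threshold' graph."""
    uf = UnionFind(ids)
    for a, b in combinations(ids, 2):  # every unordered pair, O(n^2)
        if similarity(a, b) > threshold:
            uf.union(a, b)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x in ids:
        groups.setdefault(uf.find(x), []).append(x)
    return list(groups.values())
```

Because membership is transitive, two images can land in the same family without being directly similar, as long as a chain of above-threshold pairs connects them.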
3. Cross-Case Detection
SQL queries join cluster membership with case metadata to identify when the same AI generator produced evidence across multiple investigations—the key insight for linking serial offenders.
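A query of this kind might look like the sketch below. It runs against an in-memory SQLite database as a stand-in for Snowflake so that it is self-contained; the table names (`assets`, `cluster_members`) and column names are hypothetical, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE assets (asset_id TEXT PRIMARY KEY, case_id TEXT);
    CREATE TABLE cluster_members (cluster_id INTEGER, asset_id TEXT);
    INSERT INTO assets VALUES
        ('img1', 'CASE_A'), ('img2', 'CASE_B'),
        ('img3', 'CASE_A'), ('img4', 'CASE_NEW');
    INSERT INTO cluster_members VALUES
        (1, 'img1'), (1, 'img2'), (2, 'img3'), (3, 'img4');
    """
)

# Clusters whose members come from more than one case.
CROSS_CASE_SQL = """
SELECT m.cluster_id,
       COUNT(DISTINCT a.case_id) AS case_count,
       COUNT(*)                  AS image_count
FROM cluster_members AS m
JOIN assets AS a ON a.asset_id = m.asset_id
GROUP BY m.cluster_id
HAVING COUNT(DISTINCT a.case_id) > 1
ORDER BY m.cluster_id
"""

cross_case = conn.execute(CROSS_CASE_SQL).fetchall()
print(cross_case)  # [(1, 2, 2)]: cluster 1 spans CASE_A and CASE_B
```

The query uses only standard SQL (`JOIN`, `GROUP BY`, `HAVING`), so the same statement should carry over to Snowflake with the real table names substituted.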
4. Blockchain Notarization
When an analyst confirms findings:
- Compute SHA-256 hash of all cluster member hashes (sorted)
- Sign and broadcast a Solana transaction containing this fingerprint
- Store transaction signature in Snowflake for later verification
- Anyone can verify the cluster hasn't been tampered with by recomputing the hash
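The hashing and verification steps can be sketched as below. The function names are hypothetical, and joining the sorted hashes with newlines before hashing is an assumption, since the write-up does not specify the encoding. Signing and broadcasting the Solana transaction is omitted.

```python
import hashlib
from typing import Iterable


def cluster_fingerprint(member_hashes: Iterable[str]) -> str:
    """SHA-256 over the sorted member hashes (newline-joined, an assumed encoding)."""
    joined = "\n".join(sorted(member_hashes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def verify_cluster(member_hashes: Iterable[str], notarized: str) -> bool:
    """Recompute the fingerprint and compare it with the notarized value."""
    return cluster_fingerprint(member_hashes) == notarized
```

Sorting first makes the fingerprint independent of the order in which members are listed, so a verifier only needs the set of member hashes and the value recorded on-chain. Adding or removing any member changes the fingerprint.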
5. Real-Time Reclustering
For demo purposes, uploading a new image triggers automatic reclustering, so the system immediately shows whether the new evidence matches known AI generators from prior cases.
Architecture Overview
Frontend (React + Vite) → Backend (FastAPI/Python) → Snowflake (forensic data warehouse)
Backend (FastAPI/Python) ↳ Solana (blockchain notarization)
Why Snowflake Was Essential
Snowflake serves as our forensic evidence data warehouse. Rather than just being a database, it's the backbone of our cross-case intelligence:
| Capability | Why It Matters |
|---|---|
| Scalable similarity search | Real forensic labs process millions of images. Snowflake's columnar storage enables sub-second similarity queries across massive datasets |
| Multi-tenant case isolation | Each investigation (CASE_A, CASE_B, CASE_NEW) is logically separated while still enabling cross-case pattern detection |
| SQL analytics on image features | Our clustering algorithm uses SQL JOINs to compute pairwise similarity, and Snowflake handles the O(n²) complexity efficiently |
| Audit trail | Every query is logged. Chain of custody requires knowing who accessed what evidence and when |
Without Snowflake, we couldn't answer the critical question: "Has this AI generator been used in other cases?"
Why Solana Was Essential
Solana provides immutable, timestamped proof that our forensic findings existed at a specific moment, which is critical for court admissibility:
| Capability | Why It Matters |
|---|---|
| Tamper-proof notarization | Once a cluster's membership hash is on-chain, no one can alter which images were grouped together |
| Timestamped evidence | The blockchain proves when the analysis was performed, so that defense attorneys can't claim results were fabricated after the fact |
| Decentralized trust | Unlike a centralized database, Solana's proof doesn't depend on trusting a single institution |
| Fast confirmation | Solana produces blocks roughly every 400 ms and typically confirms transactions within a few seconds, which is fast enough for real-time forensic workflows |
The combination is powerful: Snowflake stores the evidence, Solana proves it wasn't tampered with.
Feature Extraction Pipeline
Each uploaded image undergoes forensic fingerprinting:
- Perceptual Hash (pHash): Creates a 256-bit fingerprint robust to minor edits like cropping or compression
- JPEG Quantization Tables: Extracts the unique compression signature left by the encoding software
- FFT Energy Ratio: Computes the ratio of low-frequency to high-frequency energy to detect AI artifacts
These features let us detect images from the same generator even when they depict completely different scenes.
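As an illustration of the FFT feature, here is a minimal NumPy sketch of a low/high frequency energy ratio. The function name, the `cutoff` parameter, the radial split between "low" and "high", and the mean subtraction are all assumptions; the write-up does not say how the project defines the two bands.

```python
import numpy as np


def fft_energy_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Ratio of low- to high-frequency spectral energy of a 2D grayscale image.

    `cutoff` is a fraction of the Nyquist frequency (0.5 cycles/pixel);
    frequencies at or below that radius count as "low" (assumed definition).
    """
    pixels = gray.astype(np.float64)
    pixels = pixels - pixels.mean()  # drop the DC term so brightness does not dominate
    energy = np.abs(np.fft.fftshift(np.fft.fft2(pixels))) ** 2

    h, w = pixels.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]  # vertical frequency, cycles/pixel
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]  # horizontal frequency
    radius = np.sqrt(fx ** 2 + fy ** 2)

    low = energy[radius <= cutoff * 0.5].sum()
    high = energy[radius > cutoff * 0.5].sum()
    return float(low / (high + 1e-12))  # epsilon avoids division by zero
```

A smooth gradient yields a large ratio, while a pixel-level checkerboard yields a ratio near zero. Comparing ratios across images is what the similarity score later relies on.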
Similarity & Clustering
We compute pairwise similarity using a weighted combination:
S(a, b) = w₁ · (1 - pHash_distance) + w₂ · quant_match + w₃ · (1 - |fft_ratio_a - fft_ratio_b|)
Where:
- pHash_distance = normalized Hamming distance between perceptual hashes
- quant_match = 1 if quantization tables match, 0 otherwise
- fft_ratio = low/high frequency energy ratio
Images with similarity > 0.7 are connected, and Union-Find clustering groups them into generator families.
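The score can be sketched in plain Python as follows. The weights are hypothetical, since the write-up does not state the values used. The `Features` container and the clamping of the FFT term at zero are also assumptions, the latter added so that a large ratio gap cannot make the term negative.

```python
from dataclasses import dataclass

PHASH_BITS = 256  # 16x16 perceptual hash
# Hypothetical weights summing to 1; the actual values are not given.
W_PHASH, W_QUANT, W_FFT = 0.5, 0.3, 0.2


@dataclass(frozen=True)
class Features:
    phash: int        # perceptual hash stored as an integer
    quant_hash: str   # hash of the JPEG quantization tables
    fft_ratio: float  # low/high frequency energy ratio


def similarity(a: Features, b: Features) -> float:
    # Normalized Hamming distance: differing bits / total bits.
    phash_distance = bin(a.phash ^ b.phash).count("1") / PHASH_BITS
    quant_match = 1.0 if a.quant_hash == b.quant_hash else 0.0
    # Assumption: clamp at 0 so the term stays within [0, 1].
    fft_term = max(0.0, 1.0 - abs(a.fft_ratio - b.fft_ratio))
    return W_PHASH * (1.0 - phash_distance) + W_QUANT * quant_match + W_FFT * fft_term
```

With these example weights, two images whose hashes differ in 64 of 256 bits but share quantization tables and FFT ratio score 0.5 · 0.75 + 0.3 + 0.2 = 0.875, which clears the 0.7 threshold.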
Challenges we ran into
- Solana Devnet Quirks: The memo program wasn't available on devnet, forcing us to pivot to system transfers for notarization proof
- Snowflake Case Sensitivity: Unquoted SQL identifiers are resolved as uppercase in Snowflake, which caused hours of debugging "invalid identifier" errors
- Clustering Threshold Tuning: Too low = everything clusters together; too high = no cross-case matches. We settled on 0.7 after extensive testing with various test images.
- Real-time Reclustering: Balancing accuracy with speed when new evidence arrives mid-investigation
What we learned
- Forensic image analysis is a deep field—perceptual hashing alone isn't enough; you need multiple orthogonal features
- Blockchain isn't just for crypto—it's genuinely useful for creating tamper-proof audit trails
- Snowflake's SQL engine is surprisingly powerful for ML-adjacent workloads like similarity computation
- The legal system needs technical solutions to keep pace with AI-generated misinformation
In a world where seeing is no longer believing, Chain ensures that forensic truth has a provable foundation.
What's next for Chain
Built With
- css
- fastapi
- imagehash
- numpy
- pil
- python
- react
- snowflake
- solana
- solders
- tailwind
- typescript
- vite
