Inspiration

In 2024, a man was wrongfully arrested based on AI-generated "evidence" photos that prosecutors believed were real. As generative AI becomes increasingly sophisticated, the integrity of visual evidence in legal proceedings is under unprecedented threat.

We asked ourselves: What if law enforcement could instantly detect whether crime scene photos, surveillance footage, or alibi images were AI-generated, and trace them back to specific generators across multiple cases?

Chain was born from this question.

What it does

Chain is a forensic image authentication platform that:

  1. Fingerprints images using perceptual hashing, JPEG quantization tables, and FFT frequency analysis
  2. Clusters similar images to identify "generator families" - groups of images created by the same AI tool
  3. Detects cross-case evidence when fake images from the same generator appear in separate investigations
  4. Notarizes findings on the Solana blockchain for tamper-proof, court-admissible proof

How we built it

1. Image Ingestion Pipeline

When an investigator uploads a suspicious image, our backend:

  • Computes SHA-256 hash for unique identification
  • Extracts perceptual hash (pHash) using a 16x16 DCT-based algorithm
  • Parses JPEG quantization tables and hashes them for encoder fingerprinting
  • Performs 2D FFT and calculates low/high frequency energy ratios
  • Stores all features in Snowflake with case metadata
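The two hashing steps in this list can be sketched with the Python standard library alone. This is a minimal illustration, not Chain's actual backend code: the function names are hypothetical, the parser only handles the common case of length-prefixed segments before the scan data, and the tables are hashed in the order they are stored in the file.

```python
import hashlib
import struct


def sha256_hex(data: bytes) -> str:
    """Content hash used as the asset's unique identifier."""
    return hashlib.sha256(data).hexdigest()


def jpeg_quant_tables(data: bytes) -> dict:
    """Return {table_id: 64 values} read from the DQT segments of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:  # DQT: one or more tables back to back
            i = 0
            while i < len(payload):
                precision, table_id = payload[i] >> 4, payload[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    values = tuple(payload[i:i + 64])
                    i += 64
                else:  # 16-bit entries
                    values = struct.unpack(">64H", payload[i:i + 128])
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables


def quant_fingerprint(tables: dict) -> str:
    """Hash the tables in table-id order so equal tables give equal hashes."""
    digest = hashlib.sha256()
    for table_id in sorted(tables):
        digest.update(bytes([table_id]))
        digest.update(struct.pack(">64H", *tables[table_id]))
    return digest.hexdigest()
```

Two images written by the same encoder at the same quality setting would be expected to produce the same `quant_fingerprint`, which is what makes it usable as an exact-match feature.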

2. Clustering Engine

We implemented Union-Find (disjoint set) clustering:

  • Pull all asset features from Snowflake
  • Compute pairwise similarity for all image pairs
  • Connect images exceeding similarity threshold (0.7)
  • Group connected components into "generator families"
  • Write cluster assignments back to Snowflake
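The steps above can be sketched as follows. This is an illustrative version rather than our production code: `cluster` and its `similarity` callback are hypothetical names, and the Snowflake read and write steps are left out so the example runs on its own.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree to the larger one
        self.size[ra] += self.size[rb]


def cluster(asset_ids, similarity, threshold=0.7):
    """Group assets whose pairwise similarity exceeds the threshold."""
    ids = list(asset_ids)
    uf = UnionFind(ids)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if similarity(a, b) > threshold:
                uf.union(a, b)
    groups = defaultdict(list)
    for x in ids:
        groups[uf.find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())
```

Because clusters are connected components, two images can land in the same family without being directly similar, as long as a chain of similar images links them.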

3. Cross-Case Detection

SQL queries join cluster membership with case metadata to identify when the same AI generator produced evidence across multiple investigations—the key insight for linking serial offenders.
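A query of this shape might look like the sketch below. The table and column names (`assets`, `cluster_members`, `case_id`, and so on) are assumptions made for illustration, and an in-memory SQLite database stands in for Snowflake so the example runs without credentials.

```python
import sqlite3

# Hypothetical schema: one row per asset, one row per cluster membership.
CROSS_CASE_SQL = """
SELECT m.cluster_id, COUNT(DISTINCT a.case_id) AS case_count
FROM cluster_members AS m
JOIN assets AS a ON a.asset_id = m.asset_id
GROUP BY m.cluster_id
HAVING COUNT(DISTINCT a.case_id) > 1
ORDER BY m.cluster_id
"""


def cross_case_clusters(conn):
    """Return (cluster_id, case_count) for clusters spanning several cases."""
    return conn.execute(CROSS_CASE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (asset_id TEXT PRIMARY KEY, case_id TEXT);
CREATE TABLE cluster_members (cluster_id INTEGER, asset_id TEXT);
""")
conn.executemany("INSERT INTO assets VALUES (?, ?)", [
    ("img1", "CASE_A"), ("img2", "CASE_B"),
    ("img3", "CASE_A"), ("img4", "CASE_A"),
])
conn.executemany("INSERT INTO cluster_members VALUES (?, ?)", [
    (1, "img1"), (1, "img2"), (2, "img3"), (2, "img4"),
])
rows = cross_case_clusters(conn)
```

Here cluster 1 is flagged because its members come from both CASE_A and CASE_B, while cluster 2 stays within a single case and is not returned.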

4. Blockchain Notarization

When an analyst confirms findings:

  • Compute SHA-256 hash of all cluster member hashes (sorted)
  • Sign and broadcast a Solana transaction containing this fingerprint
  • Store transaction signature in Snowflake for later verification
  • Anyone can verify the cluster hasn't been tampered with by recomputing the hash
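The fingerprint and its later verification can be sketched as below. The exact byte layout (lowercase hex member hashes, sorted, newline-terminated) is an assumption for this example, and the Solana signing and broadcast step is omitted.

```python
import hashlib


def cluster_fingerprint(member_hashes):
    """SHA-256 over the sorted member hashes, so input order is irrelevant."""
    digest = hashlib.sha256()
    for h in sorted(member_hashes):
        digest.update(h.encode("ascii"))
        digest.update(b"\n")  # separator, so concatenation is unambiguous
    return digest.hexdigest()


def verify_cluster(member_hashes, notarized_fingerprint):
    """Recompute the fingerprint and compare it with the notarized value."""
    return cluster_fingerprint(member_hashes) == notarized_fingerprint
```

Adding, removing, or swapping a single member changes the fingerprint, so a match against the on-chain value shows the cluster membership is unchanged since notarization.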

5. Real-Time Reclustering

For demo purposes, uploading a new image triggers automatic reclustering, so the system immediately shows whether the new evidence matches known AI generators from prior cases.

Architecture Overview

Frontend (React + Vite) → Backend (FastAPI/Python) → Snowflake (forensic data warehouse)
                          ↳ Solana (blockchain notarization)

Why Snowflake Was Essential

Snowflake serves as our forensic evidence data warehouse. Rather than just being a database, it's the backbone of our cross-case intelligence:

| Capability | Why It Matters |
| --- | --- |
| Scalable similarity search | Real forensic labs process millions of images. Snowflake's columnar storage enables sub-second similarity queries across massive datasets |
| Multi-tenant case isolation | Each investigation (CASE_A, CASE_B, CASE_NEW) is logically separated while still enabling cross-case pattern detection |
| SQL analytics on image features | Our clustering algorithm uses SQL JOINs to compute pairwise similarity, and Snowflake handles the O(n²) complexity efficiently |
| Audit trail | Every query is logged. Chain of custody requires knowing who accessed what evidence and when |

Without Snowflake, we couldn't answer the critical question: "Has this AI generator been used in other cases?"

Why Solana Was Essential

Solana provides immutable, timestamped proof that our forensic findings existed at a specific moment, which is critical for court admissibility:

| Capability | Why It Matters |
| --- | --- |
| Tamper-proof notarization | Once a cluster's membership hash is on-chain, no one can alter which images were grouped together without the change being detectable |
| Timestamped evidence | The blockchain proves when the analysis was performed, so that defense attorneys can't claim results were fabricated after the fact |
| Decentralized trust | Unlike a centralized database, Solana's proof doesn't depend on trusting a single institution |
| Fast confirmation | Solana targets block times of ~400ms, so transactions are typically confirmed within a second or two, which is fast enough for real-time forensic workflows |

The combination is powerful: Snowflake stores the evidence, Solana proves it wasn't tampered with.

Feature Extraction Pipeline

Each uploaded image undergoes forensic fingerprinting:

  • Perceptual Hash (pHash): Creates a 256-bit fingerprint robust to minor edits like resizing or recompression
  • JPEG Quantization Tables: Extracts the unique compression signature left by the encoding software
  • FFT Energy Ratio: Computes the ratio of low-frequency to high-frequency energy to detect AI artifacts

These features let us detect images from the same generator even when they depict completely different scenes.
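The perceptual hash and the frequency ratio can be sketched in pure Python as follows. This is a simplified illustration under several assumptions: the input is a square grayscale matrix that has already been decoded and resized, the DCT is unnormalized, a naive DFT stands in for a real FFT, and the `cutoff` separating low from high frequencies is a placeholder value. A real pipeline would likely use an image library and NumPy instead.

```python
import cmath
import math
from statistics import median


def _dct_rows(matrix, keep):
    """First `keep` unnormalized DCT-II coefficients of each row."""
    n = len(matrix[0])
    basis = [[math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)]
             for k in range(keep)]
    return [[sum(v * c for v, c in zip(row, b)) for b in basis]
            for row in matrix]


def phash(gray, hash_size=16):
    """256-bit hash (as an int) of a square grayscale matrix, e.g. 64x64."""
    rows = _dct_rows(gray, hash_size)
    coeffs2d = _dct_rows([list(col) for col in zip(*rows)], hash_size)
    coeffs = [c for row in coeffs2d for c in row]  # lowest 16x16 frequencies
    threshold = median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | int(c > threshold)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def _dft_rows(matrix):
    """Naive O(n^2) DFT of each row (a stand-in for an FFT)."""
    n = len(matrix[0])
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [[sum(v * twiddle[(k * x) % n] for x, v in enumerate(row))
             for k in range(n)] for row in matrix]


def fft_energy_ratio(gray, cutoff=0.25):
    """Low/high spectral energy of a square grayscale matrix, DC excluded."""
    n = len(gray)
    rows = _dft_rows(gray)
    spectrum = _dft_rows([list(col) for col in zip(*rows)])
    low = high = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip mean brightness
            fu, fv = min(u, n - u) / n, min(v, n - v) / n  # cycles per pixel
            energy = abs(spectrum[u][v]) ** 2
            if math.hypot(fu, fv) <= cutoff:
                low += energy
            else:
                high += energy
    return low / high if high else math.inf
```

A smooth gradient concentrates its energy below the cutoff and yields a large ratio, while a pixel-level checkerboard does the opposite. Comparing this ratio across images is what lets the feature contribute to grouping images by generator rather than by scene content.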

Similarity & Clustering

We compute pairwise similarity using a weighted combination:

S(a, b) = w₁ · (1 - pHash_distance) + w₂ · quant_match + w₃ · (1 - |fft_ratio_a - fft_ratio_b|)

Where:

  • pHash_distance = normalized Hamming distance between perceptual hashes
  • quant_match = 1 if quantization tables match, 0 otherwise
  • fft_ratio = low/high frequency energy ratio

Images with similarity > 0.7 are connected, and Union-Find clustering groups them into generator families.
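As a sketch, the formula translates to Python as shown below. The `Features` record and the weight values are placeholders chosen for illustration; the weights actually used are not stated here. Note that the FFT term follows the formula literally, so it goes negative when two ratios differ by more than 1, and a real implementation may want to clamp or normalize it.

```python
from dataclasses import dataclass

PHASH_BITS = 256  # a 16x16 pHash yields 256 bits


@dataclass(frozen=True)
class Features:
    phash: int        # perceptual hash stored as an integer
    quant_hash: str   # hash of the JPEG quantization tables
    fft_ratio: float  # low/high frequency energy ratio


def similarity(a, b, weights=(0.5, 0.3, 0.2)):
    """Weighted similarity S(a, b); the default weights are placeholders."""
    w1, w2, w3 = weights
    phash_distance = bin(a.phash ^ b.phash).count("1") / PHASH_BITS
    quant_match = 1.0 if a.quant_hash == b.quant_hash else 0.0
    fft_term = 1.0 - abs(a.fft_ratio - b.fft_ratio)
    return w1 * (1.0 - phash_distance) + w2 * quant_match + w3 * fft_term
```

With weights that sum to 1, two images with identical features score 1.0, and a pair is linked when the score exceeds 0.7.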

Challenges we ran into

  1. Solana Devnet Quirks: The memo program wasn't available on devnet, forcing us to pivot to system transfers for notarization proof
  2. Snowflake Case Sensitivity: Snowflake resolves unquoted SQL identifiers as uppercase, which cost us hours of debugging "invalid identifier" errors
  3. Clustering Threshold Tuning: Too low = everything clusters together; too high = no cross-case matches. We settled on 0.7 after extensive testing on a varied set of test images.
  4. Real-time Reclustering: Balancing accuracy with speed when new evidence arrives mid-investigation

What we learned

  • Forensic image analysis is a deep field—perceptual hashing alone isn't enough; you need multiple orthogonal features
  • Blockchain isn't just for crypto—it's genuinely useful for creating tamper-proof audit trails
  • Snowflake's SQL engine is surprisingly powerful for ML-adjacent workloads like similarity computation
  • The legal system needs technical solutions to keep pace with AI-generated misinformation

In a world where seeing is no longer believing, Chain ensures that forensic truth has a provable foundation.
