🕵️ Image Detective
A subreddit-specific reverse image search and repost detection tool for Reddit, built entirely on Devvit. A sequel to Image Sourcery, it is designed to work as a pure Devvit app with no external servers or services.
Features
- 🔍 Real-Time Repost Detection — Automatically scans every new image post on submission and reports likely duplicates to the mod queue.
- 📂 Historical Backfill — Add historical posts to the database: top posts from configurable timeframes, or recent posts from the last N days.
- 🔎 On-Demand Search — Both moderators and regular users can use the "Find Similar Images" menu action on any post to search the database for visually similar images.
- 🖼️ Gallery Support — Processes Reddit image galleries, extracting and hashing each image individually.
- 📊 Detailed Reporting — Sends ModMail reports after backfill operations with statistics, match examples at various similarity tiers, and failed image summaries.
- ⚙️ Configurable — Adjustable similarity thresholds, karma/age gating for originals, removal status filtering, and more — all from the app settings page.
How It Works (The Short Version)
- A new image post is submitted.
- The app fetches the image data (preferring small thumbnails for speed).
- A 64-bit perceptual hash is computed — a mathematical fingerprint of the image's visual structure.
- The hash is compared against the existing database using Locality-Sensitive Hashing for fast approximate nearest-neighbor search.
- If a match above the configured similarity threshold is found, the post is reported to the mod queue.
- The new hash is stored in the database for future comparisons.
Only the hash is stored (e.g. bbde1db18de920a2) — no actual images are saved.
Settings
All settings are configurable from the app's settings page on Reddit. They are grouped into two sections:
Repost Detection
| Setting | Default | Description |
|---|---|---|
| Action on Repost Detection | None (log only) | What happens when a repost is detected: log only, or report the post. |
| Ignore Removed Posts | (none selected) | Optionally ignore reposts whose originals were removed by mods, deleted by author, removed by Reddit, or filtered by AutoMod/spam. |
| Similarity Threshold | 95% (3 bits) | Minimum visual similarity for a match. 100% = exact, lower values catch more but risk false positives. |
| Min Karma of Original | 0 (disabled) | Only flag reposts if the original post has at least this much karma. |
| Max Age of Original | 0 (disabled) | Only flag reposts if the original was posted within the last N days. |
| Threshold Mode | Either | Whether karma and age thresholds must both be met, or just one. |
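
The way these settings could combine is sketched below. This is a minimal illustration, not the app's actual code: the `RepostSettings` and `OriginalPost` shapes, the function names, and the rule that a disabled check (value 0) is simply skipped are all assumptions. The percent-to-bits mapping is chosen so that the default of 95% yields the 3 bits shown in the table.

```typescript
// Hypothetical settings shape; field names are illustrative only.
interface RepostSettings {
  similarityPercent: number; // e.g. 95
  minKarma: number; // 0 = disabled
  maxAgeDays: number; // 0 = disabled
  thresholdMode: "either" | "both";
}

// Hypothetical view of the original (earlier) post.
interface OriginalPost {
  karma: number;
  ageDays: number;
}

const HASH_BITS = 64;

// Largest Hamming distance still treated as a match.
// Assumed mapping: round down, so 95% -> floor(3.2) = 3 bits.
function maxDistanceBits(similarityPercent: number): number {
  return Math.floor((HASH_BITS * (100 - similarityPercent)) / 100);
}

// Decide whether a candidate match should be flagged as a repost.
function shouldFlag(
  distanceBits: number,
  original: OriginalPost,
  settings: RepostSettings
): boolean {
  if (distanceBits > maxDistanceBits(settings.similarityPercent)) return false;

  // Collect only the gates that are enabled (assumption: 0 disables a gate).
  const checks: boolean[] = [];
  if (settings.minKarma > 0) checks.push(original.karma >= settings.minKarma);
  if (settings.maxAgeDays > 0) checks.push(original.ageDays <= settings.maxAgeDays);
  if (checks.length === 0) return true;

  return settings.thresholdMode === "both"
    ? checks.every((ok) => ok)
    : checks.some((ok) => ok);
}
```

With the "either" mode, a high-karma original that is older than the age limit would still be flagged; with "both", it would not.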
Backfill
| Setting | Default | Description |
|---|---|---|
| Record posts from last X days | 0 (disabled) | Fetch recent image posts from the last N days (max 30). |
| Add X top posts | 0 (disabled) | Fetch the top N image posts (max 100). |
| Top Posts Timeframe | All Time | Which timeframes to pull top posts from (All Time, Year, Month, Week — multi-select). |
A Bit More Technical
Fetch
All image fetching happens through Devvit's built-in fetch API. The app prefers thumbnails over full-size images whenever possible: they're faster to fetch, use less memory, and produce consistent hashes.
Redis and Data
The app uses Devvit's Redis for all persistent storage. No external databases or services are needed and no post data is stored other than the image hashes.
Key data structures:
| Key Pattern | Type | Purpose |
|---|---|---|
| hash | Hash map | Global registry of all canonical perceptual hashes |
| posts:{hash} | Sorted set | Maps each hash to the post IDs that produced it (scored by creation time) |
| hashes:{postId} | Hash map | Reverse index: maps each post to its computed hashes |
| chunk:{band}:{chunk} | Hash map | LSH band buckets for fast candidate lookup |
| chunkIndex:{band} | Hash map | Tracks all chunks in a band (for cleanup) |
| queue:images | Sorted set | The main processing queue (JSON-encoded job payloads) |
| queue:active_processing_images | Sorted set | Items currently being processed (for crash recovery) |
| queue:retry | Sorted set | Failed images pending retry |
| queue:failed | Sorted set | Permanently failed images (max retries exceeded) |
| stats:* | Various | Processing statistics by category and outcome |
| blacklist | Sorted set | Blacklisted image URLs or hashes |
The reverse index (hashes:{postId}) enables cheap O(1) lookups when checking if a post has already been indexed, and avoids re-hashing on subsequent searches.
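
To make the relationship between the registry, the forward mapping, and the reverse index concrete, here is a small sketch. It uses an in-memory `MemoryStore` class as a stand-in for Redis so it can run anywhere; that class, the `indexImage` and `getIndexedHashes` helpers, and the choice of field names and values inside each hash map are assumptions for illustration. Only the key patterns come from the table above.

```typescript
// Key builders mirroring the key patterns listed in the table.
const keys = {
  registry: "hash",
  postsForHash: (hash: string) => `posts:${hash}`,
  hashesForPost: (postId: string) => `hashes:${postId}`,
};

// Hypothetical in-memory stand-in for Redis hash maps and sorted sets.
class MemoryStore {
  private hashMaps = new Map<string, Map<string, string>>();
  private sortedSets = new Map<string, Map<string, number>>();

  hSet(key: string, field: string, value: string): void {
    const map = this.hashMaps.get(key) ?? new Map<string, string>();
    map.set(field, value);
    this.hashMaps.set(key, map);
  }

  hGetAll(key: string): Record<string, string> {
    const out: Record<string, string> = {};
    const map = this.hashMaps.get(key) ?? new Map<string, string>();
    map.forEach((value, field) => {
      out[field] = value;
    });
    return out;
  }

  zAdd(key: string, member: string, score: number): void {
    const set = this.sortedSets.get(key) ?? new Map<string, number>();
    set.set(member, score);
    this.sortedSets.set(key, set);
  }

  // Members ordered by ascending score.
  zRange(key: string): string[] {
    const set = this.sortedSets.get(key) ?? new Map<string, number>();
    return Array.from(set.entries())
      .sort((a, b) => a[1] - b[1])
      .map(([member]) => member);
  }
}

// Record one hashed image in all three structures.
// Assumption: the reverse index uses the gallery image index as the field.
function indexImage(
  store: MemoryStore,
  postId: string,
  imageIndex: number,
  hash: string,
  createdAt: number
): void {
  store.hSet(keys.registry, hash, "1");
  store.zAdd(keys.postsForHash(hash), postId, createdAt);
  store.hSet(keys.hashesForPost(postId), String(imageIndex), hash);
}

// Reverse-index lookup: an empty result means the post is not indexed yet.
function getIndexedHashes(store: MemoryStore, postId: string): string[] {
  const record = store.hGetAll(keys.hashesForPost(postId));
  return Object.keys(record).map((field) => record[field]);
}
```

A search for an already-indexed post can then start from `getIndexedHashes` instead of fetching and hashing the image again.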
Hash Generation
The app uses perceptual hashing (pHash) — a technique that produces a compact fingerprint representing an image's visual structure rather than its raw pixel data.
Two hashes are compared using Hamming distance — the number of differing bits. A distance of 0 means identical; the maximum is 64 (completely different).
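
As a rough illustration of the comparison step, the sketch below computes the Hamming distance between two 16-character hex hashes by XOR-ing them one hex digit at a time and counting the set bits. The function names are hypothetical, and the app may well compare hashes differently (for example with 64-bit integers).

```typescript
// Number of set bits for every 4-bit value 0..15.
const NIBBLE_POPCOUNT = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

// Hamming distance between two 64-bit hashes given as 16 hex characters.
function hammingDistance(a: string, b: string): number {
  if (a.length !== 16 || b.length !== 16) {
    throw new Error("expected 16-character hex hashes");
  }
  let distance = 0;
  for (let i = 0; i < 16; i++) {
    const x = parseInt(a.charAt(i), 16);
    const y = parseInt(b.charAt(i), 16);
    if (Number.isNaN(x) || Number.isNaN(y)) {
      throw new Error("hash contains a non-hex character");
    }
    // XOR leaves a 1 in every bit position where the digits differ.
    distance += NIBBLE_POPCOUNT[x ^ y];
  }
  return distance;
}

// Fraction of matching bits, from 0 (all differ) to 1 (identical).
function similarity(a: string, b: string): number {
  return (64 - hammingDistance(a, b)) / 64;
}
```

For example, `bbde1db18de920a2` and `bbde1db18de920a3` differ only in the lowest bit of the last digit, so their distance is 1.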
Hash Lookup (LSH)
Naively comparing every new hash against every stored hash would be too slow for large subreddits. Instead, the app uses Locality-Sensitive Hashing (LSH) to narrow each search to a small candidate set. The 16-character hex hash is split into 7 overlapping bands of 4 characters each (a sliding window with stride 2, starting at characters 0, 2, 4, ..., 12). Each band is stored in a Redis hash map keyed by chunk:{band}:{chunk_value}. To search:
- Compute the 7 bands of the query hash.
- Look up each band in Redis to find all stored hashes that share that band.
- Collect the union of candidates (typically a small fraction of the total database).
- Verify each candidate with a full Hamming distance check against the threshold.
This means only hashes sharing at least one 4-character band at the same position need to be checked. In practice, the candidate set stays small enough that lookups remain fast even as the database grows.
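
The banding and search steps above can be sketched as follows. This is a simplified in-memory model, assuming a `Map` of buckets in place of the Redis hash maps; the `LshIndex` class and its method names are hypothetical, while the band width, stride, and key pattern follow the description above.

```typescript
const BAND_WIDTH = 4; // hex characters per band
const STRIDE = 2; // sliding-window step

// Split a 16-character hash into its 7 overlapping bands.
function bands(hash: string): string[] {
  const out: string[] = [];
  for (let start = 0; start + BAND_WIDTH <= hash.length; start += STRIDE) {
    out.push(hash.slice(start, start + BAND_WIDTH));
  }
  return out;
}

// Bitwise Hamming distance between two equal-length hex strings.
function hammingDistance(a: string, b: string): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    while (x > 0) {
      distance += x & 1;
      x >>= 1;
    }
  }
  return distance;
}

// Hypothetical in-memory index; bucket keys mimic chunk:{band}:{chunk_value}.
class LshIndex {
  private buckets = new Map<string, Set<string>>();

  add(hash: string): void {
    bands(hash).forEach((chunk, band) => {
      const key = `chunk:${band}:${chunk}`;
      const bucket = this.buckets.get(key) ?? new Set<string>();
      bucket.add(hash);
      this.buckets.set(key, bucket);
    });
  }

  // Union of all stored hashes that share at least one band with the query.
  candidates(hash: string): string[] {
    const found = new Set<string>();
    bands(hash).forEach((chunk, band) => {
      const bucket = this.buckets.get(`chunk:${band}:${chunk}`);
      if (bucket) bucket.forEach((h) => found.add(h));
    });
    return Array.from(found);
  }

  // Verify every candidate with a full Hamming distance check.
  search(hash: string, maxBits: number): string[] {
    return this.candidates(hash).filter(
      (candidate) => hammingDistance(hash, candidate) <= maxBits
    );
  }
}
```

One property of this particular banding is worth noting: two hashes that differ in at most 3 bits always share at least one band, because 3 bit flips touch at most 3 hex characters and each character belongs to at most 2 of the 7 bands, so at most 6 bands can change. Matches at larger distances are not guaranteed to land in a shared bucket.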
Workers and Queue
Image processing is inherently slow, so the app uses an asynchronous job queue backed by Redis sorted sets and Devvit's scheduler.
- Job payloads are JSON objects containing everything needed to process an image (post ID, URL, category, fallback URL, etc.) — no additional API calls are needed during processing.
- A scheduler job (process_image_queue) runs in batches of up to 4 images. Retry items are isolated to batches of 1 to prevent OOM crashes from cascading.
- Dual-lock kickoff: When a new post triggers processing, the app schedules both a frontline worker (+2s) and a backup watchdog (+60s) using Redis TTL locks to prevent duplicate scheduling.
- Upfront rescheduling: The next worker is scheduled before processing starts, so if the current worker crashes (e.g., OOM), the queue isn't orphaned.
- Active Processing Vault: Items are copied to a separate sorted set before being removed from the main queue. If a worker dies mid-processing (detected after 60s), the recovery system re-queues lost items.
- Retry logic: Failed images get up to 3 retries. Permanent failures (OOM, size limits) skip retries entirely and go straight to the ignored queue.
- V8 GC yield: A 100ms pause between images gives V8 a chance to garbage-collect large decoded buffers, which helps prevent out-of-memory crashes.
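
The vault and retry behaviour described in the list above can be modelled roughly as shown here. This is a simplified, synchronous sketch using in-memory maps in place of Redis sorted sets; the `ImageQueue` class, the `Job` fields, and the `attempts` counter are assumptions, and scheduling, locks, and the batch-of-1 rule for retries are left out.

```typescript
// Hypothetical job payload; the real payload carries more fields.
interface Job {
  postId: string;
  url: string;
  attempts: number; // number of failed attempts so far
}

const MAX_RETRIES = 3;
const STALE_AFTER_MS = 60000;

// In-memory stand-in for a Redis sorted set: member -> score.
type SortedSet = Map<string, number>;

class ImageQueue {
  main: SortedSet = new Map();
  active: SortedSet = new Map();
  retry: SortedSet = new Map();
  failed: SortedSet = new Map();

  enqueue(job: Job, now: number): void {
    this.main.set(JSON.stringify(job), now);
  }

  // Take the oldest jobs, copying each into the vault BEFORE removing it
  // from the main queue so a crash in between cannot lose it.
  claim(now: number, limit: number = 4): Job[] {
    const members = Array.from(this.main.entries())
      .sort((a, b) => a[1] - b[1])
      .slice(0, limit)
      .map(([member]) => member);
    for (const member of members) {
      this.active.set(member, now);
      this.main.delete(member);
    }
    return members.map((member) => JSON.parse(member) as Job);
  }

  complete(job: Job): void {
    this.active.delete(JSON.stringify(job));
  }

  // Permanent failures skip the retry queue entirely.
  fail(job: Job, now: number, permanent: boolean): void {
    this.active.delete(JSON.stringify(job));
    const next: Job = { ...job, attempts: job.attempts + 1 };
    const target =
      permanent || next.attempts > MAX_RETRIES ? this.failed : this.retry;
    target.set(JSON.stringify(next), now);
  }

  // Re-queue vault items whose worker has been silent for too long.
  recoverStale(now: number): number {
    let recovered = 0;
    Array.from(this.active.entries()).forEach(([member, claimedAt]) => {
      if (now - claimedAt >= STALE_AFTER_MS) {
        this.main.set(member, claimedAt);
        this.active.delete(member);
        recovered++;
      }
    });
    return recovered;
  }
}
```

Because a job sits in the vault for the whole time it is being processed, a worker that dies mid-batch leaves evidence behind that a later run can pick up with `recoverStale`.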
Backfill / Subreddit Menu
Moderators can build the initial image database through the "Image Detective: Mod Actions" menu (available on the subreddit level). Options include:
- Start Backfill — Uses the configured backfill settings (days + top posts + timeframes) to scan historical posts.
- Test Backfill (subreddit) — Backfill a specific subreddit using the current settings.
- Test Backfill (post IDs) — Backfill specific posts by comma-separated IDs.
- Wipe Database — Completely clears all hashes, LSH indexes, queues, stats, and locks.
When a backfill starts, a ModMail notification is sent with the estimated processing time. When the queue finishes, a detailed completion report is sent as a ModMail reply with:
- Total unique hashes indexed, images processed, and posts scanned
- A list of any failed/ignored images with reasons
- Examples of potential matches grouped by similarity tier (100%, 96–99%, 92–95%, 88–91%) so mods can calibrate their threshold setting
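
One way to place a match into these tiers is sketched below. It assumes that similarity is the share of matching bits out of 64, rounded to the nearest whole percent; the report may round differently, and the `TIERS` table and function names are hypothetical.

```typescript
interface Tier {
  label: string;
  min: number; // inclusive, whole percent
  max: number; // inclusive, whole percent
}

// Tier boundaries taken from the completion report description.
const TIERS: Tier[] = [
  { label: "100%", min: 100, max: 100 },
  { label: "96-99%", min: 96, max: 99 },
  { label: "92-95%", min: 92, max: 95 },
  { label: "88-91%", min: 88, max: 91 },
];

// Assumed conversion: matching bits out of 64, rounded to a whole percent.
function similarityPercent(distanceBits: number): number {
  return Math.round(((64 - distanceBits) / 64) * 100);
}

// Returns the tier for a Hamming distance, or undefined if below all tiers.
function tierFor(distanceBits: number): Tier | undefined {
  const percent = similarityPercent(distanceBits);
  return TIERS.find((tier) => percent >= tier.min && percent <= tier.max);
}
```

Under this rounding, a distance of 3 bits gives 95%, which agrees with the default threshold shown in the settings table.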
Post Menu / Find Similar
Two menu actions are available on individual posts:
- 🔍 Find Similar Images (all users) — Searches the database for visually similar images. Results are filtered to exclude removed/deleted/spam posts. Uses the subreddit's configured similarity threshold.
- 🔍 Find Similar Images (Mod) (moderators only) — Same search, but shows all results and uses a broader threshold for comprehensive coverage.
If the post hasn't been indexed yet, it's queued for processing and the user is prompted to try again shortly. If it has been indexed, results are displayed immediately in a form with clickable links to matching posts, including similarity percentages and gallery image indicators.
Triggers
The app registers the following Devvit triggers:
| Trigger | Purpose |
|---|---|
| PostSubmit | Fires on every new post. Extracts image URLs, queues hash jobs, and kicks off the worker. This is the live scanner. |