Inspiration
In March 2026, Reddit turned off auto-ban and “guilt by association” in tools like SaferBot and Hive-Protect. A lot of big subs lost a major brigade defense overnight. I wanted something that still helps mods under pressure, but only from what happens on their own subreddit — no cross-sub history, no third-party blocklists. That’s Raid Shield: local signals, a clear ladder of responses, and dry-run by default so mods can watch it work before turning anything on.
What it does
Raid Shield watches a rolling five-minute window of posts and comments and scores four signals: posting speed vs a seven-day baseline, share of young accounts, near-duplicate text (simhash clusters), and the same link from several authors. Those roll into a 0–100 threat score and a four-stage ladder — alert, heightened monitoring, hold matching items for review, auto-remove — each with its own enforce toggle. Mods get a dashboard with live score, signal bars, incident cards, and a kill switch. New matching content can be held or removed only when enforcement is on; otherwise everything is logged as “would have done” in the audit trail.
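As a rough illustration of how four normalized signals could roll into a 0–100 score and a ladder stage, here is a minimal sketch. The weights, the firing cutoff, the coincidence bonus size, and the stage thresholds are all assumed values chosen for the example, and `threatScore` and `stageFor` are hypothetical names, not the app's real ones.

```typescript
// Hypothetical sketch: every constant below is an assumption for illustration.
type SignalName = "velocity" | "youngAccounts" | "nearDuplicates" | "sharedLinks";
type Signals = Record<SignalName, number>; // each signal pre-normalized to 0..1

// Assumed weights in points; they sum to 100 so the base score is already 0..100.
const WEIGHTS: Signals = { velocity: 30, youngAccounts: 20, nearDuplicates: 30, sharedLinks: 20 };
const FIRING_CUTOFF = 0.5;    // assumed: a signal "fires" at or above this level
const COINCIDENCE_BONUS = 5;  // assumed: extra points per additional firing signal
const STAGE_THRESHOLDS = [25, 50, 70, 85]; // assumed entry scores for stages 1..4

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function threatScore(signals: Signals): number {
  let base = 0;
  let firing = 0;
  for (const name of Object.keys(WEIGHTS) as SignalName[]) {
    const value = clamp01(signals[name]);
    base += WEIGHTS[name] * value;
    if (value >= FIRING_CUTOFF) firing++;
  }
  // Small bonus when several signals fire together, capped at 100 overall.
  const bonus = firing > 1 ? (firing - 1) * COINCIDENCE_BONUS : 0;
  return Math.min(100, Math.round(base + bonus));
}

// 0 = idle, 1 = alert, 2 = heightened monitoring, 3 = hold for review, 4 = auto-remove.
function stageFor(score: number): number {
  return STAGE_THRESHOLDS.filter((t) => score >= t).length;
}
```

Keeping this function free of Reddit and Redis calls is what lets it be exercised with plain unit tests.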
How we built it
The app is Devvit Web (Hono server, React dashboard, Redis). Pure scoring and state-machine logic sit apart from Reddit/Redis I/O so most behavior is covered by fast unit tests (188 today). Triggers ingest posts and comments; a one-minute cron rescores incidents; signature matching runs on new items when an incident is already active. Config, enforcement flags, and thresholds live in Redis and are editable from the dashboard or a mod settings form.
Challenges we ran into
App registration. We could not register an app at https://reddit.com/prefs/apps, which the demo depends on: we are in mainland China, and registration failed even over VPNs. Had the app been created, we would have simulated malicious users posting raid content for raid_shield to pick up.
Devvit limits. There’s no API to turn on sub slow-mode from the app, so Stage 2 sets a heightened-monitoring flag and nudges mods instead of flipping slow-mode itself.
Payload quirks. Account age isn’t in trigger payloads; we fetch it once per author and cache it. createdAt can be seconds or milliseconds, so we normalize that at ingest.
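The seconds-versus-milliseconds normalization can be sketched as a magnitude check. The function name `toEpochMs` and the cutoff value are assumptions for this example; the write-up does not say how the app tells the two units apart.

```typescript
// Hypothetical helper: the name and the cutoff are assumptions, not the app's code.
// 1e11 read as seconds is around the year 5138; read as milliseconds it is 1973.
// Any plausible Reddit timestamp therefore falls clearly on one side of it.
const SECONDS_CUTOFF = 1e11;

function toEpochMs(createdAt: number): number {
  if (!Number.isFinite(createdAt) || createdAt < 0) {
    throw new RangeError(`invalid createdAt: ${createdAt}`);
  }
  // Values below the cutoff are treated as seconds and scaled up.
  return createdAt < SECONDS_CUTOFF ? Math.round(createdAt * 1000) : Math.round(createdAt);
}
```

Normalizing once at ingest means every downstream detector can assume milliseconds.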
Gaps we had to close. onModAction started as a stub, so mod approvals didn’t stop repeat holds. The heightened flag was written but never read. Live matching ignored the configured simhash threshold. Fixing those meant an allowlist on approve, wiring the flag into matching and Stage 2 behavior, and reading config in the matcher.
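To show what reading the configured threshold in the matcher might look like, here is a small simhash sketch. It uses a 32-bit fingerprint and an FNV-1a token hash over UTF-16 code units purely to keep the example short; those choices, and the names `simhash32`, `hamming32`, and `matchesSignature`, are assumptions rather than the app's actual implementation.

```typescript
// Hypothetical sketch: 32-bit simhash with FNV-1a token hashing (assumed choices).
function fnv1a32(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function simhash32(text: string): number {
  const counts = new Int32Array(32);
  const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
  for (const token of tokens) {
    const h = fnv1a32(token);
    // Each token votes +1 or -1 on every bit position.
    for (let bit = 0; bit < 32; bit++) {
      counts[bit] += ((h >>> bit) & 1) === 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (counts[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

// Number of differing bits between two fingerprints.
function hamming32(a: number, b: number): number {
  let x = a ^ b;
  let n = 0;
  while (x !== 0) {
    x &= x - 1; // clear the lowest set bit
    n++;
  }
  return n;
}

// maxDistance comes from config rather than a hard-coded constant.
function matchesSignature(text: string, signature: number, maxDistance: number): boolean {
  return hamming32(simhash32(text), signature) <= maxDistance;
}
```

Passing `maxDistance` in as a parameter is the point of the fix described above: the matcher has no threshold of its own to drift out of sync with the dashboard setting.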
Accomplishments that we're proud of
A full path from detection → incident → dry-run or enforce → mod dashboard, with kill switch and per-stage toggles. The safety model is testable: kill switch × enforce × stage, plus a cap on auto-actions per incident. Dry-run on install is real, not marketing: mods see audit notes before they opt in. The repo is structured so detectors and transitions stay pure and the hackathon demo script (docs/RECORDING.md) matches how the app actually behaves.
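The kill switch × enforce × stage model can be pictured as one pure decision function. The sketch below is an assumption about its shape: the type names, the `decide` function, the wording of the audit notes, and the choice to apply the cap only to hold and remove are all hypothetical.

```typescript
// Hypothetical sketch of the safety gate; names and cap semantics are assumed.
type Stage = 1 | 2 | 3 | 4;
type Action = "alert" | "monitor" | "hold" | "remove";

interface SafetyState {
  killSwitch: boolean;
  enforce: Record<Stage, boolean>; // per-stage enforce toggle
  autoActionsTaken: number;        // actions already taken in this incident
  maxAutoActions: number;          // cap on auto-actions per incident
}

interface Decision {
  action: Action;
  enforced: boolean;
  auditNote: string;
}

const ACTIONS: Record<Stage, Action> = { 1: "alert", 2: "monitor", 3: "hold", 4: "remove" };

function decide(stage: Stage, state: SafetyState): Decision {
  const action = ACTIONS[stage];
  const dryRun = (reason: string): Decision => ({
    action,
    enforced: false,
    auditNote: `would have done: ${action} (${reason})`,
  });

  if (state.killSwitch) return dryRun("kill switch on");
  if (!state.enforce[stage]) return dryRun("enforcement off for this stage");
  // Assumed: only content-affecting stages count against the cap.
  if (stage >= 3 && state.autoActionsTaken >= state.maxAutoActions) {
    return dryRun("auto-action cap reached");
  }
  return { action, enforced: true, auditNote: `did: ${action}` };
}
```

Because the function takes all of its inputs as arguments, every combination of kill switch, toggle, stage, and cap can be checked in a table-driven test.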
What we learned
Local-only signals can still catch coordinated raids if you combine velocity, account age, text similarity, and link repetition, especially with a small coincidence bonus when several fire at once. Hysteresis (sustained ticks to promote/demote) matters; without it, noisy spikes would flip stages constantly. Splitting pure logic from I/O early made the project easier to test and fix under time pressure. Playing it out on a real Devvit playtest sub (r/raid_shield_dev) surfaces things unit tests never will (modmail, webview, cron timing).
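The hysteresis idea can be sketched as a pure transition function that moves one stage at a time, and only after the score has stayed past a threshold for several consecutive ticks. The thresholds, the tick count, and the names `LadderState` and `tick` are assumptions made for this example.

```typescript
// Hypothetical sketch of sustained-tick hysteresis; all constants are assumed.
interface LadderState {
  stage: number;     // 0 = idle, 1..4 = ladder stages
  upTicks: number;   // consecutive ticks at or above the next stage's threshold
  downTicks: number; // consecutive ticks below the current stage's threshold
}

const ENTRY_SCORES = [25, 50, 70, 85]; // assumed score needed to enter stages 1..4
const SUSTAIN_TICKS = 3;               // assumed ticks required before any change
const MAX_STAGE = ENTRY_SCORES.length;

function tick(state: LadderState, score: number): LadderState {
  const wantsUp = state.stage < MAX_STAGE && score >= ENTRY_SCORES[state.stage];
  const wantsDown = state.stage > 0 && score < ENTRY_SCORES[state.stage - 1];

  // A single tick that breaks the streak resets the counter to zero.
  const upTicks = wantsUp ? state.upTicks + 1 : 0;
  const downTicks = wantsDown ? state.downTicks + 1 : 0;

  if (upTicks >= SUSTAIN_TICKS) return { stage: state.stage + 1, upTicks: 0, downTicks: 0 };
  if (downTicks >= SUSTAIN_TICKS) return { stage: state.stage - 1, upTicks: 0, downTicks: 0 };
  return { stage: state.stage, upTicks, downTicks };
}
```

Fed one score per cron tick, an alternating spike pattern such as 90, 10, 90, 10 never accumulates a streak, so the stage stays put, while three consecutive ticks at 30 promote from idle to stage 1.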
What's next for Raid Shield
Hook up real slow-mode when Devvit exposes it. Tune thresholds per sub from dashboard history. Optional: an LLM pass over HTTP for ambiguous clusters (harassment, hate speech), kept out of v1 so scoring stays explainable. Longer term: a clearer appeal flow and richer mod feedback (e.g. “approve cluster” learning from onModAction) without crossing the no-cross-sub policy line.