Raid Shield

Inspiration

In March 2026, Reddit turned off auto-ban and “guilt by association” in tools like SaferBot and Hive-Protect. A lot of big subs lost a major brigade defense overnight. I wanted something that still helps mods under pressure, but only from what happens on their own subreddit — no cross-sub history, no third-party blocklists. That’s Raid Shield: local signals, a clear ladder of responses, and dry-run by default so mods can watch it work before turning anything on.

What it does

Raid Shield watches a rolling five-minute window of posts and comments and scores four signals: posting speed vs a seven-day baseline, share of young accounts, near-duplicate text (simhash clusters), and the same link from several authors. Those roll into a 0–100 threat score and a four-stage ladder — alert, heightened monitoring, hold matching items for review, auto-remove — each with its own enforce toggle. Mods get a dashboard with live score, signal bars, incident cards, and a kill switch. New matching content can be held or removed only when enforcement is on; otherwise everything is logged as “would have done” in the audit trail.
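As a rough illustration of how four signals could roll into a 0–100 score and a stage, here is a minimal sketch. The weights, thresholds, and every name in it (`Signals`, `WEIGHTS`, `STAGE_THRESHOLDS`, `threatScore`, `stageFor`) are assumptions made for the example, not the values or identifiers Raid Shield ships with.

```typescript
// Each signal is assumed to be normalized to 0..1 before weighting.
interface Signals {
  velocity: number;      // posting speed relative to the seven-day baseline
  youngAccounts: number; // share of items from young accounts
  nearDuplicate: number; // share of items in near-duplicate clusters
  sharedLinks: number;   // share of items repeating a link across authors
}

// Hypothetical weights; they sum to 1 so the weighted sum stays in 0..1.
const WEIGHTS: Signals = {
  velocity: 0.3,
  youngAccounts: 0.2,
  nearDuplicate: 0.3,
  sharedLinks: 0.2,
};

// Hypothetical score thresholds for stages 1 to 4.
const STAGE_THRESHOLDS: number[] = [25, 50, 70, 85];

const STAGE_NAMES: string[] = [
  "none",
  "alert",
  "heightened monitoring",
  "hold for review",
  "auto-remove",
];

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted sum of the clamped signals, scaled to an integer 0..100.
function threatScore(s: Signals): number {
  const weighted =
    WEIGHTS.velocity * clamp01(s.velocity) +
    WEIGHTS.youngAccounts * clamp01(s.youngAccounts) +
    WEIGHTS.nearDuplicate * clamp01(s.nearDuplicate) +
    WEIGHTS.sharedLinks * clamp01(s.sharedLinks);
  return Math.round(100 * weighted);
}

// The stage is the number of thresholds the score has reached (0 = no incident).
function stageFor(score: number): number {
  return STAGE_THRESHOLDS.filter((t) => score >= t).length;
}
```

With these example thresholds, a score of 60 maps to stage 2, so `STAGE_NAMES[stageFor(60)]` is "heightened monitoring".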

How we built it

The app is built on Devvit Web (Hono server, React dashboard, Redis). Pure scoring and state-machine logic sit apart from Reddit/Redis I/O, so most behavior is covered by fast unit tests (188 today). Triggers ingest posts and comments; a one-minute cron rescores incidents; signature matching runs on new items when an incident is already active. Config, enforcement flags, and thresholds live in Redis and are editable from the dashboard or a mod settings form.

Challenges we ran into

App registration. We could not register an app at https://reddit.com/prefs/apps, which the demo depends on: we are based in mainland China, and registration failed even through VPNs. Had the app been created, we could have simulated malicious users posting raid content and shown Raid Shield picking it up.

Devvit limits. There’s no API to turn on sub slow-mode from the app, so Stage 2 sets a heightened-monitoring flag and nudges mods instead of flipping slow-mode itself.

Payload quirks. Account age isn’t in trigger payloads; we fetch it once per author and cache it. createdAt can be seconds or milliseconds, so we normalize that at ingest.
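A minimal sketch of both workarounds, under stated assumptions: the cutoff constant, the function names, and the injected `fetchCreatedAt` callback are hypothetical, and an in-memory `Map` stands in for the Redis cache.

```typescript
// Values below this cutoff are treated as seconds. As milliseconds, 1e11 is a
// date in 1973; as seconds, it is thousands of years in the future, so the
// two ranges do not overlap for real Reddit timestamps.
const SECONDS_CUTOFF = 1e11;

// Normalize a createdAt value that may be in seconds or milliseconds.
function toMillis(createdAt: number): number {
  return createdAt < SECONDS_CUTOFF ? createdAt * 1000 : createdAt;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical callback that returns an author's account creation time.
type FetchCreatedAt = (author: string) => Promise<number>;

// Returns a lookup that fetches an author's creation time on the first
// request and reuses the cached value afterwards.
function makeAccountAgeLookup(fetchCreatedAt: FetchCreatedAt) {
  const cache = new Map<string, number>(); // stand-in for Redis
  return async (author: string, nowMs: number): Promise<number> => {
    let createdMs = cache.get(author);
    if (createdMs === undefined) {
      createdMs = toMillis(await fetchCreatedAt(author));
      cache.set(author, createdMs);
    }
    return (nowMs - createdMs) / MS_PER_DAY; // account age in days
  };
}
```

Two lookups for the same author that overlap in time could still both miss the cache in this sketch; real code might store the pending promise instead of the resolved value.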

Gaps we had to close. onModAction started as a stub, so mod approvals didn’t stop repeat holds. The heightened flag was written but never read. Live matching ignored the configured simhash threshold. Fixing those meant an allowlist on approve, wiring the flag into matching and Stage 2 behavior, and reading config in the matcher.
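For readers unfamiliar with simhash matching against a configurable threshold, here is a small self-contained sketch. It uses a 32-bit fingerprint with FNV-1a token hashes for brevity; the fingerprint width, tokenizer, and function names are assumptions for illustration and may differ from what Raid Shield uses.

```typescript
// 32-bit FNV-1a hash over the UTF-16 code units of a token.
function fnv1a(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Simhash: every token votes +1 or -1 on each bit position, and the
// fingerprint keeps the bits whose vote total is positive.
function simhash32(text: string): number {
  const votes = new Array<number>(32).fill(0);
  // Simplistic tokenizer: lowercased runs of ASCII word characters.
  const tokens = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const h = fnv1a(token);
    for (let bit = 0; bit < 32; bit++) {
      votes[bit] += ((h >>> bit) & 1) === 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (votes[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

// Number of bit positions in which two fingerprints differ.
function hammingDistance(a: number, b: number): number {
  let x = a ^ b;
  let count = 0;
  while (x !== 0) {
    x &= x - 1; // clears the lowest set bit
    count++;
  }
  return count;
}

// maxDistance plays the role of the configured simhash threshold.
function isNearDuplicate(a: string, b: string, maxDistance: number): boolean {
  return hammingDistance(simhash32(a), simhash32(b)) <= maxDistance;
}
```

Passing `maxDistance` in as a parameter, rather than hard-coding it, is the point of the third fix: the matcher reads the threshold from config on each call.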

Accomplishments that we're proud of

A full path from detection → incident → dry-run or enforce → mod dashboard, with kill switch and per-stage toggles. The safety model is testable: kill switch × enforce × stage, plus a cap on auto-actions per incident. Dry-run on install is real, not marketing: mods see audit notes before they opt in. The repo is structured so detectors and transitions stay pure and the hackathon demo script (docs/RECORDING.md) matches how the app actually behaves.
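The safety model can be pictured as one pure function that decides between enforcing and logging. This is a sketch under assumptions: the field names, the `Decision` type, and the order of the checks are illustrative, not the repo's actual code.

```typescript
type Decision = "enforce" | "dry-run";

interface SafetyState {
  killSwitch: boolean;                     // true halts all enforcement
  enforceByStage: Record<number, boolean>; // per-stage enforce toggles
  actionsTaken: number;                    // auto-actions already taken in this incident
  maxActionsPerIncident: number;           // cap on auto-actions per incident
}

// An action is enforced only when every gate allows it; otherwise it is
// recorded as "would have done" in the audit trail.
function decide(state: SafetyState, stage: number): Decision {
  if (state.killSwitch) return "dry-run";
  if (state.enforceByStage[stage] !== true) return "dry-run";
  if (state.actionsTaken >= state.maxActionsPerIncident) return "dry-run";
  return "enforce";
}

// Example: a fresh install with every toggle off stays in dry-run.
const freshInstall: SafetyState = {
  killSwitch: false,
  enforceByStage: { 1: false, 2: false, 3: false, 4: false },
  actionsTaken: 0,
  maxActionsPerIncident: 25, // hypothetical cap
};
```

Because `decide` takes plain data and returns a value, each combination of kill switch, toggle, stage, and cap can be checked in a unit test without touching Reddit or Redis.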

What we learned

Local-only signals can still catch coordinated raids if you combine velocity, account age, text similarity, and link repetition, especially with a small coincidence bonus when several fire at once. Hysteresis (sustained ticks to promote/demote) matters; without it, noisy spikes would flip stages constantly. Splitting pure logic from I/O early made the project easier to test and fix under time pressure. Playing it out on a real Devvit playtest sub (r/raid_shield_dev) surfaces things unit tests never will (modmail, webview, cron timing).
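To make the hysteresis point concrete, here is a minimal sketch of a pure transition function that moves one stage at a time and only after a sustained run of ticks. The tick counts and the names `LadderState`, `PROMOTE_TICKS`, `DEMOTE_TICKS`, and `tick` are assumptions for the example.

```typescript
interface LadderState {
  stage: number;     // current stage, 0..4
  upTicks: number;   // consecutive ticks whose score called for a higher stage
  downTicks: number; // consecutive ticks whose score called for a lower stage
}

// Hypothetical values: promote after 3 sustained ticks, demote after 5.
const PROMOTE_TICKS = 3;
const DEMOTE_TICKS = 5;

// Pure transition: takes the current state and the stage the latest score
// maps to, and returns the next state without any I/O.
function tick(state: LadderState, targetStage: number): LadderState {
  if (targetStage > state.stage) {
    const upTicks = state.upTicks + 1;
    if (upTicks >= PROMOTE_TICKS) {
      return { stage: state.stage + 1, upTicks: 0, downTicks: 0 };
    }
    return { stage: state.stage, upTicks, downTicks: 0 };
  }
  if (targetStage < state.stage) {
    const downTicks = state.downTicks + 1;
    if (downTicks >= DEMOTE_TICKS) {
      return { stage: state.stage - 1, upTicks: 0, downTicks: 0 };
    }
    return { stage: state.stage, upTicks: 0, downTicks };
  }
  // The score agrees with the current stage, so both streaks reset.
  return { stage: state.stage, upTicks: 0, downTicks: 0 };
}
```

With a one-minute cron, these example values mean a single noisy minute never changes the stage, while three consecutive high-scoring minutes promote by one step.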

What's next for Raid Shield

Hook up real slow-mode when Devvit exposes it. Tune thresholds per sub from dashboard history. Optional: LLM pass for ambiguous clusters (harassment/hate speech) via HTTP, kept off v1 so scoring stays explainable. Longer term: clearer appeal flow and richer mod-feedback (e.g. “approve cluster” learning from onModAction) without crossing the no-cross-sub policy line.

Built With

typescript

Updates

Mark Kuang started this project — May 27, 2026 08:58 PM EDT

Chen Yang posted an update — May 24, 2026 05:20 AM EDT

Project Goals

- Detect and intercept coordinated "brigade" behavior targeting a specific subreddit in a timely manner to reduce community harassment, spam, and personal attacks while avoiding collateral damage to normal active users.
- Use only local subreddit signals (posts, comments, votes, user history and temporal patterns), without relying on cross-subreddit data or third-party user profiles, to protect privacy and simplify deployment.
- Provide a four-stage response ladder (all stages default to dry-run/simulation) that ranges from silent monitoring to progressively restricting interaction up to automatic temporary bans, allowing moderators to enable automated responses as needed.
- Fill the gap in SaferBot’s auto-ban policy as a configurable supplement or alternative, with transparent logging and an audit trail.

Implementation Overview

Signal Collection (local only)

- Stream new post and comment metadata in real time (author, timestamp, text, context, parent comment/post, subreddit tags).
- Collect time series for votes and report events.
- Aggregate each user’s activity trajectory within the sub over the past N days/hours (post/comment frequency, deletion/being-deleted records, report counts, interaction patterns with target posts).

Feature Engineering (per-sample and aggregated)

- Text features: duplicate/near-duplicate text, similar linked domains, rapid repetition of identical/similar comment templates, mention/@ patterns.
- Temporal/rhythm features: many accounts active simultaneously within a short window, sudden comment spikes, abnormal growth rates within a time window.
- User similarity: multiple accounts exhibiting similar writing style (n-grams, embedding distance), similar account creation/activity time distributions.
- Graph/network features: abnormal parallel branches in comment/reply trees, centralized voting behavior (large bursts of up/down votes).
- Reputation metrics (subreddit-local only): historical bans/deletion ratio, post retention rate, history of moderator flags.

Detection Model and Rules

- Hybrid architecture: rules engine + lightweight ML model (e.g., small gradient-boosted trees or logistic regression) for scoring; prioritize interpretability, using explainable features and thresholds.
- Staged confidence scoring: compute an associated risk score (0–100) for each event (post/comment/user group) and map it to the four-stage response ladder.
- Continuous learning: adjust thresholds using aggregated anonymous statistics only after moderator review and explicit labeling (no importing of external user identifiers).

Four-Stage Response Ladder (dry-run by default)

1. Observe (logging and alerts, no user-visible actions).
2. Visibility restriction (e.g., reduce post ranking or add a collapsed warning to suspicious comments; simulated only: no actual change, record expected impact).
3. Interaction restriction (limit commenting/voting or temporarily block new comments; dry-run records what API calls would have been made).
4. Automatic temporary bans (short-term bans/mutes) with an auditable reason and evidence package.

Moderators can configure enable switches, thresholds, and action windows per stage; by default all stages simulate and record what would have happened, plus a differentiated impact assessment.

Auditability and Transparency

- Store each trigger record with the triggering features, related user list, confidence score, recommended action, and simulated/real execution result.
- Provide a moderator review panel: filter events, bulk confirm/reverse actions, and feed back results for model/threshold tuning.
- Export evidence packages for human review (time series, raw text, similarity matrices).

Testing and Evaluation

- Build dry-run test sets: historical event replay, synthetic brigade scenarios, and diverse benign activity cases.
- Metrics: detection recall/precision, false-positive rate (collateral damage to normal users), average response time, moderator intervention volume, simulated vs actual impact comparison.
- Continuous A/B testing: enable automated responses on a small scale to evaluate real-world effects and adjust thresholds.

Deployment and Privacy Considerations

- Run only within the sub’s servers or trusted hosting environments; databases store only anonymized or local identifiers (do not export cross-subreddit IDs).
- Log retention policies and moderator access controls ensure human-review chains remain compliant.
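The precision, recall, and false-positive-rate metrics named under Testing and Evaluation reduce to a few ratios over moderator-labeled outcomes of a dry-run replay. The sketch below assumes each replayed item has been labeled as raid or benign and compared with what the detector flagged; the `Outcomes` shape and function names are hypothetical.

```typescript
// Counts from comparing detector flags with moderator labels.
interface Outcomes {
  truePositives: number;  // raid items that were flagged
  falsePositives: number; // benign items that were flagged (collateral damage)
  falseNegatives: number; // raid items that were missed
  trueNegatives: number;  // benign items left alone
}

interface Metrics {
  precision: number;
  recall: number;
  falsePositiveRate: number;
}

// Returns 0 for an empty denominator instead of NaN.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function evaluate(o: Outcomes): Metrics {
  return {
    precision: ratio(o.truePositives, o.truePositives + o.falsePositives),
    recall: ratio(o.truePositives, o.truePositives + o.falseNegatives),
    falsePositiveRate: ratio(o.falsePositives, o.falsePositives + o.trueNegatives),
  };
}
```

Running `evaluate` on the simulated actions and again on the real actions of a small enforced trial gives the simulated vs actual comparison mentioned above.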
