Raid Shield

Chen Yang posted an update — May 24, 2026 05:20 AM EDT

Project Goals

Detect and intercept coordinated "brigade" behavior targeting a specific subreddit in a timely manner, reducing community harassment, spam, and personal attacks while avoiding collateral damage to normal active users.
Use only local subreddit signals (posts, comments, votes, user history, and temporal patterns), without relying on cross-subreddit data or third-party user profiles, to protect privacy and simplify deployment.
Provide a four-stage response ladder (all stages default to dry-run/simulation) that ranges from silent monitoring through progressively restricted interaction up to automatic temporary bans, allowing moderators to enable automated responses as needed.
Fill the gap in SaferBot’s auto-ban policy as a configurable supplement or alternative, with transparent logging and an audit trail.

Implementation Overview

Signal Collection (local only)

Stream metadata for new posts and comments in real time (author, timestamp, text, context, parent comment/post, subreddit tags).
Collect time series for votes and report events.
Aggregate each user’s activity trajectory within the sub over the past N days/hours (post/comment frequency, records of deleting content or having it removed, report counts, interaction patterns with target posts).

Feature Engineering (per-sample and aggregated)

Text features: duplicate/near-duplicate text, similar linked domains, rapid repetition of identical/similar comment templates, mention/@ patterns.
Temporal/rhythm features: many accounts active simultaneously within a short window, sudden comment spikes, abnormal growth rates within a time window.
User similarity: multiple accounts exhibiting similar writing style (n-grams, embedding distance), similar account creation/activity time distributions.
Graph/network features: abnormal parallel branches in comment/reply trees, centralized voting behavior (large bursts of up/down votes).
Reputation metrics (subreddit-local only): historical ban/deletion ratio, post retention rate, history of moderator flags.

Detection Model and Rules

Hybrid architecture: rules engine + lightweight ML model (e.g., small gradient-boosted trees or logistic regression) for scoring; prioritize interpretability, using explainable features and thresholds.
Staged confidence scoring: compute a risk score (0–100) for each event (post/comment/user group) and map it to the four-stage response ladder.
Continuous learning: adjust thresholds using aggregated anonymous statistics only after moderator review and explicit labeling (no importing of external user identifiers).

Four-Stage Response Ladder (dry-run by default)

1. Observe (logging and alerts, no user-visible actions).
2. Visibility restriction (e.g., reduce post ranking or add a collapsed warning to suspicious comments; simulated only, so no actual change is made and the expected impact is recorded).
3. Interaction restriction (limit commenting/voting or temporarily block new comments; dry-run records what API calls would have been made).
4. Automatic temporary bans (short-term bans/mutes) with an auditable reason and evidence package.

Moderators can configure enable switches, thresholds, and action windows per stage; by default all stages simulate and record what would have happened, plus a differentiated impact assessment.

Auditability and Transparency

Store each trigger record with the triggering features, related user list, confidence score, recommended action, and simulated/real execution result.
Provide a moderator review panel: filter events, bulk confirm/reverse actions, and feed back results for model/threshold tuning.
Export evidence packages for human review (time series, raw text, similarity matrices).

Testing and Evaluation

Build dry-run test sets: historical event replay, synthetic brigade scenarios, and diverse benign activity cases.
Metrics: detection recall/precision, false-positive rate (collateral damage to normal users), average response time, moderator intervention volume, simulated vs actual impact comparison.
Continuous A/B testing: enable automated responses on a small scale to evaluate real-world effects and adjust thresholds.

Deployment and Privacy Considerations

Run only within the sub’s servers or trusted hosting environments; databases store only anonymized or local identifiers (do not export cross-subreddit IDs).
Log retention policies and moderator access controls ensure human-review chains remain compliant.
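
As a rough illustration of the staged confidence scoring and the dry-run default described under Detection Model and Rules and the Four-Stage Response Ladder, the sketch below maps a 0–100 risk score to a stage and builds an audit record. The threshold values, the `Stage` and `StageConfig` names, and the record fields are hypothetical; the update does not specify them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    NONE = 0
    OBSERVE = 1
    VISIBILITY_RESTRICTION = 2
    INTERACTION_RESTRICTION = 3
    TEMP_BAN = 4


@dataclass
class StageConfig:
    threshold: float       # minimum risk score (0-100) that triggers the stage
    enabled: bool = False  # False means dry-run: record only, take no action


# Hypothetical defaults; real thresholds would be set by moderators.
DEFAULT_LADDER = {
    Stage.OBSERVE: StageConfig(threshold=40),
    Stage.VISIBILITY_RESTRICTION: StageConfig(threshold=60),
    Stage.INTERACTION_RESTRICTION: StageConfig(threshold=80),
    Stage.TEMP_BAN: StageConfig(threshold=95),
}


def select_stage(score, ladder=DEFAULT_LADDER):
    """Return the highest stage whose threshold the score meets."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be in [0, 100]")
    chosen = Stage.NONE
    for stage in sorted(ladder):
        if score >= ladder[stage].threshold:
            chosen = stage
    return chosen


def respond(event_id, score, ladder=DEFAULT_LADDER):
    """Build an audit record; the mode is 'live' only if the stage is enabled."""
    stage = select_stage(score, ladder)
    enabled = stage != Stage.NONE and ladder[stage].enabled
    return {
        "event_id": event_id,
        "score": score,
        "recommended_action": stage.name,
        "mode": "live" if enabled else "dry-run",
    }
```

With the defaults above, every stage stays in dry-run until a moderator flips its `enabled` switch, so `respond` only ever records what would have happened.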
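
One possible way to combine the near-duplicate text feature with the short-window temporal feature listed under Feature Engineering is sketched below. It uses Jaccard similarity over word shingles, which is an assumed technique rather than the one the project necessarily uses; the function names, the 600-second window, and the 0.6 similarity cutoff are all hypothetical.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles from lowercased, punctuation-stripped text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_authors(comments, window_seconds=600, min_similarity=0.6):
    """comments: list of (author, unix_timestamp, text) tuples.

    Return authors who posted a near-duplicate of a different author's
    comment within window_seconds of it.
    """
    items = sorted(comments, key=lambda c: c[1])
    sigs = [shingles(text) for _, _, text in items]
    flagged = set()
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[j][1] - items[i][1] > window_seconds:
                break  # sorted by time, so later comments are even further away
            if items[i][0] == items[j][0]:
                continue  # same author repeating themselves is a separate signal
            if jaccard(sigs[i], sigs[j]) >= min_similarity:
                flagged.update((items[i][0], items[j][0]))
    return flagged
```

The pairwise comparison is quadratic within a window, which is acceptable for a single subreddit's comment stream in a sketch like this; a production version would likely need an index such as MinHash.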
