Reddit Sentinel AI — Project Story


💡 Inspiration

It started with a simple question: who protects the protectors?

Reddit has over 50,000 active volunteer moderators. They are unpaid. They are unrecognized. And according to Reddit's own 2023 mod survey, nearly 47% report burnout symptoms. Every day they wake up to hundreds of posts, coordinated spam raids, hate waves, and rule violations — with nothing but a basic queue and their own judgment to work with.

The existing tools tell mods what to remove. Nothing tells them what to look at first. Nothing enforces rules at 3am when the team is asleep. Nothing measures whether moderation is actually working.

I've watched subreddits I care about slowly degrade — not because mods stopped caring, but because the signal-to-noise ratio became impossible. Good moderators were quitting. Communities were suffering.

That was the moment I decided to build Reddit Sentinel AI — not as a gimmick, but as a genuine attempt to make the hardest volunteer job on the internet survivable.


🔨 How I Built It

The Core Philosophy: No Black Boxes

From day one, I committed to one hard constraint: every moderation decision must be explainable. No external AI APIs, no probabilistic black boxes. Every flag Sentinel raises comes with a plain-language reason a mod can read, understand, and override.

This led to the multi-signal deterministic classifier — the heart of the project.

The Classifier

The risk engine scores every post and comment across three weighted signal dimensions:

$$\text{RiskScore} = 0.40 \cdot S_{\text{content}} + 0.35 \cdot S_{\text{author}} + 0.25 \cdot S_{\text{behavioral}}$$

Where:

  • $S_{\text{content}}$ — keyword patterns, spam signals, toxicity heuristics
  • $S_{\text{author}}$ — account age, karma, prior violations
  • $S_{\text{behavioral}}$ — posting frequency, cross-sub patterns, repeat submission fingerprints
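
As a rough illustration of the weighted sum, here is a minimal TypeScript sketch. It assumes each sub-score is already normalized to a 0-100 scale (the scale is not stated here, but the weights sum to 1 and the final score runs 0-100). The names `Signals`, `WEIGHTS`, `clamp`, and `riskScore` are hypothetical, not taken from the project's code.

```typescript
// Hypothetical shape: each sub-score is assumed to be on a 0-100 scale.
interface Signals {
  content: number;
  author: number;
  behavioral: number;
}

// Weights taken from the RiskScore formula; they sum to 1.
const WEIGHTS: Signals = { content: 0.4, author: 0.35, behavioral: 0.25 };

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Weighted sum of the clamped sub-scores, rounded to an integer in 0-100.
function riskScore(s: Signals): number {
  const raw =
    WEIGHTS.content * clamp(s.content, 0, 100) +
    WEIGHTS.author * clamp(s.author, 0, 100) +
    WEIGHTS.behavioral * clamp(s.behavioral, 0, 100);
  return clamp(Math.round(raw), 0, 100);
}
```

For example, sub-scores of 90 (content), 80 (author), and 60 (behavioral) give 36 + 28 + 15 = 79.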

The score maps to five tiers:

| Tier | Score | Action |
|---|---|---|
| 🔴 CRITICAL | 90–100 | Auto-remove + modmail alert |
| 🟠 HIGH | 70–89 | Priority queue for human review |
| 🟡 MEDIUM | 40–69 | Queue with reasoning attached |
| 🔵 LOW | 10–39 | Log only |
| 🟢 CLEAN | 0–9 | No action |

CRITICAL-tier removal fires in < 1 second — before any community member is exposed.
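
The tier lookup can be sketched as a scan over lower bounds. This is a minimal sketch, not the project's actual code: `Tier`, `TIER_FLOORS`, and `tierFor` are hypothetical names, and treating out-of-range negative scores as CLEAN is an assumption.

```typescript
type Tier = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "CLEAN";

// Lower bound of each tier, highest first, matching the five tiers listed.
const TIER_FLOORS: ReadonlyArray<readonly [number, Tier]> = [
  [90, "CRITICAL"],
  [70, "HIGH"],
  [40, "MEDIUM"],
  [10, "LOW"],
  [0, "CLEAN"],
];

// Returns the first tier whose lower bound the score reaches.
function tierFor(score: number): Tier {
  for (const [floor, tier] of TIER_FLOORS) {
    if (score >= floor) return tier;
  }
  return "CLEAN"; // Assumption: scores below 0 are treated as CLEAN.
}
```

Comparing against lower bounds only means a fractional score such as 89.5 still resolves to a tier (HIGH) instead of falling into the gap between 89 and 90.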

The Stack

Everything runs 100% inside Reddit via Devvit. The architecture uses:

  • Triggers (PostCreate, CommentCreate, ModAction) to intercept content the moment it lands
  • Redis for real-time queues, activity tracking, and deduplication
  • KV Store for persistent configuration — rules, watchlist, audit log
  • Scheduler (*/5 * * * *) for spike detection, health score updates, and modmail alerts
  • Custom Post for the Sentinel Dashboard — a full interactive command center built with Devvit Blocks + a Webview

No external API calls. No paid services. No data leaving Reddit's infrastructure.

The Dashboard

The dashboard was the most challenging UI surface. It needed to pack five distinct functional areas — live queue, analytics, rule builder, watchlist, and audit log — into a single persistent custom post that any mod could open and immediately understand.

I built it as a self-contained Webview (webroot/index.html) with a tab-based layout, color-coded risk cards, bulk action controls, and a no-code IF/THEN rule builder that lets mods configure automation without writing a single line of code.

The Rule Engine

Mods can build rules like:

IF riskScore > 70 AND accountAgeDays < 7
THEN remove + modmail "Crypto spam from new account"

The rule engine evaluates these against every classified item and executes actions atomically — remove, flair, modmail, watchlist — all without any human in the loop.
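
One way to represent and match such IF/THEN rules is sketched below. The shapes (`ClassifiedItem`, `Condition`, `Action`, `Rule`) and the `evaluate` function are hypothetical and only cover the two fields and two actions used in the example rule. The sketch decides which actions apply; actually executing them, and doing so atomically, is left out.

```typescript
// Hypothetical item shape; field names mirror the example rule.
interface ClassifiedItem {
  id: string;
  riskScore: number;
  accountAgeDays: number;
}

type Field = "riskScore" | "accountAgeDays";
type Operator = ">" | "<" | ">=" | "<=";

interface Condition {
  field: Field;
  op: Operator;
  value: number;
}

type Action = { kind: "remove" } | { kind: "modmail"; message: string };

interface Rule {
  name: string;
  conditions: Condition[]; // All conditions must hold (AND).
  actions: Action[];
}

function holds(item: ClassifiedItem, c: Condition): boolean {
  const actual = item[c.field];
  switch (c.op) {
    case ">":
      return actual > c.value;
    case "<":
      return actual < c.value;
    case ">=":
      return actual >= c.value;
    case "<=":
      return actual <= c.value;
  }
}

// Collects the actions of every rule whose conditions all match the item.
function evaluate(item: ClassifiedItem, rules: Rule[]): Action[] {
  const actions: Action[] = [];
  for (const rule of rules) {
    if (rule.conditions.every((c) => holds(item, c))) {
      actions.push(...rule.actions);
    }
  }
  return actions;
}

// The example rule, expressed as data.
const newAccountSpamRule: Rule = {
  name: "new-account-spam",
  conditions: [
    { field: "riskScore", op: ">", value: 70 },
    { field: "accountAgeDays", op: "<", value: 7 },
  ],
  actions: [
    { kind: "remove" },
    { kind: "modmail", message: "Crypto spam from new account" },
  ],
};
```

Keeping rules as plain data is what makes a no-code builder practical: the dashboard only has to produce and store objects of this shape.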


📚 What I Learned

1. Devvit's Redis is both powerful and delicate

Real-time queues with TTLs, atomic increments for analytics, and deduplication keys — Redis makes all of this possible inside Devvit. But you have to think carefully about key naming, expiry, and read-write patterns. I rewrote the storage layer twice before landing on typed wrappers (redisStore.ts, kvStore.ts) that made the rest of the codebase clean and safe.
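
To make the idea of a typed wrapper concrete, here is a minimal sketch. It is not the contents of redisStore.ts and it does not reproduce Devvit's Redis client API: `StringStore` is a hypothetical minimal interface, `MemoryStore` is an in-memory stand-in for local experiments, and `TypedStore` is an illustrative wrapper that fixes a key prefix, a value type, and a TTL in one place.

```typescript
// Hypothetical minimal async string store (a stand-in, not Devvit's API).
interface StringStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

// In-memory implementation with expiry, for local experiments and tests.
class MemoryStore implements StringStore {
  private data = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async get(key: string): Promise<string | undefined> {
    const entry = this.data.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.data.delete(key); // Expired entries are dropped on read.
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds?: number): Promise<void> {
    const expiresAt =
      ttlSeconds === undefined ? Infinity : this.now() + ttlSeconds * 1000;
    this.data.set(key, { value, expiresAt });
  }
}

// Typed wrapper: one key prefix, one value type, one TTL policy.
class TypedStore<T> {
  private store: StringStore;
  private prefix: string;
  private ttlSeconds: number | undefined;

  constructor(store: StringStore, prefix: string, ttlSeconds?: number) {
    this.store = store;
    this.prefix = prefix;
    this.ttlSeconds = ttlSeconds;
  }

  private key(id: string): string {
    return `${this.prefix}:${id}`;
  }

  async get(id: string): Promise<T | undefined> {
    const raw = await this.store.get(this.key(id));
    return raw === undefined ? undefined : (JSON.parse(raw) as T);
  }

  async set(id: string, value: T): Promise<void> {
    await this.store.set(this.key(id), JSON.stringify(value), this.ttlSeconds);
  }
}
```

With a wrapper like this, callers never build key strings or choose expiry values by hand, which is where naming and TTL mistakes tend to creep in.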

2. Explainability is a feature, not a footnote

Every time I was tempted to use a heavier model for "better" classification, I reminded myself: a mod who doesn't trust a decision won't act on it. The deterministic classifier with plain-language explanations — "Flagged because: new account (2 days), low karma (3), spam pattern: 'buy now'" — built more trust than any accuracy percentage would.
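
A plain-language explanation of that kind can be assembled from whichever signals fired. The sketch below is illustrative only: `AuthorSignals` and `explain` are hypothetical names, and the cutoffs (under 7 days for a new account, under 10 karma for low karma) are assumptions chosen for the example, not Sentinel's actual thresholds.

```typescript
// Hypothetical signal record for the post or comment author.
interface AuthorSignals {
  accountAgeDays: number;
  karma: number;
}

// Builds a human-readable reason string from the signals that fired.
function explain(author: AuthorSignals, matchedPatterns: string[]): string {
  const reasons: string[] = [];
  if (author.accountAgeDays < 7) {
    // Assumed cutoff for a "new" account.
    const unit = author.accountAgeDays === 1 ? "day" : "days";
    reasons.push(`new account (${author.accountAgeDays} ${unit})`);
  }
  if (author.karma < 10) {
    // Assumed cutoff for "low" karma.
    reasons.push(`low karma (${author.karma})`);
  }
  for (const pattern of matchedPatterns) {
    reasons.push(`spam pattern: '${pattern}'`);
  }
  return reasons.length > 0
    ? `Flagged because: ${reasons.join(", ")}`
    : "No signals triggered";
}
```

Calling `explain({ accountAgeDays: 2, karma: 3 }, ["buy now"])` reproduces the example message quoted above.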

3. Scheduling is underrated

The 5-minute scheduler that handles spike detection and health score updates turned out to be one of the most impactful features. Spam raids don't happen during business hours. Having Sentinel fire modmail alerts at 3am when a wave hits — with no mod online — is exactly the kind of invisible protection communities need.
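
A spike check of the kind a 5-minute job might run can be sketched as a comparison of the latest window against a recent baseline. This is one simple heuristic under stated assumptions, not Sentinel's actual detector: `isSpike`, `SpikeOptions`, the 3x multiplier, and the minimum count of 10 are all hypothetical.

```typescript
interface SpikeOptions {
  multiplier: number; // How many times the baseline counts as a spike.
  minCount: number; // Ignore small absolute numbers on quiet subreddits.
}

// Illustrative defaults only.
const DEFAULT_OPTIONS: SpikeOptions = { multiplier: 3, minCount: 10 };

// `counts` holds flagged-item counts per 5-minute window, oldest first.
// The last entry is the window being tested; the rest form the baseline.
function isSpike(
  counts: number[],
  options: SpikeOptions = DEFAULT_OPTIONS,
): boolean {
  if (counts.length < 2) return false; // Need at least one baseline window.
  const latest = counts[counts.length - 1];
  const history = counts.slice(0, -1);
  const baseline = history.reduce((sum, n) => sum + n, 0) / history.length;
  return latest >= options.minCount && latest > baseline * options.multiplier;
}
```

The `minCount` guard keeps a jump from 1 to 4 flagged items from waking anyone up, while a jump from a baseline of 2 or 3 to 30 would trigger the alert.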

4. Custom Posts are genuinely powerful

Devvit's custom post primitive is far more capable than it appears. Building a full interactive dashboard — with tabs, live data, bulk actions, and form inputs — inside a Reddit post was a genuinely new experience. The constraint of no external CDN calls forced creative solutions and kept the whole thing lean.


🧱 Challenges

The "< 1 second" guarantee

Getting CRITICAL-tier auto-removal to fire before a post was visible to the community required careful understanding of Devvit's trigger execution order and Redis write patterns. Early builds had race conditions where the removal would fire after the post had already been indexed. Solving this involved restructuring the classifier to operate synchronously on the trigger path, with async analytics written off to the side.

Building a UI inside Reddit's sandbox

Devvit's Webview environment has meaningful constraints — no external scripts, limited CSS features, no direct DOM access to the parent page. Building a production-grade dashboard that felt native to Reddit required building everything from scratch: tabs, modals, toast notifications, data tables, form validation — all in vanilla HTML/CSS/JS inside a single index.html.

False positive UX

A moderation tool that removes too much is almost as harmful as one that removes too little. I added a first-class false positive tracking flow — when a mod overrides a Sentinel decision, it's logged to the audit trail and feeds back into accuracy reporting. The community health score penalizes high false positive rates, keeping the system honest.
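
The feedback loop can be illustrated with a small calculation. The health score formula is not specified here, so the sketch below is an assumption throughout: `AuditCounts`, `falsePositiveRate`, and `penalizedHealthScore` are hypothetical names, and the linear penalty capped at 30 points is invented purely for illustration.

```typescript
// Hypothetical counters derived from the audit trail.
interface AuditCounts {
  automatedActions: number; // Actions Sentinel took on its own.
  modOverrides: number; // Of those, how many a mod reversed.
}

// Share of automated actions that a mod overrode (0 when there were none).
function falsePositiveRate(c: AuditCounts): number {
  if (c.automatedActions === 0) return 0;
  return c.modOverrides / c.automatedActions;
}

// Assumed penalty: subtract up to `maxPenalty` points, scaled by the rate.
function penalizedHealthScore(
  baseScore: number,
  c: AuditCounts,
  maxPenalty: number = 30,
): number {
  const penalty = falsePositiveRate(c) * maxPenalty;
  return Math.max(0, Math.round(baseScore - penalty));
}
```

Under these assumed numbers, 20 overrides out of 200 automated actions is a 10% false positive rate and lowers a base score of 80 to 77.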

Proving ROI

Mods needed to be able to show that Sentinel was working — to themselves, to their communities, and to Reddit. Building the analytics layer (estimated hours saved, auto-resolution rate, false positive rate, health score trajectory) took nearly as long as the classifier itself. But it's what transforms Sentinel from "another moderation script" into a tool mods will actually advocate for.


🏁 What's Next

Reddit Sentinel AI is a complete, deployable Devvit application. Future directions include:

  • Community health benchmarking — compare a subreddit's health score against similar communities
  • Adaptive signal weights — let the rule engine tune the three RiskScore weights $w_{\text{content}}$, $w_{\text{author}}$, $w_{\text{behavioral}}$ (currently fixed at 0.40, 0.35, and 0.25) per subreddit based on historical mod overrides
  • Cross-subreddit ban coordination — opt-in network for sharing watchlist entries between related communities
  • Mod workload balancing — assign queue items to specific mods based on availability and specialization

🙏 Final Thought

Reddit's communities are built and maintained by people who ask for nothing in return. The least we can do is give them tools worthy of the work they do.

Reddit Sentinel AI — Because mods deserve better tools.
