🛡️ ZENmode AI — Project Story

Inspiration

Every day, Reddit moderators fight an invisible war.

They wake up to hundreds of reports, spam floods, coordinated raids, toxic comment chains, and scam links — all while volunteering their time for free. There are 2.8 million subreddit moderators on Reddit, and most of them are burning out.

I've personally seen subreddit communities collapse — not because the topic became irrelevant, but because moderation couldn't keep up. A single raid can destroy years of community culture in under 10 minutes. A spam wave can bury genuine discussions under garbage. Toxic comment chains can drive good-faith users away permanently.

The existing tools were built for a different era. AutoModerator is powerful but requires you to think like a programmer. The report queue is a flat, unorganized list. There's no AI layer, no priority system, no predictive protection.

The question that drove this project: What if moderation could be as intelligent as the threats it faces?

That question became ZENmode AI — an autonomous, AI-powered moderation suite built natively on Devvit.


What We Built

ZENmode AI is a comprehensive Reddit moderation platform with 27 integrated tools, organized into 4 core pillars:

Pillar 1 — 🚨 Threat Detection

Real-time, automated defense layer that operates 24/7 without human input.

Module Function
Spam Detector Hash-matching + behavioral frequency analysis
Toxic Comment Filter BERT-based NLP hostility detection
Scam/Phishing Detector URL threat intelligence + OCR image scanning
Bot/Fake Account Detector Behavioral biometrics + account metadata analysis
Raid Attack Detector Traffic velocity monitoring + emergency lockdown
Cross-Subreddit Threat Detector Platform-wide bad actor tracking
Hate Speech Escalation Tracker Long-term radicalization pattern monitoring

Pillar 2 — 🧠 AI Intelligence Layer

Dynamic, context-aware AI that goes beyond rigid keyword filters.

Module Function
Context-Aware Moderation AI Semantic understanding of sarcasm, slang, intent
AI Moderator Assistant GPT-4 powered recommendations for ambiguous cases
AI Summary for Long Toxic Threads LLM-based conflict narrative summarization
AI Ban/Warn Suggestion System Multi-variable penalty recommendation engine
AI Debate Mediator Real-time de-escalation interventions
Sentiment-Based Moderation Emotional tone analysis for mental health communities

Pillar 3 — ⚙️ Automation & Workflow

Rule-based automation that handles the repetitive 80% so mods can focus on the nuanced 20%.

Module Function
Auto Rule Checker Structural post format enforcement
Auto Moderation Bot If/Then programmable rule engine
Auto Flair Assigner NLP-based content categorization
Auto Archive Manager Time-based thread lifecycle management
Auto FAQ Responder Instant answers to repetitive questions
Queue Manager Priority-sorted moderation work center
Comment Cleanup Tool Bulk action macro for catastrophic threads
Rule Violation Auto Tagging Color-coded violation labeling

Pillar 4 — 📊 Analytics & Governance

Data-driven insights and transparent community governance tools.

Module Function
Community Health Dashboard Bird's-eye community health scoring
Toxicity Heatmap When/where toxicity clusters occur
Moderator Workload Tracker Burnout prevention + fair task distribution
Engagement Trend Analyzer Growth metrics and content strategy insights
Discussion Bots Automated community engagement events
Governance Tools Voting, mod-log ledgers, permission management

How We Built It

Architecture Overview

ZENmode AI uses a 3-layer architecture:

┌─────────────────────────────────────────┐
│         Reddit Platform (Devvit)        │
│   Blocks UI + Triggers + KV Store       │
└──────────────────┬──────────────────────┘
                   │ HTTP API calls
┌──────────────────▼──────────────────────┐
│         FastAPI Backend (Python)        │
│   ML Models + Rule Engine + Redis       │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│         PostgreSQL + Redis              │
│   Persistent storage + Real-time cache  │
└─────────────────────────────────────────┘

Layer 1 — Devvit App (TypeScript)

The Reddit-native layer handles all platform integration:

  • Custom Post Type renders the ZENmode dashboard directly inside Reddit using Devvit Blocks UI
  • PostSubmit + CommentSubmit Triggers intercept every new piece of content in real-time
  • KV Store persists moderation statistics across sessions
  • Devvit Scheduler runs Auto Archive Manager and Discussion Bots on time-based triggers
  • Menu Actions allow moderators to launch ZENmode with a single click from the subreddit menu

Layer 2 — FastAPI Backend (Python)

The intelligence engine powers all AI and ML features:

ML Stack:

  • Hugging Face Transformers — BERT for toxic comment classification
  • scikit-learn — Spam detection, bot probability scoring
  • spaCy — Named entity recognition, URL analysis
  • VADER + TextBlob — Sentiment analysis
  • OpenAI GPT-4 API — AI Moderator Assistant, thread summarization, debate mediation

API Routes:

POST /api/moderation/check-post      → Spam + Rule check
POST /api/analysis/check-comment     → Toxicity + Scam check
GET  /api/moderation/queue           → Fetch prioritized queue
POST /api/moderation/approve         → Approve content
POST /api/moderation/remove          → Remove content
POST /api/suggestions/ban-warn       → AI penalty recommendation
GET  /api/analytics/health           → Community health data

Background Workers (Celery):

  • queue_worker.py — Processes moderation tasks asynchronously
  • raid_detector.py — Monitors traffic spikes in real-time via Redis streams

Layer 3 — Database

PostgreSQL stores all persistent data:

  • Moderation actions audit log
  • User warning history
  • Community health metrics
  • Governance votes and mod logs

Redis handles real-time operations:

  • Sub-millisecond queue priority sorting
  • Pub/Sub for real-time moderator notifications
  • Rate limiting for API calls
  • Session management

The Math Behind Priority Scoring

The Queue Manager uses a weighted priority score to rank items:

$$ P_{score} = w_1 \cdot R_{count} + w_2 \cdot S_{severity} + w_3 \cdot V_{velocity} + w_4 \cdot A_{age} $$

Where:

  • $R_{count}$ = number of user reports
  • $S_{severity}$ = violation severity score $(0-1)$
  • $V_{velocity}$ = rate of incoming reports per minute
  • $A_{age}$ = inverse of content age (newer = higher weight)
  • $w_1, w_2, w_3, w_4$ = tunable weight coefficients

Priority thresholds:

$$ \text{Priority} = \begin{cases} \text{HIGH} & \text{if } P_{score} \geq 0.75 \ \text{MEDIUM} & \text{if } 0.40 \leq P_{score} < 0.75 \ \text{LOW} & \text{if } P_{score} < 0.40 \end{cases} $$


The Math Behind Ban/Warn Suggestion

The AI Ban/Warn system computes a User Risk Score:

$$ R_{user} = \alpha \cdot H_{violations} + \beta \cdot \left(1 - \frac{T_{account}}{T_{max}}\right) + \gamma \cdot \frac{N_{negative}}{N_{total}} - \delta \cdot C_{positive} $$

Where:

  • $H_{violations}$ = historical violation count (normalized)
  • $T_{account}$ = account age in days
  • $T_{max}$ = maximum considered account age (730 days)
  • $N_{negative}$ = total negative interactions
  • $N_{total}$ = total interactions
  • $C_{positive}$ = positive contribution score
  • $\alpha, \beta, \gamma, \delta$ = weight parameters

Penalty mapping:

$$ \text{Action} = \begin{cases} \text{Permanent Ban} & \text{if } R_{user} \geq 0.85 \ \text{Temporary Ban (7d)} & \text{if } 0.65 \leq R_{user} < 0.85 \ \text{Mute (24h)} & \text{if } 0.40 \leq R_{user} < 0.65 \ \text{Warning} & \text{if } R_{user} < 0.40 \end{cases} $$


Challenges We Faced

1. Devvit's Blocks UI Constraints

Devvit Blocks is not React — it's a declarative UI system with strict layout rules. No CSS, no onClick props drilling, limited state management. We had to rethink our entire UI architecture, moving all state to main.tsx and passing it down cleanly.

Solution: Centralized state in the entry point, pure display components in blocks.

2. Real-time Updates Inside Reddit

Devvit apps don't have WebSocket support natively. Getting the queue to feel "live" required creative use of useAsync combined with KV Store polling.

Solution: Optimistic UI updates — update local state immediately on action, sync with backend asynchronously.

3. Running ML Models at Scale

BERT inference is expensive. Running a full transformer model on every single comment submitted to a large subreddit would create massive latency.

Solution: Two-tier filtering pipeline:

  1. Fast tier — Regex + keyword heuristics (sub-millisecond, catches 70% of cases)
  2. Slow tier — BERT inference only for borderline cases flagged by fast tier

This reduces GPU compute by approximately $\approx 80\%$ while maintaining accuracy.

$$ \text{Compute Saved} \approx 1 - \frac{N_{borderline}}{N_{total}} \approx 0.80 $$

4. Context-Aware Moderation — The Sarcasm Problem

Basic toxicity models flag benign sentences like "You absolutely killed that performance!" Training a model to understand context required fine-tuning on Reddit-specific conversational data.

Solution: Fine-tuned RoBERTa on a Reddit-scraped dataset with human-labeled context annotations. Added surrounding comment thread as input context window.

5. Coordinated Raid Detection — False Positives

Legitimate viral posts also cause traffic spikes. Distinguishing a genuine viral moment from a coordinated raid required more than just velocity monitoring.

Solution: Multi-signal detection combining:

  • Account age distribution of new posters
  • Subreddit membership duration
  • Cross-subreddit origin tracking
  • Semantic similarity of incoming content

$$ \text{Raid Score} = \frac{\sum_{i=1}^{n} \mathbb{1}[\text{age}i < 7\text{days}]}{n} \cdot V{velocity} \cdot S_{semantic_similarity} $$


What We Learned

  1. Moderation is deeply human — AI can handle 80% of cases, but the 20% edge cases require human judgment. The best tool augments humans, not replaces them.

  2. Devvit is genuinely powerful — Building natively inside Reddit means zero friction for moderators. No external login, no separate dashboard, no context switching.

  3. Speed matters more than perfection — A fast, 90%-accurate filter that responds in 50ms protects the community better than a perfect model that takes 2 seconds.

  4. Moderator burnout is a real crisis — Building the Workload Tracker revealed how unevenly distributed moderation labor is. Tools that protect moderators are as important as tools that protect communities.

  5. Privacy-first design is non-negotiable — All user data is anonymized before ML processing. No personally identifiable information is stored in the analytics pipeline.


What's Next

  • [ ] Mobile-optimized Blocks UI for on-the-go moderation
  • [ ] Multi-subreddit dashboard for large mod teams managing multiple communities
  • [ ] Federated threat intelligence sharing between opt-in subreddits
  • [ ] Custom ML model fine-tuning per subreddit (community-specific language patterns)
  • [ ] Integration with Reddit Developer Funds for sustained development

Built With

Devvit TypeScript Python FastAPI PostgreSQL Redis Hugging Face Transformers BERT RoBERTa scikit-learn spaCy OpenAI GPT-4 Celery Docker React Tailwind CSS


Built with ❤️ for the Reddit moderation community — the unsung heroes of the internet.

Built With

Share this project:

Updates