Inspiration

Every active subreddit mod team shares the same problem: the queue never stops. Reports pile up faster than humans can review them, enforcement is inconsistent across mod shifts, and there's no institutional memory — a new mod makes a different call than a veteran would on the same content. We've seen subreddits go dark, fragment, or lose their culture entirely because mod burnout set in.

The existing tooling — Automod, third-party bots, manual review — forces mods to choose between speed and accuracy. Automod is fast but dumb; manual review is accurate but slow and doesn't scale. We wanted to build the thing in between: a system that reasons the way an experienced moderator would, explains its thinking out loud, and only acts autonomously when it's genuinely certain.

What it does

ModQueue Triage AI intercepts every reported post and comment in a subreddit and runs it through an LLM-powered classifier tuned to that subreddit's specific rules.

For each reported item the app produces:

  • The rule it believes was violated, matched against the subreddit's actual rule list
  • A confidence score (0–100%) calibrated to three tiers: auto-remove (≥ 95%), flag for review (70–94%), or escalate to the human queue (< 70%)
  • A plain-English explanation the mod can read and agree or disagree with
  • The exact excerpt from the content that triggered the verdict

Mods interact through a pinnable Triage Dashboard, a custom post that shows the full AI reasoning for every pending item and lets them approve, remove, or escalate with one click. Every mod override is recorded. The dashboard shows real-time metrics: items processed, auto-removes, AI accuracy (agreement rate), and time saved.

Critically: the app ships in shadow mode by default. For the first week it classifies everything and logs every decision, but takes no real actions. Mods review the shadow data, tune the thresholds, and only enable enforcement when they're satisfied it's accurate on their specific community.
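The three confidence tiers and the shadow-mode default can be pictured as one small pure function. This is a minimal sketch, not the app's actual policy engine: the names `PolicyConfig`, `Decision`, and `decide` are hypothetical, and confidence is assumed to be expressed on a 0 to 1 scale (0.95 for 95%).

```typescript
// Hypothetical names throughout; a sketch of the tiering described above.
type Action = 'auto_remove' | 'flag_for_review' | 'escalate_to_human';

interface PolicyConfig {
  autoRemoveAt: number; // minimum confidence for auto-remove (0.95 = 95%)
  flagAt: number;       // minimum confidence for flag-for-review (0.70 = 70%)
  shadowMode: boolean;  // true: log the decision but take no real action
}

interface Decision {
  action: Action;
  enforced: boolean; // false while shadow mode is on
}

const DEFAULT_POLICY: PolicyConfig = {
  autoRemoveAt: 0.95,
  flagAt: 0.7,
  shadowMode: true, // shadow mode is the default
};

function decide(confidence: number, policy: PolicyConfig = DEFAULT_POLICY): Decision {
  let action: Action;
  if (confidence >= policy.autoRemoveAt) {
    action = 'auto_remove';
  } else if (confidence >= policy.flagAt) {
    action = 'flag_for_review';
  } else {
    // Anything below the flag threshold (including NaN) goes to humans.
    action = 'escalate_to_human';
  }
  return { action, enforced: !policy.shadowMode };
}
```

In this sketch the caller would only touch Reddit when `enforced` is true, so switching off shadow mode is a one-field config change rather than a code path of its own.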

How we built it

The architecture is split across three concerns that were built and validated independently before being wired together:

Group A: Event Intake & Rule Context

Devvit PostReport and CommentReport triggers capture incoming events. Because Devvit trigger handlers must return fast to avoid platform timeouts, they do nothing except enqueue a classify-content scheduler job carrying the content payload. The actual work happens out-of-band. The rule loader pulls the subreddit's rule list via reddit.getRules() and optionally reads a mod_triage_context wiki page where mods can add free-form guidance (common spam patterns, local terminology, known ban-evaders).
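The intake pattern (the trigger only enqueues, the job does the work) might look roughly like the sketch below. `ReportEvent`, `JobQueue`, `buildClassifyJob`, `onReport`, and the 10,000-character body cap are hypothetical stand-ins for illustration; they are not Devvit's real types or the app's actual handler. In the app, the queue role would be played by Devvit's scheduler.

```typescript
// Hypothetical stand-ins for the platform's event and scheduler types.
interface ReportEvent {
  kind: 'post' | 'comment';
  thingId: string;
  subreddit: string;
  body: string;
  reason?: string;
}

interface ClassifyJob {
  name: 'classify-content';
  data: {
    kind: 'post' | 'comment';
    thingId: string;
    subreddit: string;
    body: string;
    reason: string;
    reportedAt: number; // epoch milliseconds
  };
}

interface JobQueue {
  enqueue(job: ClassifyJob): Promise<void>;
}

const MAX_BODY_CHARS = 10000; // assumed cap to keep the job payload small

// Builds a plain, JSON-serialisable payload: no Dates, no class instances.
function buildClassifyJob(event: ReportEvent, now: number): ClassifyJob {
  return {
    name: 'classify-content',
    data: {
      kind: event.kind,
      thingId: event.thingId,
      subreddit: event.subreddit,
      body: event.body.slice(0, MAX_BODY_CHARS),
      reason: event.reason ?? '',
      reportedAt: now,
    },
  };
}

// The handler does one thing: enqueue the job and return.
async function onReport(event: ReportEvent, queue: JobQueue): Promise<void> {
  await queue.enqueue(buildClassifyJob(event, Date.now()));
}
```

Keeping the payload builder separate from the handler means the serialisation rules can be tested without a running platform.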

Group B: AI Decision-Making

The classifier sends a structured prompt to GPT-4o-mini via fetch (Devvit's HTTP capability) with response_format: { type: 'json_object' } enforced, so the model is constrained to return a parseable verdict every time. The response is sanitised before use (confidence clamped to [0,1], enum fields validated against an allowlist) so a hallucinated response can't crash the pipeline. The policy engine then maps the classification to a concrete action based on the mod-configured thresholds.
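The sanitising step could be sketched as follows. The field names (`verdict`, `ruleName`, `confidence`, `reasoning`, `excerpt`), the three verdict values, and the fallback behaviour are assumptions made for illustration; the app's real response schema is not shown in this write-up.

```typescript
// Assumed schema; the real field names and enum values may differ.
const VERDICTS = ['violation', 'no_violation', 'uncertain'] as const;
type Verdict = (typeof VERDICTS)[number];

interface Classification {
  verdict: Verdict;
  ruleName: string | null; // null when the model names a rule that is not on the list
  confidence: number;      // always within [0, 1] after sanitising
  reasoning: string;
  excerpt: string;
}

const FALLBACK: Classification = {
  verdict: 'uncertain',
  ruleName: null,
  confidence: 0, // zero confidence falls into the lowest tier
  reasoning: 'Model response could not be parsed.',
  excerpt: '',
};

function sanitise(raw: string, allowedRules: readonly string[]): Classification {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ...FALLBACK };
  }
  if (typeof parsed !== 'object' || parsed === null) return { ...FALLBACK };
  const obj = parsed as Record<string, unknown>;

  // Enum fields are checked against allowlists, never trusted as-is.
  const verdict: Verdict = VERDICTS.includes(obj.verdict as Verdict)
    ? (obj.verdict as Verdict)
    : 'uncertain';
  const ruleName =
    typeof obj.ruleName === 'string' && allowedRules.includes(obj.ruleName)
      ? obj.ruleName
      : null;

  // Non-numeric or non-finite confidence becomes 0, then everything is clamped.
  const rawConfidence =
    typeof obj.confidence === 'number' && Number.isFinite(obj.confidence)
      ? obj.confidence
      : 0;
  const confidence = Math.min(1, Math.max(0, rawConfidence));

  return {
    verdict,
    ruleName,
    confidence,
    reasoning: typeof obj.reasoning === 'string' ? obj.reasoning : '',
    excerpt: typeof obj.excerpt === 'string' ? obj.excerpt : '',
  };
}
```

Every failure path in this sketch degrades to a low-confidence "uncertain" result, which the threshold policy would route to the human queue rather than act on.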

Group C: Mod Interface & Observability

The Triage Dashboard is a Devvit custom post rendered with Blocks (the JSX-based UI kit). It uses useState for local page state, reads pending records from a Redis sorted set (ordered by reportedAt), and provides paginated item cards with one-click action buttons that call reddit.remove() / reddit.approve() and write the resolution back to Redis. Daily metrics are rolled up in Redis and optionally delivered as a modmail digest.
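Pagination over a set ordered by reportedAt comes down to turning a page number into an inclusive rank range. The sketch below uses an in-memory array as a stand-in for the Redis sorted set; `PendingItem`, `pageBounds`, and `getPage` are hypothetical names. With a real sorted set, the same `start` and `stop` values would be passed to a rank-based range query, where the stop index is inclusive.

```typescript
// In-memory stand-in for a sorted set scored by reportedAt.
interface PendingItem {
  id: string;
  reportedAt: number; // epoch milliseconds, used as the sort score
}

// Converts a zero-based page number into an inclusive rank range.
function pageBounds(page: number, pageSize: number): { start: number; stop: number } {
  const safePage = Math.max(0, Math.floor(page));
  const start = safePage * pageSize;
  return { start, stop: start + pageSize - 1 };
}

// Returns one page of items, oldest report first.
function getPage(items: readonly PendingItem[], page: number, pageSize: number): PendingItem[] {
  const { start, stop } = pageBounds(page, pageSize);
  return [...items]
    .sort((a, b) => a.reportedAt - b.reportedAt)
    .slice(start, stop + 1); // slice's end is exclusive, so add 1
}
```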

Challenges we ran into

Trigger timeout budget. Devvit gives triggers a very tight execution window. Our initial prototype did the full classify-and-act cycle inline, which failed immediately. Moving to the scheduler job pattern (trigger queues a job; job does everything else) solved this but required careful data serialisation through the job payload.

Prompt reliability across wildly different subreddits. A rule called "Be Civil" in r/politics means something completely different from "Be Civil" in r/relationship_advice. Early versions of the prompt were confidently wrong on edge cases. Adding the wiki context injection and lowering temperature to 0.1 significantly tightened this. Shadow mode data from a test subreddit was invaluable for catching systematic failures before they touched real content.
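A request body that injects the subreddit's rules and the optional wiki guidance, with temperature at 0.1 and JSON mode on, might be assembled like this. The prompt wording, the `Rule` shape, and `buildRequest` are illustrative assumptions, not the app's actual prompt. The `model`, `temperature`, `response_format`, and `messages` fields follow the OpenAI Chat Completions request format; note that JSON mode expects the prompt itself to ask for JSON.

```typescript
// Hypothetical rule shape and prompt wording, for illustration only.
interface Rule {
  shortName: string;
  description: string;
}

interface ChatRequest {
  model: string;
  temperature: number;
  response_format: { type: 'json_object' };
  messages: { role: 'system' | 'user'; content: string }[];
}

function buildRequest(
  subreddit: string,
  rules: readonly Rule[],
  wikiContext: string | null,
  reportedContent: string,
): ChatRequest {
  const ruleLines = rules.map((r, i) => `${i + 1}. ${r.shortName}: ${r.description}`);
  const system = [
    `You are assisting the moderators of r/${subreddit}.`,
    'Judge the reported content only against the rules listed below.',
    'Respond with a single JSON object with the keys',
    'verdict, ruleName, confidence, reasoning, excerpt.',
    '',
    'Rules:',
    ...ruleLines,
    // Free-form mod guidance is appended only when the wiki page exists.
    ...(wikiContext ? ['', 'Moderator guidance:', wikiContext] : []),
  ].join('\n');

  return {
    model: 'gpt-4o-mini',
    temperature: 0.1, // low temperature for more repeatable verdicts
    response_format: { type: 'json_object' },
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: reportedContent },
    ],
  };
}
```

Because the rules and guidance are injected per subreddit, the same "Be Civil" rule name arrives with a different description and different local context in each community.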

Mod trust as a design constraint. Every UI decision had to be filtered through "would a skeptical moderator trust this?" The temptation was to hide uncertainty and show clean verdicts. We did the opposite — the confidence bar, the excerpt, the full reasoning are all always visible. The "Why this verdict?" section is collapsed by default but always one tap away.

No built-in AI in current Devvit public API. The idea doc referenced "Devvit's AI integration" but the current public API doesn't expose a first-party LLM client. We routed through the http capability to OpenAI instead, with the API key stored as an encrypted app-level secret (never visible to subreddit mods, only settable by the app developer via devvit settings set).

Accomplishments that we're proud of

Shadow mode is the default. Most automation tools ship with actions on. We ship with actions off, which is the right call for trust. The first-week shadow data becomes the evidence mods use to decide whether to enable enforcement.

Every single decision is explained. No verdict is rendered without a rule name, a reasoning paragraph, and an excerpt. There is no "black box" path through the system.

The trigger handler is 12 lines. It does one thing (enqueues a job) and returns. This keeps the platform happy and means the AI pipeline can take as long as it needs without risking data loss.

Mod overrides close the feedback loop. When a mod disagrees with the AI, that override is counted in the accuracy metric displayed in the dashboard. This makes the AI's failure modes visible rather than hiding them.
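The agreement-rate metric that overrides feed into can be computed in a few lines. This is a sketch under assumptions: `Resolution` and `agreementRate` are hypothetical names, and an item counts as an agreement only when the mod's final action matches what the AI proposed.

```typescript
// Hypothetical record of one resolved item.
interface Resolution {
  aiAction: string;  // what the AI proposed
  modAction: string; // what the mod actually did
}

// Returns the share of resolved items where the mod agreed with the AI,
// or null when nothing has been resolved yet (avoids dividing by zero).
function agreementRate(resolved: readonly Resolution[]): number | null {
  if (resolved.length === 0) return null;
  const agreed = resolved.filter((r) => r.aiAction === r.modAction).length;
  return agreed / resolved.length;
}
```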

What we learned

Moderation AI is less of a classification problem and more of a reasoning communication problem. The hardest part wasn't getting the model to produce the right verdict — it was getting it to explain that verdict in language a moderator would recognise and trust. Confidence calibration matters far more than raw accuracy: a model that says "I'm 60% sure" and is right 60% of the time is far more useful for this use case than one that says "I'm 99% sure" and is right 85% of the time.
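One way to put a number on the calibration point is expected calibration error (ECE), a standard metric that the write-up does not itself name: group decisions into confidence buckets, then average the gap between stated confidence and observed accuracy, weighted by bucket size. The sketch below is illustrative only; `Outcome` and `expectedCalibrationError` are hypothetical names, and "correct" is assumed to mean the mods agreed with the verdict.

```typescript
interface Outcome {
  confidence: number; // model's stated confidence in [0, 1]
  correct: boolean;   // assumed: true when the mods agreed with the verdict
}

// Expected calibration error over equal-width confidence buckets.
// 0 means stated confidence matches observed accuracy in every bucket.
function expectedCalibrationError(outcomes: readonly Outcome[], bins = 10): number {
  const buckets = new Map<number, { n: number; confSum: number; hits: number }>();
  let total = 0;

  for (const o of outcomes) {
    if (!Number.isFinite(o.confidence)) continue; // skip unusable records
    const c = Math.min(1, Math.max(0, o.confidence));
    const index = Math.min(bins - 1, Math.floor(c * bins));
    const b = buckets.get(index) ?? { n: 0, confSum: 0, hits: 0 };
    b.n += 1;
    b.confSum += c;
    if (o.correct) b.hits += 1;
    buckets.set(index, b);
    total += 1;
  }

  if (total === 0) return 0;

  let ece = 0;
  buckets.forEach((b) => {
    const accuracy = b.hits / b.n;
    const meanConfidence = b.confSum / b.n;
    ece += (b.n / total) * Math.abs(accuracy - meanConfidence);
  });
  return ece;
}
```

Under this metric, a model that says 60% and is right 60% of the time scores close to 0, while one that says 99% and is right 85% of the time scores about 0.14, even though its raw accuracy is higher.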

We also learned that Devvit's scheduler is the right primitive for any AI-backed app. Triggers are for intake; jobs are for work. This separation is non-negotiable if you want production reliability.

What's next for ModQueue Triage AI

Per-rule confidence calibration. Right now there's one auto-remove threshold for all rules. The obvious next step is per-rule overrides — a community with zero tolerance for slurs can set that rule to auto-remove at 80%, while a more subjective "be respectful" rule stays at 99%.
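Per-rule overrides could be a simple lookup layered on top of the subreddit-wide defaults. This sketch describes planned behaviour rather than anything shipped; `Thresholds`, `RuleOverrides`, and `thresholdsFor` are hypothetical names, and the rule names in the usage note are examples.

```typescript
interface Thresholds {
  autoRemoveAt: number; // 0 to 1
  flagAt: number;       // 0 to 1
}

// Keyed by rule name; each rule may override one or both thresholds.
type RuleOverrides = Record<string, Partial<Thresholds>>;

function thresholdsFor(
  ruleName: string | null,
  defaults: Thresholds,
  overrides: RuleOverrides,
): Thresholds {
  // Own-property check so names like "toString" never hit the prototype.
  const override =
    ruleName !== null && Object.prototype.hasOwnProperty.call(overrides, ruleName)
      ? overrides[ruleName]
      : undefined;
  return {
    autoRemoveAt: override?.autoRemoveAt ?? defaults.autoRemoveAt,
    flagAt: override?.flagAt ?? defaults.flagAt,
  };
}
```

For example, with defaults of 0.95 and 0.70, an override of `{ 'No Slurs': { autoRemoveAt: 0.8 }, 'Be Respectful': { autoRemoveAt: 0.99 } }` lowers the auto-remove bar for one rule and raises it for the other, while every other rule keeps the defaults.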

Override learning. Right now mod overrides are counted but not fed back. The next step is storing override examples in the wiki context page so that the prompt carries the mod team's corrections forward as worked examples. That turns the dashboard into a feedback loop (in-context examples rather than true model fine-tuning) without any infrastructure beyond what Devvit already provides.

Modmail triage. The architecture already handles ModMail triggers (the ingestion layer is built for it). Adding modmail classification would let the same confidence/threshold system handle message-based reports and appeals.

Submission metrics from beta. We are actively deploying to test subreddits this week. Real numbers — time-to-decision, queue clearance rate, AI accuracy on live content — will be added to this submission as they come in. Expect an update before the deadline.

Built With

  • Platform: Reddit Devvit 0.12.22 (TypeScript + JSX Blocks)
  • AI: OpenAI API (classification + appeal re-evaluation)
  • Storage: Devvit Redis (sorted sets, TTL-based expiry)
  • Scheduling: Devvit Scheduler (classify, appeal, daily-digest jobs)
  • Webview: vanilla HTML/CSS/JS with Inter font, dark, premium