Inspiration
Reddit moderators are volunteers. They manage communities of thousands - sometimes millions - of members without pay, without tools built for scale, and without relief. The mod queue never sleeps. A subreddit with 50,000 members can receive hundreds of posts a day, and every one needs a human eye before it either harms the community or gets wrongly removed.
We built ModMind because the hardest part of moderation is not knowing the rules - it is applying them consistently at volume, across time zones, in multiple languages, without burning out the people who volunteered to keep communities healthy.
What it does
ModMind watches every post and comment the moment it is submitted. It reads the content, checks it against the subreddit's actual written rules, and posts a moderation suggestion before any human mod has to look at it.
Each suggestion includes:
- A confidence score (0-100)
- The specific rule violated
- A plain-English explanation of why
- A pre-written removal reply ready to send
- A language badge for non-English content
High-confidence violations are automatically held. Clean content passes through silently. Mods only see what actually needs their attention.
Beyond real-time evaluation, ModMind provides three layers of intelligence that compound over time:
Weekly Digest - Every Monday, ModMind posts a mod-only summary of the past week: how many posts were evaluated, which rules were violated most, language breakdown, AI accuracy rate, and an AI-written paragraph analyzing patterns the mod team should know about.
Rule Gap Detector - Every month, ModMind analyzes the history of mod overrides and identifies patterns suggesting missing or unclear rules. It drafts proposed rule additions and posts them for the mod team to review and vote on directly inside Reddit.
Multi-Language Support - ModMind supports 15 languages. Non-English posts are detected automatically, evaluated against translated versions of the subreddit's own rules, and replied to with a bilingual draft removal message in both English and the user's native language.
How we built it
ModMind is built entirely on Reddit's Devvit platform using TypeScript.
Evaluation pipeline: Every PostSubmit and CommentSubmit event fires a trigger in main.ts. The content is adapted into a typed ContentItem, deduplicated, and passed through language detection before reaching the AI evaluator. The evaluator builds a structured prompt from the subreddit's actual rules and calls OpenAI's GPT-4.1-mini via the Responses API. The response is parsed, normalized, and thresholded before any action is taken.
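As an illustration of the deduplication and prompt-building steps, here is a minimal sketch. The write-up names ContentItem but does not show its fields, so the field names, the SubredditRule shape, the prompt wording, and the in-memory Set (a stand-in for Redis) are all assumptions rather than ModMind's actual code.

```typescript
// Hypothetical shape: ContentItem is named in the write-up, its fields are not.
interface ContentItem {
  id: string;
  kind: "post" | "comment";
  body: string;
}

// Hypothetical shape for one subreddit rule.
interface SubredditRule {
  shortName: string;
  description: string;
}

// Build a prompt that lists the rules by number and asks for JSON only.
function buildPrompt(item: ContentItem, rules: SubredditRule[]): string {
  const ruleLines = rules
    .map((r, i) => `${i + 1}. ${r.shortName}: ${r.description}`)
    .join("\n");
  return [
    "You are assisting subreddit moderators.",
    "Rules:",
    ruleLines,
    `Evaluate this ${item.kind} and reply with JSON only:`,
    '{"confidence": <integer 0-100>, "ruleViolated": <rule name or null>, "explanation": <string>}',
    "If no rule is clearly violated, set ruleViolated to null.",
    "Content:",
    item.body,
  ].join("\n");
}

// In-memory stand-in for the deduplication step.
const seenIds = new Set<string>();

function isDuplicate(item: ContentItem): boolean {
  if (seenIds.has(item.id)) return true;
  seenIds.add(item.id);
  return false;
}
```

Listing the rules verbatim in the prompt is what lets a later step check that any rule the model cites actually exists.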
Storage: All state lives in Devvit's Redis. Per-user history is capped at 20 actions. Subreddit rules are cached with a 24-hour TTL. Translated rules are cached per language with a 7-day TTL. Weekly stats are maintained incrementally so the digest job never has to scan thousands of records.
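The capped history and TTL caching described above can be sketched as follows. This is an in-memory illustration only: the real app keeps this state in Devvit's Redis, and pushCapped, TtlCache, and the constant names are hypothetical.

```typescript
const MAX_HISTORY = 20; // per-user history cap from the description above

// Append an entry and keep only the newest `max` entries.
function pushCapped<T>(history: T[], entry: T, max: number = MAX_HISTORY): T[] {
  return [...history, entry].slice(-max);
}

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Minimal TTL cache; `now` is injectable so expiry can be tested deterministically.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  set(key: string, value: T, ttlMs: number, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it so the caller refetches
      return undefined;
    }
    return entry.value;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const RULES_TTL_MS = DAY_MS; // subreddit rules: 24 hours
const TRANSLATED_RULES_TTL_MS = 7 * DAY_MS; // translated rules: 7 days
```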
LLM guardrails: We added several layers of safety around the AI:
- Prompt injection detection strips manipulation attempts from post content before it reaches the prompt
- Hallucination detection verifies every rule cited by the AI actually exists in the subreddit's rule set
- Faithfulness scoring checks whether the AI's explanation is grounded in the post's actual content
- Toxicity detection rewrites condescending draft replies with neutral fallbacks
- Per-user rate limiting prevents API quota exhaustion from rapid submissions
- Every AI failure falls back to a safe approve result - the app never wrongly removes content due to an API error
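Two of these guardrails, hallucination detection and the safe approve fallback, can be sketched together. The Evaluation shape, the function names, and the choice to fall back to approve when a cited rule is unknown are assumptions for illustration; the write-up does not show how ModMind handles a hallucinated rule internally.

```typescript
// Hypothetical result shape.
interface Evaluation {
  confidence: number; // 0-100
  ruleViolated: string | null; // rule name cited by the model
  explanation: string;
  shouldFlag: boolean;
}

// Safe default used whenever the model output cannot be trusted.
const SAFE_APPROVE: Evaluation = {
  confidence: 0,
  ruleViolated: null,
  explanation: "Evaluation unavailable; approved by default.",
  shouldFlag: false,
};

function normalizeName(name: string): string {
  return name.trim().toLowerCase();
}

// Parse raw model text. Malformed JSON, a missing confidence, or a cited rule
// that is not in the subreddit's rule set all fall back to SAFE_APPROVE.
function guardEvaluation(raw: string, ruleNames: string[], flagThreshold: number): Evaluation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return SAFE_APPROVE;
  }
  if (typeof parsed !== "object" || parsed === null) return SAFE_APPROVE;
  const fields = parsed as Record<string, unknown>;

  const rawConfidence = fields["confidence"];
  if (typeof rawConfidence !== "number" || !Number.isFinite(rawConfidence)) return SAFE_APPROVE;

  const rawRule = fields["ruleViolated"];
  const cited = typeof rawRule === "string" ? rawRule : null;
  const known = new Set(ruleNames.map(normalizeName));
  if (cited !== null && !known.has(normalizeName(cited))) return SAFE_APPROVE; // hallucinated rule

  const rawExplanation = fields["explanation"];
  const confidence = Math.min(100, Math.max(0, rawConfidence));
  return {
    confidence,
    ruleViolated: cited,
    explanation: typeof rawExplanation === "string" ? rawExplanation : "",
    shouldFlag: cited !== null && confidence >= flagThreshold,
  };
}
```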
Testing: 38 unit tests and 12 integration tests cover the full pipeline including edge cases: the app-comment loop guard, confidence scale normalization, hallucination suppression, and the digest and rule gap flows. Coverage is 88%+ on AI and storage modules.
Challenges we ran into
The confidence scale bug was the most painful issue. The AI was returning confidence: 1 meaning 1% confident, but our normalizer was treating any value <= 1 as a decimal and multiplying by 100, turning a 1% confidence into 100%. Every clean post was being flagged and removed. Fixing it required tracing the value through three layers of code - the AI prompt, the JSON parser, and the normalizer - before we found where the multiplication was happening.
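A minimal sketch of the failure and one possible fix follows. It assumes the prompt pins confidence to an integer from 0 to 100, so only values strictly between 0 and 1 can plausibly be fractions; the write-up does not say which fix ModMind adopted, and both function names are hypothetical.

```typescript
// The buggy behavior described above: any value <= 1 is treated as a fraction.
function buggyNormalize(value: number): number {
  return value <= 1 ? value * 100 : value;
}

// One possible fix: only values strictly between 0 and 1 are rescaled,
// so an integer 1 stays at 1%. Non-numeric input maps to 0 (approve side).
function normalizeConfidence(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return 0;
  const scaled = value > 0 && value < 1 ? value * 100 : value;
  return Math.min(100, Math.max(0, Math.round(scaled)));
}
```

With this version, a returned confidence of 1 stays at 1, while the buggy variant turns it into 100.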
The suggestedAction override was a subtler bug. Our evaluator had a line that set shouldFlag: true if the AI returned suggestedAction: "hold", regardless of confidence. GPT-4.1-mini returns "hold" as a cautious default on borderline content, so perfectly clean posts were being held at 100% confidence. Removing that clause and adding an explicit default-to-approve instruction in the prompt fixed it.
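The difference between the two behaviors can be sketched as below. Only "hold" is named in the write-up; the other action values, the ModelVerdict shape, and the function names are assumptions for illustration.

```typescript
// "approve" and "remove" are assumed values; only "hold" appears in the write-up.
type SuggestedAction = "approve" | "hold" | "remove";

interface ModelVerdict {
  confidence: number; // already normalized to 0-100
  suggestedAction: SuggestedAction;
}

// Buggy version: a cautious "hold" forces a flag regardless of confidence.
function shouldFlagBuggy(verdict: ModelVerdict, flagThreshold: number): boolean {
  return verdict.suggestedAction === "hold" || verdict.confidence >= flagThreshold;
}

// Fixed version: only the confidence threshold decides.
function shouldFlag(verdict: ModelVerdict, flagThreshold: number): boolean {
  return verdict.confidence >= flagThreshold;
}
```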
The infinite comment loop happened early in testing. When ModMind flagged a post and posted a suggestion comment, CommentSubmit fired on that comment, triggering another evaluation, another comment, and so on - 33 times before we caught it. The fix was adding a loop guard that checks whether the comment author is the app itself before evaluating.
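A loop guard of this kind can be sketched in a few lines. The IncomingComment shape and the idea of comparing against a known app account name are assumptions; the actual Devvit event payload and ModMind's check may differ.

```typescript
// Hypothetical, simplified view of a comment event.
interface IncomingComment {
  commentId: string;
  authorName: string;
}

// True when the comment was written by the app's own account.
function isOwnComment(comment: IncomingComment, appAccountName: string): boolean {
  return comment.authorName.toLowerCase() === appAccountName.toLowerCase();
}

// Guard called at the top of the comment handler, before any evaluation.
function shouldEvaluateComment(comment: IncomingComment, appAccountName: string): boolean {
  return !isOwnComment(comment, appAccountName);
}
```

Running the check before any other work means the app's own suggestion comments never reach the evaluator, which breaks the cycle at its first step.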
Settings not being read at runtime was a frustrating one to track down. We changed flagThreshold in the Devvit settings UI and the app's behavior did not change. It turned out that context.settings.getAll() was silently returning an empty object, so the app was falling back to its defaults. Adding a runtime settings log showed us immediately what was being returned, and we fixed the silent failure.
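One way to make this kind of fallback visible is to log the raw settings object and every default that gets applied. In the sketch below, flagThreshold comes from the write-up; autoHoldThreshold, the default values, and resolveSettings are hypothetical, and the raw object stands in for whatever the settings call returns.

```typescript
interface ModMindSettings {
  flagThreshold: number;
  autoHoldThreshold: number; // hypothetical second setting
}

// Assumed defaults, for illustration only.
const DEFAULTS: ModMindSettings = { flagThreshold: 70, autoHoldThreshold: 90 };

// Merge raw settings over defaults, logging what was received and each fallback.
function resolveSettings(
  raw: Record<string, unknown> | undefined,
  log: (msg: string) => void = console.log,
): ModMindSettings {
  log(`settings received: ${JSON.stringify(raw ?? null)}`);
  const resolved: ModMindSettings = { ...DEFAULTS };
  const keys = Object.keys(DEFAULTS) as (keyof ModMindSettings)[];
  for (const key of keys) {
    const value = raw?.[key];
    if (typeof value === "number" && Number.isFinite(value)) {
      resolved[key] = value;
    } else {
      log(`setting "${key}" missing or invalid; using default ${DEFAULTS[key]}`);
    }
  }
  return resolved;
}
```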
Multi-language false positives occurred because the test subreddit had no rules initially. With nothing to evaluate against, the AI flagged everything including clean French and Spanish posts. Adding explicit subreddit rules and raising the flag threshold resolved it.
Accomplishments that we're proud of
The rule gap detector is, as far as we know, novel. We are not aware of any moderation tool - on Devvit or elsewhere - that analyzes mod override behavior to suggest new rules. Every time a mod overrides an AI suggestion, that decision is stored. Over 30 days, patterns emerge: the same kind of post getting overridden repeatedly means either the AI is wrong about a rule, or the rule does not exist yet. ModMind surfaces that gap and drafts a proposed rule for the mod team to review.
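The pattern-finding step can be sketched as a grouped count over stored overrides. The record shape, the two gap categories, and the minimum count of 3 are assumptions for illustration; the write-up only states that override decisions are stored and analyzed over 30 days.

```typescript
// Hypothetical stored record of one moderation decision.
interface OverrideRecord {
  ruleCited: string | null; // rule the AI cited, if any
  aiAction: "flag" | "approve"; // what the AI suggested
  modAction: "remove" | "approve"; // what the mod actually did
  timestamp: number; // epoch milliseconds
}

interface GapCandidate {
  key: string;
  kind: "rule-may-be-miscalibrated" | "rule-may-be-missing";
  count: number;
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Count overrides in the last 30 days and return groups that repeat often enough.
function findGapCandidates(records: OverrideRecord[], now: number, minCount = 3): GapCandidate[] {
  const counts = new Map<string, GapCandidate>();
  for (const r of records) {
    if (now - r.timestamp > THIRTY_DAYS_MS) continue;
    let kind: GapCandidate["kind"];
    let key: string;
    if (r.aiAction === "flag" && r.modAction === "approve") {
      kind = "rule-may-be-miscalibrated"; // AI applied a rule the mods did not enforce
      key = r.ruleCited ?? "(no rule cited)";
    } else if (r.aiAction === "approve" && r.modAction === "remove") {
      kind = "rule-may-be-missing"; // mods removed content no rule caught
      key = "(removed without a matching rule)";
    } else {
      continue; // mod agreed with the AI: not an override
    }
    const id = `${kind}:${key}`;
    const existing = counts.get(id);
    if (existing) existing.count += 1;
    else counts.set(id, { key, kind, count: 1 });
  }
  return Array.from(counts.values())
    .filter((c) => c.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```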
15 languages with bilingual replies. We did not just detect language - we translate the subreddit's actual rules into the user's language before evaluating, so the AI applies the rules correctly in context. The draft removal reply is generated in both English (for the mod) and the user's native language (to send to the user). One click.
The weekly digest is a real product. It is not a list of numbers - it is an AI-written analysis of the week's moderation patterns, with recommendations, a confidence trend chart, rule calibration alerts, and a mod team activity summary. We ran it live and it correctly identified that Rule 2 was driving 75% of violations and suggested the mod team review its clarity.
Production-quality code in a hackathon. 38 unit tests, 12 integration tests, 88%+ coverage, zero known vulnerabilities, full documentation, proper error handling on every AI call, and a complete set of LLM observability metrics including hallucination detection, faithfulness scoring, and confidence calibration tracking.
What we learned
The hardest part of building an AI-powered moderation tool is not the AI - it is the edge cases around the AI. What happens when the API times out? What happens when the model returns malformed JSON? What happens when the confidence scale is wrong? What happens when the model cites a rule that does not exist?
Every one of those questions needed a specific, tested answer before we could trust the app to run unsupervised on a real subreddit. The fallback behavior - always approve when uncertain, never remove content silently, always leave a human in the loop - turned out to be the most important design decision we made.
We also learned that subreddit rules matter enormously for prompt quality. An AI evaluating content against vague rules will flag everything. An AI evaluating against specific, well-written rules is accurate and useful. ModMind is only as good as the rules it is given - which is part of why the rule gap detector exists.
What's next for ModMind
Interactive accept/override buttons directly in the suggestion comment, so mods can act without leaving the post.
A/B threshold testing - run two confidence thresholds simultaneously on a 50/50 split of incoming posts and report which one has better precision and recall after 100 evaluations. Let mods choose the threshold that fits their community.
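Since this feature is not built yet, the following is only a sketch of how the comparison might be computed. It treats the mod's final decision as ground truth, which is an assumption, and assignArm, precisionRecall, and the Outcome shape are hypothetical names.

```typescript
// One evaluated post: what the threshold decided and what the mod finally did.
interface Outcome {
  flagged: boolean;
  modRemoved: boolean; // assumed ground truth
}

// precision = TP / (TP + FP), recall = TP / (TP + FN); 0 when undefined.
function precisionRecall(outcomes: Outcome[]): { precision: number; recall: number } {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const o of outcomes) {
    if (o.flagged && o.modRemoved) tp++;
    else if (o.flagged && !o.modRemoved) fp++;
    else if (!o.flagged && o.modRemoved) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// Deterministic, roughly even assignment of a post to arm A or B from its ID,
// so the same post always lands in the same arm.
function assignArm(postId: string): "A" | "B" {
  let hash = 0;
  for (let i = 0; i < postId.length; i++) {
    hash = (hash * 31 + postId.charCodeAt(i)) >>> 0; // keep within unsigned 32 bits
  }
  return hash % 2 === 0 ? "A" : "B";
}
```

Each arm would collect its own list of outcomes, and the two precision/recall pairs would be reported side by side once both arms reach the evaluation count.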
Mod explanation quality feedback - mods can reply with a thumbs up or thumbs down to rate whether ModMind's explanation was useful. This feeds into a quality score tracked in the weekly digest.
Subreddit onboarding flow - a first-run experience that walks new mods through configuring thresholds, reviewing the rule set, and understanding what ModMind will and will not do automatically.
Reddit Developer Funds eligibility - once ModMind reaches engagement milestones across installed subreddits, it becomes eligible for Reddit's Developer Funds program, creating a sustainable path to keep the app maintained and improved.
Built With
- devvit
- eslint
- gpt-4.1-mini
- node.js
- openai-api
- reddit-developer-platform
- redis
- typescript
- vitest