Inspiration
Reddit's volunteer moderators donate an estimated $3.4 million/year in unpaid labor — and get paid back in harassment. Hostile users send insults, threats, and abuse directly through modmail. Right now, every mod reads each one personally, absorbing the toxicity just to decide how to respond.
We wanted to build something that puts a layer of protection between the mod and the worst messages — not to automate moderation, but to make the human job less painful.
What it does
ModMail Copilot reads each incoming modmail conversation and instantly posts a private mod-only note containing:
- Classification — ban appeal, rule question, harassment, spam, and more
- Severity flag — 🟢 low / 🟡 med / 🔴 high
- Mod Shield — if the message is abusive, the mod sees a calm neutral summary instead of the raw toxic text, so they don't have to absorb insults just to triage
- Draft reply — written in the user's own language (Spanish in → Spanish out)
- Confidence indicator — so mods know when to copy-paste vs. rewrite from scratch
The human is always in the loop. The app never sends replies automatically.
Every note is posted with isInternal: true, so it is invisible to the user and visible only to the mod team.
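As a rough illustration of how the pieces of a note could fit together, the sketch below models one triage result and renders it as note text. The `TriageResult` field names, the `renderModNote` helper, and the note layout are assumptions made for this example, not the app's actual schema.

```typescript
// Hypothetical shape of one triage result; field names are assumptions.
type Severity = "low" | "med" | "high";

interface TriageResult {
  classification: string; // e.g. "ban_appeal", "rule_question", "harassment"
  severity: Severity;
  abusive: boolean;       // true when Mod Shield should hide the raw text
  neutralSummary: string; // calm restatement of the message
  draftReply: string;     // suggested reply in the user's own language
  confidence: number;     // 0..1, how much the draft can be trusted
}

const SEVERITY_ICON: Record<Severity, string> = {
  low: "🟢",
  med: "🟡",
  high: "🔴",
};

// Builds the text of the private mod note. The app never sends the draft;
// a moderator decides whether to copy, edit, or discard it.
function renderModNote(result: TriageResult, rawMessage: string): string {
  // Mod Shield: abusive messages are replaced by the neutral summary.
  const shown = result.abusive ? result.neutralSummary : rawMessage;
  const pct = Math.round(result.confidence * 100);
  return [
    `Classification: ${result.classification}`,
    `Severity: ${SEVERITY_ICON[result.severity]} ${result.severity}`,
    `Message: ${shown}`,
    `Draft reply (${pct}% confidence): ${result.draftReply}`,
  ].join("\n");
}
```

Keeping the rendering step a pure function of the result makes the Mod Shield behaviour easy to unit test without calling the model.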
How we built it
Built on Devvit's web architecture (@devvit/web 0.12.24) as a Node.js HTTP server
in TypeScript. The onModMail trigger fires on every incoming message, runs through
a filter chain (skips auto-generated messages, internal mod discussions, and its own
replies to prevent infinite loops), fetches the full conversation via Reddit's modMail
API, and calls OpenAI GPT-4o-mini with a strict JSON-output prompt.
Results post as isInternal: true modmail notes. Redis handles idempotency (no
double-posting on Devvit retries), hourly rate limiting per subreddit, and daily
budget caps configurable by the mod team.
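The idempotency and hourly rate-limit checks described above can be sketched as a single admission step. The `KeyValueStore` interface and the `InMemoryStore` class are stand-ins written for this example, not Devvit's Redis client, and the key names and the `admit` function are hypothetical. A real version would also set expirations on these keys.

```typescript
// Minimal key-value interface standing in for a Redis client (assumption).
interface KeyValueStore {
  setIfAbsent(key: string, value: string): Promise<boolean>;
  increment(key: string): Promise<number>;
}

// In-memory implementation so the sketch runs without any server.
class InMemoryStore implements KeyValueStore {
  private data = new Map<string, string>();

  async setIfAbsent(key: string, value: string): Promise<boolean> {
    if (this.data.has(key)) return false;
    this.data.set(key, value);
    return true;
  }

  async increment(key: string): Promise<number> {
    const next = Number(this.data.get(key) ?? "0") + 1;
    this.data.set(key, String(next));
    return next;
  }
}

type Decision = "process" | "duplicate" | "rate_limited";

async function admit(
  store: KeyValueStore,
  subreddit: string,
  messageId: string,
  hourlyLimit: number,
  now: Date
): Promise<Decision> {
  // Idempotency: only the first delivery of a message id gets through,
  // so a retried trigger cannot post a second note.
  const firstTime = await store.setIfAbsent(`seen:${messageId}`, "1");
  if (!firstTime) return "duplicate";

  // Rate limit: one counter per subreddit per UTC hour, e.g. "2025-01-01T10".
  const hourBucket = now.toISOString().slice(0, 13);
  const count = await store.increment(`rate:${subreddit}:${hourBucket}`);
  return count > hourlyLimit ? "rate_limited" : "process";
}
```

Checking for duplicates before incrementing the counter means retries do not consume the subreddit's hourly allowance.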
Challenges we ran into
The LLM provider problem. Our original design used Anthropic Claude. Two days before
the deadline we learned that Reddit had sued Anthropic in 2025 over alleged
unauthorized data scraping, which made Devvit staff approval of api.anthropic.com
effectively impossible. We pivoted to OpenAI in under 2 hours. The architecture survived
because the prompt contract was provider-agnostic from day one.
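One way to picture a provider-agnostic prompt contract: the rest of the app depends only on a small interface plus a validator for the JSON the model returns, so swapping vendors means writing one new adapter. The `LlmProvider` interface, the `ParsedTriage` fields, and `parseTriage` below are illustrative assumptions, not the project's real code.

```typescript
// Any vendor adapter only has to turn two strings into one string.
interface LlmProvider {
  name: string;
  complete(systemPrompt: string, userMessage: string): Promise<string>;
}

const SEVERITIES = ["low", "med", "high"] as const;
type Severity = (typeof SEVERITIES)[number];

interface ParsedTriage {
  classification: string;
  severity: Severity;
  draftReply: string;
}

function isSeverity(value: unknown): value is Severity {
  return typeof value === "string" && (SEVERITIES as readonly string[]).includes(value);
}

// Validates the raw model output against the contract.
// Returns null instead of throwing so the caller can post a fallback note.
function parseTriage(raw: string): ParsedTriage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const { classification, severity, draftReply } = obj;
  if (typeof classification !== "string") return null;
  if (typeof draftReply !== "string") return null;
  if (!isSeverity(severity)) return null;
  return { classification, severity, draftReply };
}

// The calling code never mentions a specific vendor.
async function triage(
  provider: LlmProvider,
  systemPrompt: string,
  message: string
): Promise<ParsedTriage | null> {
  return parseTriage(await provider.complete(systemPrompt, message));
}
```

Because validation happens after the provider call, a new adapter cannot weaken the guarantees the note renderer relies on.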
The infinite loop. Every Devvit app gets a bot account registered as a subreddit
moderator. Its own internal notes re-fire onModMail. Solved by filtering on
messageAuthorType and context.appName before any processing.
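A simplified sketch of that early filter follows. The `IncomingModMail` shape is hypothetical and the author-type strings are placeholders; the real trigger payload and its values should be checked against the installed type definitions. It also assumes the app's bot account name can be compared with the message author name.

```typescript
// Hypothetical, simplified view of an incoming modmail event.
interface IncomingModMail {
  messageAuthorType: "user" | "moderator" | "app"; // placeholder values
  messageAuthorName: string;
  isAutoGenerated: boolean;
  isInternalDiscussion: boolean;
}

type SkipReason =
  | "own_message"
  | "auto_generated"
  | "internal_discussion";

// Returns why a message should be ignored, or null if it should be processed.
// Runs before any API call so a skipped message costs nothing.
function skipReason(event: IncomingModMail, appName: string): SkipReason | null {
  // Loop guard: never react to anything the app itself wrote.
  if (event.messageAuthorType === "app" || event.messageAuthorName === appName) {
    return "own_message";
  }
  if (event.isAutoGenerated) return "auto_generated";
  if (event.isInternalDiscussion) return "internal_discussion";
  return null;
}
```

Putting the loop guard first means the app's own notes are dropped even if another flag on the event is set unexpectedly.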
Devvit's modmail menu limitation. We wanted 👍/👎 feedback buttons on each mod note.
Devvit's menu system only supports comment, post, and subreddit locations — no
modmail context. The feedback system is reserved for v2.
Accomplishments we're proud of
- Crisis path works correctly — tested live with self-harm language: warm Spanish response with 988 + findahelpline.com, zero punitive framing
- Mod Shield verified — abusive message → calm summary shown, raw insults filtered
- Zero silent failures — every error path (bad API key, rate limit, LLM error, conversation read failure) leaves a useful fallback note for the mod
- Multi-language confirmed — Spanish modmail in, fluent accented Spanish draft out
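The "zero silent failures" item can be pictured as a rule that every outcome, success or failure, maps to some note text. The sketch below is an assumption about how that might look; the `FailureKind` names mirror the error paths listed above, while the `TriageOutcome` type and the note wording are invented for illustration.

```typescript
type FailureKind =
  | "bad_api_key"
  | "rate_limit"
  | "llm_error"
  | "conversation_read_failure";

// Either a finished note or a tagged failure; there is no third state.
type TriageOutcome =
  | { ok: true; note: string }
  | { ok: false; kind: FailureKind };

// Record<FailureKind, string> makes the compiler reject a missing entry,
// so adding a new failure kind forces a new fallback message.
const FALLBACK_NOTES: Record<FailureKind, string> = {
  bad_api_key:
    "Copilot could not run: the configured API key was rejected. Please check the app settings.",
  rate_limit:
    "Copilot skipped this message: the hourly limit for this subreddit was reached.",
  llm_error:
    "Copilot could not analyze this message. Please review it manually.",
  conversation_read_failure:
    "Copilot could not read this conversation. Please review it manually.",
};

// Always returns text to post, so the mod is never left without a note.
function noteFor(outcome: TriageOutcome): string {
  return outcome.ok ? outcome.note : FALLBACK_NOTES[outcome.kind];
}
```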
What we learned
Devvit's new @devvit/web architecture is a real Node.js HTTP server — not the
legacy Devvit.configure / Devvit.addTrigger pattern most docs still show.
Reading the actual node_modules type definitions was the only reliable source of
truth. External domain approval is a manual process with unpredictable queue times —
build provider-agnostic abstractions from day one.
What's next
- Per-subreddit learning (v2): store approved drafts in Redis, inject top examples as few-shot prompts so the app sounds more like each community over time
- Analytics dashboard: custom post showing modmail volume, severity trends, and draft acceptance rate
- Modmail-context menu items: 👍/👎 feedback per note once Devvit adds the API
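The per-subreddit learning idea could look roughly like the sketch below: pick the most-approved past drafts and append them to the base prompt as examples. This is speculative, since the feature is not built; the `ApprovedDraft` shape, ranking by approval count, and the example formatting are all assumptions.

```typescript
// Hypothetical record of a draft that moderators accepted.
interface ApprovedDraft {
  userMessage: string;
  approvedReply: string;
  approvals: number; // how often mods used this reply; the ranking signal assumed here
}

// Appends the k most-approved drafts to the base prompt as few-shot examples.
function buildFewShotPrompt(
  basePrompt: string,
  drafts: ApprovedDraft[],
  k: number
): string {
  // Copy before sorting so the caller's array is left untouched.
  const top = [...drafts]
    .sort((a, b) => b.approvals - a.approvals)
    .slice(0, k);
  const examples = top.map(
    (d, i) => `Example ${i + 1}\nUser: ${d.userMessage}\nMod reply: ${d.approvedReply}`
  );
  return [basePrompt, ...examples].join("\n\n");
}
```

Capping the examples at `k` keeps prompt size, and therefore cost per message, bounded as the store of approved drafts grows.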
Built With
- api
- devvit
- gpt-4o-mini
- node.js
- openai-api
- redis
- typescript