Inspiration

Two peer-reviewed papers from Bajpai & Chandrasekharan (UMich) put hard numbers on a problem moderators have lived with for years:

  • 84% of moderators "sometimes, often, or almost always" leave the modqueue to seek additional context while reviewing reports — checking user history, scrolling parent threads, scanning prior mod logs (In the Queue, CHI 2026).
  • 74.5% of moderators report experiencing a collision — two mods unknowingly acting on the same item at the same time (Towards a Better Modqueue, arxiv 2409.16840).
  • The modqueue UI has not been redesigned since 2008.

Every time a mod leaves the queue to gather context, they lose seconds. Across a busy sub, that's hours per week of unpaid volunteer time. Huddle brings the context to the queue.

What it does

Huddle installs as a custom post in any subreddit you moderate. Open it and you see a smarter, team-aware version of the modqueue:

  • Automatic grouping — reports cluster by author. Multiple reports against one user collapse into one row with bulk Approve all / Remove all actions.
  • Factual AI summaries — every item shows a one-sentence summary like "This 259-day-old account has made 2 comments in this subreddit, both within the last 7 days." The LLM never sees the reported content — only a structured fact dict (account age, posts/comments in sub, last 7 days, prior removals). Summaries are cached forever per item.

Why an AI that can't see content? Every other AI mod tool risks hallucinating about your community's content. Huddle's LLM only sees a 5-field fact dict — it cannot make claims it can't justify. Mods see the exact same dict next to the AI's sentence in the Context Peek drawer, so every summary is auditable.

  • AI-suggested mod action — a color-coded chip above each item recommends Approve / Remove / Spam with a confidence level and a single factual sentence citing only the dict. When Gemini's free-tier rate-limits (429), a deterministic heuristic over the same facts takes over so the chip stays useful.
  • Context Peek drawer — click any item, a right-side drawer slides in showing (1) the exact facts the AI received as a small table, (2) the user's last 5 post/comment titles in this sub with two-letter status codes (ok / rm / pn / sp), and (3) a 30-day mod-action stacked histogram by day and action type. From the drawer header a mod can Open on Reddit, View profile, or Ban user (click-twice to confirm). Banning auto-removes every queued item the user authored in one sweep.
  • Bulk + inline + reject-with-reason actions — Approve / Remove individual items, Approve all / Remove all at the group level, plus Reject with reason: an inline panel where the mod types a removal reason (or clicks "Suggest with AI" to have Gemini draft a friendly, factual, second-person reason). On confirm Huddle removes the item and posts the reason as a distinguished, stickied reply — same UX Reddit's native removal-reason flow uses.

How we built it

  • Devvit Web (@devvit/web/server) — Hono HTTP server hosting five trigger webhooks (PostReport, CommentReport, ModAction, PostSubmit, CommentSubmit) plus the queue/summary/drawer/action APIs.
  • Redis for the entire data plane — ZSETs for group sets and per-user history (recent titles capped at 20, 30-day action timeline), HASH for the user-facts cache (24h TTL), STRING for cached AI summaries (forever). Devvit Redis has no plain SET ops, so ZSETs with timestamp scores throughout.
  • React 19 + Tailwind 4 + Vite for the WebView — the queue polls /api/init every 5 seconds; each item lazy-fetches its AI summary; the drawer composes over data already in Redis (no extra LLM call when the drawer opens).
  • Google Gemini Flash (gemini-flash-latest) via Google AI Studio's free tier — 1,500 requests/day, no card. The system prompt locks the model into one neutral sentence ≤25 words. Thinking mode disabled so we get a clean answer in parts[0]. Every failure path (no key, network error, 429, safety block, response under 10 chars) silently falls back to a templated raw-facts sentence over the same fact dict.
  • TypeScript end-to-end with branded t1_ / t3_ / t2_ thing-id types narrowed via runtime predicates rather than casts.

A reverse-lookup huddle:item-groups:{itemId} ZSET lets mod-action cleanup remove an actioned item from every group it belongs to in a single pass.

Challenges we ran into

  1. The Devvit ecosystem migrated from Blocks (@devvit/public-api) to Web (@devvit/web) between the spec and the build. Every import pattern in the original spec was obsolete. We translated the architecture in place: triggers become HTTP routes, settings move from Devvit.addSettings to devvit.json settings.global, custom posts host React WebViews via splash.html/game.html entrypoints.
  2. Devvit Redis has no plain SET operations — no sAdd / sRem / sMembers. We used ZSETs throughout, with timestamps as scores so we get free recency ordering on every read.
  3. Gemini 2.5's thinking mode broke our parser. The model's response prepends a "thought" part with a one-character placeholder (*) before the real answer. We cached that single * as the summary on first call. Fix: thinkingConfig.thinkingBudget: 0, concatenate all non-thought parts, reject any response under 10 characters, and auto-bust suspiciously-short cached values on read.
  4. Reddit doesn't expose reporter identity for anonymous user reports — by policy. We had to design around this: the group is keyed by reported-content author (not reporter), and mod-initiated reports surface separately through Post.modReportReasons / Comment.modReportReasons.
  5. Free-tier Gemini 429s under burst load. We added a Redis circuit breaker (huddle:gemini:blocked-until) with a 5-minute cooldown that flips the UI to the deterministic heuristic until Gemini recovers.

Accomplishments that we're proud of

  • The AI never sees content. This was a non-negotiable design constraint — it eliminates the entire class of hallucination and judgment errors that plague AI moderation tools. Every summary is grounded in deterministic facts; the LLM is just a sentence formatter.
  • The fallback path is invisible to users. Whether Gemini returns prose or the raw-facts formatter takes over, the moderator sees the same shape of information in the same place — the LLM is genuine polish, not a load-bearing dependency.
  • The Context Peek drawer is AI-free by design. It composes facts + history that are already in Redis. No extra LLM call. The "AI legibility" promise is literal: mods see the exact dict the AI received.
  • Three independent AI surfaces, all sharing the same no-content guarantee. Summary, suggested-action, and removal-reason are three separate Gemini prompts, each tuned for its purpose. Each one has its own deterministic fallback so the UI never breaks.
  • Ban closes the user-level loop too. Banning from the drawer cascade-removes every queued item the user authored in one server-side sweep — a Bajpai finding (mods still leave the queue to take user actions) addressed inside the queue.
  • Backed by peer-reviewed research. Every feature maps to a finding in the Bajpai papers.

What we learned

  • The most important architectural decision in an AI moderation tool isn't the model — it's deciding what the model doesn't get to see. Constraining Gemini to formatting-only made hallucination structurally impossible.
  • Devvit's WebView model is more permissive than it looks once you accept that custom posts ARE the UI surface — no separate "splash + game expanded view" Blocks dance, just two HTML entrypoints sharing state through HTTP.
  • Diagnostic affordances pay back fast. Surfacing the LLM source as a chip in the UI ("AI" vs "raw") and logging each fetch outcome to playtest CLI turned a long debug cycle into a single-glance check.

What's next for Huddle

  • Live mod presence — avatar dots on each row so two mods don't act on the same item simultaneously (the 74.5% collision stat).
  • Persistent realtime updates instead of 5-second polling, via Devvit's realtime channel.
  • Toolbox usernote integration so existing notes surface in the Context Peek drawer.
  • In-drawer mute / approve-user actions to complete the user-level loop.
  • Reporter reliability scoring once Huddle has accumulated enough per-user history.

These were deliberately scoped out of the MVP per the 4-day budget. The whole MVP went to the headline 84%-fix.

Built With

Share this project:

Updates