Inspiration

In a 2025 study, Cornell researchers found AI-detection tools for Reddit content range from 0–100% accuracy with false-positive rates up to 32%. One mod in the study summed it up: "There has to be a lot that we're missing."

The detection arms race is mathematically un-winnable. Meanwhile, mods are drowning. Communities like r/AmItheAsshole estimate nearly half of all submissions are now AI-written or AI-assisted. Reddit removed 40 million pieces of spam and manipulated content in the first half of 2025 alone.

Every other team in this hackathon is building a better filter. We took a different bet.

Threshold moves moderation left. Instead of catching bad posts after they hit the modqueue, we coach users at the moment of submission, before the post buries itself in a sub it doesn't belong in. Where Reddit's native Post Guidance catches keywords, Threshold understands meaning.

What it does

Threshold Coach is the first module of Threshold, pre-submission community infrastructure for Reddit.

When a user submits a post, Threshold reads the draft against the sub's semantic rules. If the post looks fine, nothing happens. The modqueue stays quiet. If it looks like a rule violation, Threshold posts a stickied, mod-distinguished comment under the post within seconds: which rule it likely conflicts with, the exact phrase that stood out, a concrete fix in the user's tone, and a clean "submit as-is, your call" out.

Three things make this different from every other mod tool:

  1. It reads meaning, not strings. Post Guidance catches the literal phrase "check out my SaaS." Threshold catches "I wrote up the full breakdown of what actually worked for me, happy to share if anyone's interested. DMs open" — same intent in different clothes. We tested this. It works.

  2. The Sub Compact. A free-text field where mods describe what their community actually wants: the vibe, the taste, the thing veteran mods know but new users never figure out. Threshold uses this as context on every coach call. This is institutional knowledge captured for the first time as a moderation primitive.

  3. Authority stays with the mod. Threshold informs, it doesn't enforce. Every coach comment ends with "Or submit as-is, your call." Mods still see the post in their queue. The coach is a guide, not a gatekeeper. This is why mods trust it.

How we built it

  • Devvit Web template (Mod Tool starter), Hono server, TypeScript, Reddit-native runtime.
  • Gemini 2.5 Flash as the reasoning engine. BYOK (Bring Your Own Key) so each mod brings their own free Gemini key, keeping costs with the community using it.
  • Redis (Devvit-provided) for settings, per-rule hit counters, recent flags, and coach run logs.
  • Structured JSON output with a bulletproof parser. Three fallback strategies, defaults to "clean" on failure. Silent bypass that never blocks a user's post.
  • Preset rule packs for AITA-style, Hobby, Professional, and Support communities. Mods get 4 working rules in one dropdown click.
  • Mod dashboard with live counters (posts coached, flagged, cleared), per-rule hit counts, recent flags (titles only, no usernames, following Reddit mod tool privacy norms), and status row.

Challenges we ran into

  1. Anthropic's API is not on Devvit's allowlist. We discovered mid-build that api.anthropic.com is explicitly blocked. Devvit only approves OpenAI and Google Gemini. We pivoted from Claude to Gemini in 30 minutes. The architecture was provider-agnostic by design. The model is an implementation detail, not the product.

  2. LLM JSON output is brittle. Gemini occasionally returned truncated or malformed JSON. We built a tolerant parser with three escalating fallback strategies and a final "default to clean" rule. False positives damage trust faster than missed flags.

  3. Devvit's form primitives are constrained. Some properties we expected (like lineCount on paragraph fields) aren't supported. We accepted the constraint and built the dashboard with what the platform gives us, instead of fighting the framework.

Accomplishments we're proud of

  • It actually works. We ran 3 adversarial test posts: stealth self-promo, AITA rage bait, and a clean control post. Threshold flagged the first two with correct rule citations and accurate phrase extraction. It let the third pass cleanly. Zero false positives in eval.
  • Voice B microcopy. Every word the user sees was hand-crafted. Knowledgeable peer, never preachy. "Hey, Threshold here. Want to flag something before this gets buried."
  • Sub Compact as a primitive. As far as we can tell, no other Reddit mod tool captures "what does this community actually want" as a first-class input. We think this becomes the standard.
  • Authority-respecting by default. Threshold is inform-only. It never blocks, removes, or punishes. The mod still owns the queue.

What we learned

  • Detection is the wrong frame. Cornell got it right: "Traditional moderation focused on removing bad content. The authenticity challenge requires verifying good content is genuinely human." Threshold is the seed of that shift.
  • The unit of moderation is shifting. From "the post" to "the relationship between participant and community." Subs will fragment along authenticity tiers. Mods become orchestrators of trust contracts, not gatekeepers of every comment. Threshold is the infrastructure for that future.
  • Less is the polish. Every feature we didn't ship, analytics charts, multi-LLM provider support, comment-time coaching, user-facing appeals, made the V1 sharper, not weaker.

What's next for Threshold

V1 (now): Threshold Coach, the pre-submission coach. Shipped.

V2 (June 2026): Threshold Provenance, authenticity tier attestation. Users tag their own posts (human, AI-grammar, AI-rewritten, AI-generated). Mods set per-sub policy. Subs become trust-bordered.

V3 (Summer 2026): Threshold Trust, per-sub reputation graph. High-rep users skip the coach. Trust becomes a moderation primitive.

V4 (Fall 2026): Threshold Compact, the full mod-defined trust contract layer. Each sub defines and enforces its own authenticity standard. Threshold is the substrate.

Threshold is pre-submission community infrastructure. V1 coaches at the moment of submission, the soonest hook Devvit exposes today. V2+ ships true pre-submit blocking as the platform matures.

The standard, before the post.

Built With

Share this project:

Updates