AI Slop Detector

Inspiration

I moderate a small subreddit. Nothing massive, just a few thousand people who genuinely care about the topic. Six months ago, something shifted. Posts started appearing that looked right on the surface (correct vocabulary, proper grammar, on-topic) but felt hollow. No personality. No lived experience. Just words arranged to simulate a contribution.

I'd read a comment and think: a real person didn't write this. But I couldn't prove it. I couldn't act on a feeling.

So I'd move on. The AI slop would stay. Real members stopped engaging because their thoughtful replies got buried under a flood of generated content. The community slowly eroded and I, like thousands of moderators across Reddit, had absolutely no tools to fight back.

That helplessness is what inspired this app. I wanted to build the tool I wish I had.

What it does

AI Slop Detector automatically scores every new post and comment 0–100 for AI-generated content across three signal groups:

Text Analysis (45%) - 63 English AI phrases, 30+ multilingual phrases (Spanish, French, German), lexical diversity scoring via Type-Token Ratio, uniform sentence length detection, and obfuscation/leet-speak evasion detection
Account Signals (30%) - account age, karma level, and karma distribution anomalies (all post karma, zero comment karma = never had a real conversation)
Behavior Patterns (25%) - posting velocity, cross-subreddit scatter, duplicate title reposts, self-promotion domain flooding
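
As a rough illustration of two of the text signals listed above, here is a minimal sketch of a Type-Token Ratio and a sentence-length uniformity measure. The function names, the tokenization rule, and the choice of coefficient of variation as the uniformity measure are assumptions for illustration; the app's actual textSignals.ts may compute these differently.

```typescript
// Hypothetical helpers, not the app's actual code.

/** Type-Token Ratio: unique words divided by total words (0 when there are no words). */
function typeTokenRatio(text: string): number {
  // Split on anything that is not a Latin letter (including accented ones), digit, or apostrophe.
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9'\u00c0-\u024f]+/)
    .filter(Boolean);
  if (tokens.length === 0) return 0;
  return new Set(tokens).size / tokens.length;
}

/**
 * Coefficient of variation of sentence lengths (in words).
 * Values near 0 mean the sentences are all about the same length.
 */
function sentenceLengthVariation(text: string): number {
  const lengths = text
    .split(/[.!?]+/)
    .map((s) => s.trim().split(/\s+/).filter(Boolean).length)
    .filter((n) => n > 0);
  if (lengths.length < 2) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((acc, n) => acc + (n - mean) * (n - mean), 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}
```

A low Type-Token Ratio suggests repetitive vocabulary, and a low variation value suggests unusually uniform sentences. How either value maps onto the 0–100 score is not described here, so that step is left out.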

Scores map to four risk tiers (Clean, Suspicious, Likely AI, and High Confidence AI), each with a configurable mod action (queue, report, or remove).
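
The tier mapping can be sketched as a simple threshold lookup. The tier names and actions come from the description above; the TierThresholds shape, the function names, and the assumption that Clean content gets no action are hypothetical. Threshold values are passed in by the caller because the app's real cut-offs are not stated here.

```typescript
// Hypothetical sketch: type and function names are illustrative only.
type Tier = "Clean" | "Suspicious" | "Likely AI" | "High Confidence AI";
type ModAction = "none" | "queue" | "report" | "remove";

interface TierThresholds {
  suspicious: number; // minimum score for Suspicious
  likelyAi: number; // minimum score for Likely AI
  highConfidence: number; // minimum score for High Confidence AI
}

/** Map a 0-100 score to a tier, checking the highest tier first. */
function tierFor(score: number, t: TierThresholds): Tier {
  if (score >= t.highConfidence) return "High Confidence AI";
  if (score >= t.likelyAi) return "Likely AI";
  if (score >= t.suspicious) return "Suspicious";
  return "Clean";
}

/** Look up the configured action for a tier; Clean is assumed to need no action. */
function actionFor(tier: Tier, configured: Record<Exclude<Tier, "Clean">, ModAction>): ModAction {
  return tier === "Clean" ? "none" : configured[tier];
}
```

Checking tiers from highest to lowest keeps the function correct even if a subreddit sets two thresholds close together.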

Every flagged post gets a distinguished stickied mod comment showing the exact signals that fired, so mods always know why something was flagged - no black boxes, no unexplained removals.

Mods also get three menu items on every post AND comment:

🔍 Scan for AI Slop - manual on-demand scan
🤖 View AI Slop Score - retrieve cached result instantly
✅ Mark as Human - one-click false positive dismissal that removes the bot comment and approves the content

How we built it

Built entirely on Devvit using TypeScript with Reddit's native APIs and Redis for state management.

The scoring engine has three independent modules (textSignals.ts, accountSignals.ts, and behaviorSignals.ts) that run in parallel via Promise.all() and feed into a weighted aggregator in scorer.ts. Scores are normalized to a common scale before combination so no single signal type can dominate.
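
A minimal sketch of that flow, assuming each module resolves to a score already normalized to 0–100. The SignalResult and Analyzer types and both function names are hypothetical stand-ins, since the real module interfaces are not shown here; only the weights and the use of Promise.all() come from this write-up.

```typescript
// Hypothetical shapes; the real module interfaces are not shown in this write-up.
type SignalResult = { score: number; reasons: string[] }; // score on a 0-100 scale
type Analyzer = () => Promise<SignalResult>;

// Weights from "What it does", kept as integer percentages to avoid float drift.
const WEIGHTS = { text: 45, account: 30, behavior: 25 };

const clamp = (n: number): number => Math.min(100, Math.max(0, n));

/** Weighted average of the three group scores, rounded to an integer 0-100. */
function combineScores(s: { text: number; account: number; behavior: number }): number {
  const weighted =
    clamp(s.text) * WEIGHTS.text +
    clamp(s.account) * WEIGHTS.account +
    clamp(s.behavior) * WEIGHTS.behavior;
  return Math.round(weighted / 100);
}

/** Run the three analyzers concurrently, then aggregate their results. */
async function scoreContent(analyzers: {
  text: Analyzer;
  account: Analyzer;
  behavior: Analyzer;
}): Promise<SignalResult> {
  const [text, account, behavior] = await Promise.all([
    analyzers.text(),
    analyzers.account(),
    analyzers.behavior(),
  ]);
  return {
    score: combineScores({ text: text.score, account: account.score, behavior: behavior.score }),
    reasons: [...text.reasons, ...account.reasons, ...behavior.reasons],
  };
}
```

Clamping each group score before weighting is one way to guarantee that no single group contributes more than its share of the total.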

Redis handles four responsibilities: deduplication (prevents double-scoring the same content), score history (7-day TTL for the View Score menu), weekly stats (aggregated per ISO week), and rate limiting (sliding 60-second window, max 30 scans).
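
The rate-limiting rule (a sliding 60-second window, max 30 scans) can be illustrated with an in-memory sketch. The app keeps this state in Redis; the array-backed SlidingWindowLimiter class below is a hypothetical stand-in that shows only the windowing logic, not the app's storage code.

```typescript
// In-memory illustration only; the app stores this state in Redis.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private readonly windowMs: number = 60000, // 60-second window
    private readonly maxEvents: number = 30 // max scans per window
  ) {}

  /** Returns true and records the scan if it fits in the window, false otherwise. */
  tryAcquire(nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop scans that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxEvents) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

Unlike a fixed-window counter, this approach never allows a burst of 60 scans straddling a window boundary, because every check looks back exactly 60 seconds from the current moment.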

The Devvit scheduler powers the weekly Monday digest. The AppInstall trigger sends a welcome modmail to orient mod teams on day one. All settings use SettingScope.Installation so each subreddit gets its own independent configuration.

Challenges we ran into

False positive calibration was the hardest problem. Academic writing, formal non-native English, and AI slop can look identical at the text level. The solution was layered: established accounts (180+ days, 500+ karma) get a 40% score reduction, and almost every signal requires stacking with others before triggering action.
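
The established-account adjustment might look like the sketch below. The 180-day, 500-karma, and 40% figures come from the paragraph above; the AccountInfo shape, the function name, the assumption that both conditions must hold, and the rounding are illustrative guesses.

```typescript
// Hypothetical account shape; field names are assumptions.
interface AccountInfo {
  ageDays: number;
  karma: number;
}

/** Reduce the score by 40% for accounts with 180+ days of age and 500+ karma. */
function applyTrustDiscount(score: number, account: AccountInfo): number {
  // Assumes both conditions are required; the write-up does not say AND or OR.
  const established = account.ageDays >= 180 && account.karma >= 500;
  return established ? Math.round(score * 0.6) : score;
}
```

For example, a raw score of 80 from an established account drops to 48, while the same score from a week-old account stays at 80.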

Reddit's anti-spam fighting our own bot - Reddit's automated systems kept removing the bot's distinguished mod comments seconds after posting, even from a mod account. It took significant debugging to discover the fix: immediately approve the bot's own comments after posting and ignore reports on them.

Devvit API quirks - comment.distinguish(true) throws for nested comment replies (sticky only works on top-level post comments). The Redis expiration parameter requires a Date object, not a millisecond number. The addModNote label only accepts specific enum values. Each of these cost hours of debugging with cryptic error messages.
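
Two of these quirks can be guarded against with small helpers, sketched below. Both helper names are hypothetical. The first builds the Date that the expiration parameter expects from a TTL in days. The second relies on Reddit's fullname prefixes (t3_ for posts, t1_ for comments) to decide whether a reply is top-level and therefore safe to sticky.

```typescript
// Hypothetical helpers; neither is part of the Devvit API.

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Build an absolute expiry Date from a TTL in days, instead of passing raw milliseconds. */
function expiryDate(ttlDays: number, now: Date = new Date()): Date {
  return new Date(now.getTime() + ttlDays * MS_PER_DAY);
}

/**
 * A comment is top-level when its parent is a post.
 * Reddit fullnames use the prefix "t3_" for posts and "t1_" for comments.
 */
function isTopLevelComment(parentId: string): boolean {
  return parentId.startsWith("t3_");
}
```

A caller could pass expiryDate(7) wherever the 7-day score history TTL is set, and check isTopLevelComment before asking for a sticky, falling back to a plain distinguish for nested replies.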

Scoring calibration across languages - German phrases contributed signal, but not enough to cross the SUSPICIOUS threshold alone, because German sentence structures produce more length variance than French or English, preventing the uniform-sentence bonus from stacking.

Accomplishments that we're proud of

A detection system that correctly scores obvious AI spam without flagging formal human writing; in testing, the false positive rate on legitimate content was near zero
Full multilingual detection across English, Spanish, French, and German because AI spam isn't English-only
A complete mod workflow: auto-detection → stickied evidence → configurable action → one-click dismissal. Most moderation tools only cover part of this loop
Distinguished mod comments that survive Reddit's own anti-spam system by self-approving and ignoring reports
Deduplication on re-scan: rescanning a post replaces the existing bot comment instead of stacking duplicates, making the app behave like a professional tool rather than a prototype
A weekly digest that gives mod teams actionable data every Monday without any manual effort

What we learned

The most important lesson: moderation tools need exit ramps as much as entry ramps. Every flagging system produces false positives. Building the "Mark as Human" dismissal path, one click that removes the bot comment, approves the content, and clears the cached score, was as important as building the detection itself. A mod tool that can't be overridden is a liability, not an asset.

I also learned that Reddit's community trust signals (account age, karma, posting history) are genuinely powerful features for spam detection. A 2-year-old account writing formally is most likely a real person. A 2-day-old account with 1 karma writing the same text is almost certainly not. The account and behavior layers are what separate AI Slop Detector from a simple phrase-matching script.

Finally: always approve your own bot's comments immediately after posting. Reddit will remove them otherwise.

What's next for AI Slop Detector

Image OCR detection - AI-generated text embedded in images is the next evasion vector. Adding OCR to extract and score text from image posts would close this gap.
Portuguese and Japanese phrase lists - two of Reddit's largest non-English user bases, both increasingly targeted by AI spam campaigns
Moderator feedback loop - when a mod dismisses a false positive via "Mark as Human" or confirms a true positive, that signal could be used to locally calibrate thresholds for that subreddit over time
Cross-community threat intelligence - if the same new account is flagged across multiple subreddits running the app, that coordinated signal could significantly increase detection confidence
Reddit Developer Funds - if the app reaches engagement milestones post-hackathon, the plan is to maintain and expand it long-term. AI spam isn't going away. Neither should the tools to fight it.

Built With

devvitscheduler
reddit
redis
typescript

Updates

Kirankumar Vasala started this project — May 26, 2026 10:44 PM EDT
