Project Story
Inspiration
Every moderation tool on Reddit fires at one moment: submission. AutoModerator, Post Guidance, Crowd Control, and the Harassment Filter all inspect content before it goes live.
We kept coming back to one unsettling fact, straight from AutoModerator's own documentation: it "will not act on content already approved or removed by a moderator" and "cannot react to a user's edits." In other words, the instant a moderator approves a post, every tool on the platform stops watching it.
That's not a small gap. It's an attack surface.
The highest-leverage moment to inject a scam link is not at submission, when scrutiny is highest; it is after approval, once a post has climbed the feed and every eye is on it. A wholesome post gets approved and trends, and then the author quietly edits in a bit.ly scam link, an affiliate code, or an off-platform "DM me to buy." Mods discover it hours later, from user reports, after the damage is done.
The research backed up our hunch:
- Cornell's CSCW 2025 study calls AI-driven content a "very disruptive" triple threat
- The 2026 CHI modqueue study found moderators "juggle multiple interfaces and third-party tools"
- No existing tool covered the post-approval timeline
So we built one.
What it does
Tripwire is moderation’s rear-view mirror: the only tool that watches what happens after approval.
- Capture: snapshots exactly what was approved (title, body, links, domains, approving mod)
- Watch: detects edits and compares against the approved snapshot
- Score: computes a drift score across:
  - Links
  - Off-platform solicitation
  - Obfuscation
  - Structural changes
- Act: depending on the score threshold:
  - Re-queue content
  - Notify moderators
  - Log silently
- Review: Drift Log dashboard with:
  - Severity
  - Signals triggered
  - Author + approving mod
  - One-click actions (View / Restore / Remove)
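The Capture and Watch steps reduce to storing what was approved and diffing later edits against it. Below is a minimal TypeScript sketch of that idea, not Tripwire's code: the `ApprovalSnapshot` fields and the `domainsOf` / `addedDomains` helpers are hypothetical, and link extraction is assumed to happen before these functions are called.

```typescript
// Hypothetical approval snapshot; field names are illustrative only.
interface ApprovalSnapshot {
  id: string; // post or comment fullname, e.g. "t3_abc"
  title: string;
  body: string;
  links: string[]; // absolute URLs present at approval time
  approvedBy: string; // approving moderator's username
  approvedAt: number; // epoch milliseconds
}

// Hostnames of all parseable URLs, lower-cased with a leading "www." removed.
function domainsOf(links: string[]): Set<string> {
  const domains = new Set<string>();
  for (const link of links) {
    try {
      domains.add(new URL(link).hostname.toLowerCase().replace(/^www\./, ""));
    } catch {
      // Not a valid absolute URL: skip it.
    }
  }
  return domains;
}

// Domains linked in the edited version that were absent from the approved snapshot.
function addedDomains(snapshot: ApprovalSnapshot, editedLinks: string[]): string[] {
  const approved = domainsOf(snapshot.links);
  return Array.from(domainsOf(editedLinks)).filter((d) => !approved.has(d));
}

const demoSnapshot: ApprovalSnapshot = {
  id: "t3_abc",
  title: "My cat",
  body: "Look at him!",
  links: ["https://i.redd.it/cat.jpg"],
  approvedBy: "mod_a",
  approvedAt: 0,
};
const demoAdded = addedDomains(demoSnapshot, ["https://i.redd.it/cat.jpg", "https://bit.ly/abc"]); // ["bit.ly"]
```

Diffing at the domain level means an edit that only rewrites prose around an existing image link adds nothing here, while a newly injected shortener shows up immediately.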
When a scam link is injected post-approval, Tripwire catches it in seconds, automatically and explainably.
How we built it
Tripwire is a Devvit app (TypeScript) built on @devvit/public-api.
Core Components
Triggers
- ModAction: approvals
- PostUpdate / CommentUpdate: edits
- AppInstall: onboarding
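To illustrate how the trigger types might divide the work, here is a minimal TypeScript sketch of a pure dispatcher. The event shapes, the `route` function, and the `Handling` labels are hypothetical simplifications; in a real Devvit app each branch would sit inside a handler registered with `Devvit.addTrigger`, whose payloads carry more fields than shown. We assume the mod action string matches Reddit's moderation log names for approvals, `approvelink` and `approvecomment`.

```typescript
// Simplified, hypothetical event shapes; real Devvit payloads have more fields.
type TriggerEvent =
  | { type: "ModAction"; action: string; targetId: string; moderator: string }
  | { type: "PostUpdate" | "CommentUpdate"; targetId: string }
  | { type: "AppInstall" };

type Handling = "snapshot" | "compare" | "onboard" | "ignore";

// Approval actions as named in Reddit's moderation log.
const APPROVAL_ACTIONS = new Set(["approvelink", "approvecomment"]);

function route(event: TriggerEvent): Handling {
  switch (event.type) {
    case "ModAction":
      // Only approvals start a watch; other mod actions are ignored here.
      return APPROVAL_ACTIONS.has(event.action) ? "snapshot" : "ignore";
    case "PostUpdate":
    case "CommentUpdate":
      // Edits are compared against the stored approval snapshot.
      return "compare";
    case "AppInstall":
      return "onboard";
  }
}

const demoRoute = route({ type: "ModAction", action: "approvelink", targetId: "t3_abc", moderator: "mod_a" }); // "snapshot"
```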
Storage (Redis)
- Approval snapshots
- Watchlist (sorted set)
- Daily pruning via zRemRangeByScore
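The watchlist pattern (score each watched item by its approval time, then drop everything older than a cutoff with one range delete) can be shown with a small in-memory stand-in for a Redis sorted set. This is a sketch under assumptions: `SortedSet` only mimics the semantics of `zAdd` and `zRemRangeByScore` (the real Devvit Redis calls are asynchronous and key-based), and the seven-day `RETENTION_DAYS` window is hypothetical.

```typescript
// In-memory stand-in for a Redis sorted set (illustration only). The real
// Devvit Redis client is async and takes a key as its first argument.
class SortedSet {
  private scores = new Map<string, number>();

  zAdd(member: string, score: number): void {
    this.scores.set(member, score);
  }

  // Removes members whose score lies in [min, max], inclusive as in Redis,
  // and returns how many were removed.
  zRemRangeByScore(min: number, max: number): number {
    let removed = 0;
    this.scores.forEach((score, member) => {
      if (score >= min && score <= max) {
        this.scores.delete(member);
        removed++;
      }
    });
    return removed;
  }

  members(): string[] {
    return Array.from(this.scores.keys());
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_DAYS = 7; // hypothetical retention window

// Watch a post or comment, scored by its approval time (epoch ms).
function watchItem(watchlist: SortedSet, thingId: string, approvedAt: number): void {
  watchlist.zAdd(thingId, approvedAt);
}

// Daily job: drop everything approved before the retention cutoff in one range delete.
function pruneWatchlist(watchlist: SortedSet, now: number): number {
  const cutoff = now - RETENTION_DAYS * DAY_MS;
  return watchlist.zRemRangeByScore(-Infinity, cutoff);
}
```

Scoring by approval time is what makes pruning a single call: every expired entry sits in one contiguous score range, so no scan or per-item expiry bookkeeping is needed.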
Reddit API
- Remove content
- Send modmail
- Add mod notes
UI
- Devvit Blocks: Drift Log dashboard
Drift Scoring Engine
A deterministic system combining signals using a noisy-OR model:
\[ \text{score} = 1 - \prod_{i} (1 - c_i) \]
where \( c_i \in [0, 1] \) is the confidence contributed by signal \( i \).
This ensures:
- Weak signals reinforce each other
- No single category dominates
- Works without labeled training data
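A direct TypeScript rendering of the noisy-OR formula makes the "weak signals reinforce each other" property concrete. This is a minimal sketch; the `noisyOr` name and the clamping of out-of-range inputs are our assumptions, not necessarily how Tripwire implements it.

```typescript
// Noisy-OR: score = 1 - prod_i (1 - c_i), with each c_i clamped to [0, 1].
function noisyOr(confidences: number[]): number {
  let noneFire = 1; // product of (1 - c_i): chance that no signal "fires"
  for (const c of confidences) {
    const clamped = Math.min(1, Math.max(0, c));
    noneFire *= 1 - clamped;
  }
  return 1 - noneFire;
}

const demoTwo = noisyOr([0.5, 0.5]); // 0.75
const demoThree = noisyOr([0.5, 0.5, 0.5]); // 0.875
```

Two independent signals at 0.5 combine to 0.75, and a third lifts the score to 0.875, above the 0.85 auto-action bar mentioned under Challenges, even though no single signal came close on its own. These inputs are illustrative, not Tripwire's real signal weights.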
Security and Abuse Defenses
Built to match real-world adversarial behavior:
- URL canonicalization (following Google Safe Browsing rules)
- Unicode UTS-39 homoglyph detection
- Trojan Source defense (CVE-2021-42574)
- Punycode decoding (RFC 3492)
- Typosquat detection (edit distance + deglyphing)
- Public Suffix List validation
- Link cloaking detection
- Dilution-resistant diffing:
  \[ \frac{|B \setminus A|}{|B|} \]
  where \( A \) is the approved snapshot and \( B \) is the edited version
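Several of the defenses above can be sketched in a few dozen lines of TypeScript: Trojan Source detection, a simplified deglyphing step feeding an edit-distance typosquat check, and the new-content ratio. These are hedged illustrations, not Tripwire's implementation: `CONFUSABLES` is a tiny hypothetical subset of the UTS-39 data, the one-edit typosquat tolerance is an assumption, and we assume the diff operates on sets of items (such as links or tokens) drawn from each version.

```typescript
// Trojan Source (CVE-2021-42574) abuses Unicode bidirectional controls:
// LRE, RLE, PDF, LRO, RLO (U+202A-U+202E) and LRI, RLI, FSI, PDI (U+2066-U+2069).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/;

function hasBidiControls(text: string): boolean {
  return BIDI_CONTROLS.test(text);
}

// Hypothetical, heavily simplified confusable map; a real UTS-39 skeleton
// uses the full confusables.txt data published by Unicode.
const CONFUSABLES: Record<string, string> = {
  "\u0430": "a", // Cyrillic small a
  "\u0435": "e", // Cyrillic small ie
  "\u043E": "o", // Cyrillic small o
  "\u0440": "p", // Cyrillic small er
  "1": "l",
};

// NFKC folds compatibility forms (e.g. fullwidth letters); then confusables are mapped.
function deglyph(s: string): string {
  return Array.from(s.normalize("NFKC").toLowerCase())
    .map((ch) => CONFUSABLES[ch] ?? ch)
    .join("");
}

// Classic dynamic-programming edit distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Flags a domain whose deglyphed form is within maxDistance edits of a
// protected domain without actually being that domain.
function isTyposquat(domain: string, protectedDomains: string[], maxDistance = 1): boolean {
  const lowered = domain.toLowerCase();
  const skeleton = deglyph(domain);
  return protectedDomains.some(
    (p) => lowered !== p && levenshtein(skeleton, p) <= maxDistance,
  );
}

// |B \ A| / |B|: share of the edited set B that was absent from the approved set A.
function newContentRatio(approved: Set<string>, edited: Set<string>): number {
  if (edited.size === 0) return 0;
  let fresh = 0;
  edited.forEach((item) => {
    if (!approved.has(item)) fresh++;
  });
  return fresh / edited.size;
}
```

For example, `isTyposquat("paypa1.com", ["paypal.com"])` is true because deglyphing maps the digit 1 to l, leaving an edit distance of zero from a domain the string is not.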
Challenges we ran into
No native AI support
- Only Gemini was available, and it is costly at scale
- Decision: go fully deterministic
Adversarial evasion
- Unicode tricks, hidden characters, cloaked URLs
- Required deep defensive engineering
Precision vs Recall
- False positives are worse than misses
- Auto-action requires ≥ 0.85 confidence
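The precision-first policy can be expressed as a threshold ladder in which only high-confidence drift triggers automatic action and borderline cases go to a human. In this sketch, the 0.85 bar comes from the text above, while mapping auto-action to re-queueing, the `chooseAction` name, and the 0.5 `NOTIFY_THRESHOLD` are hypothetical.

```typescript
type DriftAction = "requeue" | "notify" | "log";

// 0.85 is the auto-action bar from the write-up; the notify bar is hypothetical.
const AUTO_ACTION_THRESHOLD = 0.85;
const NOTIFY_THRESHOLD = 0.5;

// Precision first: only high-confidence drift is acted on automatically;
// borderline scores go to a human, and everything else is logged silently.
function chooseAction(score: number): DriftAction {
  if (score >= AUTO_ACTION_THRESHOLD) return "requeue";
  if (score >= NOTIFY_THRESHOLD) return "notify";
  return "log";
}
```

Routing the middle band to a notification rather than an automatic action keeps a false positive cheap: a moderator spends a glance instead of a legitimate post being pulled.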
Real vs demo gap
- Avoiding over-flagging was as hard as detection
- Example: bit.ly/test-link (no scheme) should not trigger
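One way to enforce the no-scheme rule is to count only absolute http/https URLs as links. The sketch below is a simplified, hypothetical extractor (`LINK_PATTERN` and `extractLinks` are illustrative names), not Tripwire's actual parser.

```typescript
// Deliberately simple, hypothetical pattern: only absolute http(s) URLs qualify,
// so a bare "bit.ly/test-link" (no scheme) is never extracted.
const LINK_PATTERN = /\bhttps?:\/\/[^\s<>()\[\]]+/gi;

function extractLinks(text: string): string[] {
  const links: string[] = [];
  for (const raw of text.match(LINK_PATTERN) ?? []) {
    const candidate = raw.replace(/[.,!?;:]+$/, ""); // drop trailing sentence punctuation
    try {
      links.push(new URL(candidate).href); // normalized form, e.g. lower-cased host
    } catch {
      // Discard matches that still fail to parse.
    }
  }
  return links;
}
```

Excluding parentheses and brackets from the match lets Markdown links such as `[deal](https://example.com)` parse cleanly, at the cost of truncating the rare URL that contains them.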
Accomplishments that we're proud of
- Identified and solved a previously unaddressed gap
- 135 unit tests, including adversarial cases
- Fully validated on a live subreddit
- Zero-config, free, and scalable
- Fully explainable decisions, no black boxes
What we learned
- The biggest problems do not always call for smarter models; sometimes they are unwatched surfaces
- Deterministic systems can outperform AI in:
  - Reliability
  - Cost
  - Explainability
- Moderators trust tools that show their reasoning
- Precision matters more than raw capability in moderation systems
What's next for Tripwire
- Link rot and domain takeover detection
- Sleeper account behavior tracking
- Per-mod accountability analytics
- Optional AI semantic drift detection (opt-in, non-critical path)
- Score calibration using real moderator feedback
Built With
- devvit
- devvit-blocks
- reddit-developer-platform
- redis
- typescript