ModSentry

Inspiration

Every active subreddit moderator knows the feeling: you ban a user for repeated rule-breaking, and three days later they're back on a new account with the same writing style, the same grievances, and the same targets. Reddit's built-in ban evasion filter is a black box — it misses obvious alts and flags innocents with no explanation. Existing third-party bots focus on spam and karma farming, not stylometric detection. There was no tool in the Devvit ecosystem that gave mods transparent, explainable evidence for ban evasion decisions.

ModSentry was built to change that.

What it does

When a mod bans a user, ModSentry silently captures a behavioral and stylometric fingerprint of that user — how they write, when they post, which subreddits they frequent, and how their account was created. Every new or low-karma account that posts in the subreddit is then scored against the fingerprints of every previously banned user.

When a strong match is found, mods receive an Evidence Card in modmail — a plain-English breakdown of exactly why this account looks like the banned user:

"Both users post almost exclusively between 02:00–05:00 UTC. Both start sentences with 'Honestly,' at 13% rate. Account created 4 days after the ban."

Mods reply with !ms-ban, !ms-clear, or !ms-watch — no new UI to learn. Every decision feeds back into the system.

How we built it

ModSentry is built entirely on Reddit's Devvit platform using TypeScript and Hono as the server framework, with Devvit-managed Redis for all storage.

The matching engine uses classical NLP — no LLM required:

Character n-gram histograms (n=3) capture writing style at a character level
Function-word frequency vectors capture unconscious word choice habits
Jensen-Shannon divergence on posting-hour and day-of-week histograms captures behavioral timing
Cosine similarity across aligned feature vectors (using key-union alignment to ensure correct dimensionality) scores the overall match
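
The first and third items above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the ModSentry source: the helper names (charNgrams, toDistribution, jensenShannon) and the whitespace normalization are assumptions made for the example.

```typescript
// Hypothetical helpers for illustration; names are not taken from ModSentry.
type Histogram = Record<string, number>;

// Count overlapping character n-grams (n = 3 by default) in lowercased text.
function charNgrams(text: string, n = 3): Histogram {
  const counts: Histogram = {};
  // Assumption: collapse runs of whitespace so formatting noise is ignored.
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  for (let i = 0; i + n <= normalized.length; i++) {
    const gram = normalized.slice(i, i + n);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  return counts;
}

// Turn raw counts (e.g. 24 posting-hour buckets) into probabilities.
function toDistribution(counts: number[]): number[] {
  const total = counts.reduce((a, b) => a + b, 0);
  if (total === 0) return counts.map(() => 1 / counts.length); // uniform fallback
  return counts.map((c) => c / total);
}

// Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]:
// 0 for identical distributions, 1 for distributions with no overlap.
function jensenShannon(p: number[], q: number[]): number {
  if (p.length !== q.length) throw new Error("histograms must have equal length");
  const P = toDistribution(p);
  const Q = toDistribution(q);
  const M = P.map((pi, i) => (pi + Q[i]) / 2);
  // Kullback-Leibler divergence; terms with a[i] = 0 contribute nothing.
  const kl = (a: number[], b: number[]): number =>
    a.reduce((sum, ai, i) => (ai > 0 ? sum + ai * Math.log2(ai / b[i]) : sum), 0);
  return 0.5 * kl(P, M) + 0.5 * kl(Q, M);
}
```

A timing similarity in [0, 1] can then be taken as 1 - jensenShannon(hoursA, hoursB), where each argument is a 24-bucket posting-hour histogram.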

The scoring formula weights 16 features across stylometric, behavioral, and metadata dimensions, normalized to a 0–100 match score.
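
As a rough sketch of how a weighted score of this kind can be normalized to 0–100: the feature names and weights below are invented for the example (the real formula uses 16 features whose weights are not listed here), and skipping features that could not be computed is an assumption.

```typescript
// Per-feature similarities and weights, keyed by feature name.
// Assumption: every similarity is already scaled to [0, 1].
type FeatureScores = Record<string, number>;

function matchScore(similarities: FeatureScores, weights: FeatureScores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const [feature, weight] of Object.entries(weights)) {
    const s = similarities[feature];
    if (s === undefined) continue; // feature unavailable for this pair: skip it
    weighted += weight * Math.min(1, Math.max(0, s)); // clamp to [0, 1]
    totalWeight += weight;
  }
  // Divide by the weight actually used so the result stays in 0..100.
  return totalWeight === 0 ? 0 : Math.round((100 * weighted) / totalWeight);
}

// Hypothetical weights for three of the features.
const exampleWeights: FeatureScores = { charNgrams: 2, postingHours: 1, accountAge: 1 };
const exampleScore = matchScore(
  { charNgrams: 1, postingHours: 1, accountAge: 0 },
  exampleWeights,
);
```

Renormalizing by the weights actually used keeps a missing feature from dragging the score down, at the cost of making scores built from few features look more certain than they are.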

Fingerprints are serialized as compressed JSON and stored in Redis hashes: one hash per subreddit, with user IDs as fields. A daily scheduled job aggregates stats and auto-decays stale fingerprints after 365 days.
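
The staleness check in the daily job might look like the sketch below. It works on a plain object standing in for one subreddit's hash (user ID to JSON string) rather than calling Redis, omits compression, and treats "decay" as selecting entries for removal. The StoredFingerprint shape and its bannedAt field are assumptions for illustration.

```typescript
// Hypothetical stored shape; the real fingerprint fields are not shown in this writeup.
interface StoredFingerprint {
  bannedAt: number; // epoch milliseconds of the ban
  features: Record<string, number>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// One hash field value: the fingerprint as a JSON string (compression omitted).
function serialize(fp: StoredFingerprint): string {
  return JSON.stringify(fp);
}

// Return the user IDs whose fingerprints are older than maxAgeDays.
// `hash` mirrors one subreddit's hash: field = user ID, value = JSON string.
function staleUserIds(
  hash: Record<string, string>,
  nowMs: number,
  maxAgeDays = 365,
): string[] {
  const cutoff = nowMs - maxAgeDays * DAY_MS;
  return Object.entries(hash)
    .filter(([, raw]) => (JSON.parse(raw) as StoredFingerprint).bannedAt < cutoff)
    .map(([userId]) => userId);
}
```

Keeping the selection logic pure makes it easy to unit-test without a Redis instance; the scheduler would then delete the returned fields from the subreddit's hash.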

Challenges

Modmail has no interactive elements. The original design called for inline action buttons on Evidence Cards, but Devvit modmail doesn't support them. We redesigned the interaction model around text commands (!ms-ban, !ms-clear, !ms-watch) parsed in the onModMail trigger. That in turn required a pending-match lookup flow so the handler could find the right context when a mod replied.
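
A minimal sketch of the command parsing and pending-match lookup, kept independent of the Devvit trigger itself. The PendingMatch shape, the use of a conversation ID as the lookup key, and the in-memory Map (in place of Redis) are all assumptions for illustration.

```typescript
type Action = "ban" | "clear" | "watch";

// Hypothetical record saved when an Evidence Card is sent.
interface PendingMatch {
  suspect: string;
  bannedUser: string;
  score: number;
}

// Recognize !ms-ban, !ms-clear, or !ms-watch at the start of a reply.
function parseCommand(body: string): Action | null {
  const m = body.trim().match(/^!ms-(ban|clear|watch)\b/i);
  return m ? (m[1].toLowerCase() as Action) : null;
}

// Resolve a mod's reply to the match the Evidence Card was about.
// Assumption: pending matches are keyed by modmail conversation ID.
function resolveReply(
  conversationId: string,
  body: string,
  pending: Map<string, PendingMatch>,
): { action: Action; match: PendingMatch } | null {
  const action = parseCommand(body);
  if (action === null) return null; // ordinary reply, not a command
  const match = pending.get(conversationId);
  if (match === undefined) return null; // no Evidence Card in this thread
  pending.delete(conversationId); // each card is decided at most once
  return { action, match };
}
```

The word boundary in the pattern means a reply such as "!ms-banana" is ignored rather than treated as a ban.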

Cosine similarity across heterogeneous vector spaces. The initial implementation compared n-gram vectors built from each user's own top-200 n-grams, so position i in one vector could refer to a different n-gram than position i in the other, and the cosine between them was meaningless. The fix was alignVectors(), which takes the union of keys from both fingerprints and builds aligned vectors with zeros for missing entries. It was a subtle but critical correctness bug: left in place, it would have silently produced garbage scores.
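
The key-union idea can be sketched as follows. The function name alignVectors comes from the description above, but its signature and the sparse-vector type here are assumptions, not the project's actual code.

```typescript
// Sparse feature vector: n-gram (or function word) -> frequency.
type SparseVector = Record<string, number>;

// Build two dense vectors over the union of keys, so that index i
// refers to the same feature in both. Missing entries become 0.
function alignVectors(a: SparseVector, b: SparseVector): [number[], number[]] {
  const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)])).sort();
  return [keys.map((k) => a[k] ?? 0), keys.map((k) => b[k] ?? 0)];
}

// Cosine similarity of two equal-length dense vectors; 0 if either is all zeros.
function cosineSimilarity(x: number[], y: number[]): number {
  if (x.length !== y.length) throw new Error("vectors must be aligned first");
  let dot = 0;
  let normX = 0;
  let normY = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    normX += x[i] * x[i];
    normY += y[i] * y[i];
  }
  return normX === 0 || normY === 0 ? 0 : dot / Math.sqrt(normX * normY);
}

// Two users sharing one of their two trigrams.
const [vecA, vecB] = alignVectors({ the: 1, ing: 1 }, { the: 1, ion: 1 });
const similarity = cosineSimilarity(vecA, vecB); // 0.5
```

Without the alignment step, both users would yield the vector [1, 1] and score a perfect 1.0 despite sharing only half their trigrams, which is exactly the silent failure described above.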

Devvit type declarations don't resolve locally. TypeScript declarations for the Devvit subpath modules (@devvit/web/server, @devvit/web/client, and @devvit/web/shared) don't resolve cleanly in a local development environment: they appear as empty directories in node_modules until the Devvit build pipeline runs. This made local type-checking unreliable and required careful configuration to separate the production tsconfig from the test tsconfig.

What we learned

Behavioral signals (posting hours, day-of-week patterns, inter-post gaps) are more robust than stylometric signals for short Reddit comments. A user who writes two-sentence comments doesn't provide enough text for reliable n-gram matching, but their posting-time fingerprint is highly stable. The most reliable single signal is account creation timing relative to the ban date: an account created within 30 days after a ban and posting in the same subreddit is the strongest prior available.
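
The account-creation signal reduces to a simple window check. The sketch below assumes the window counts only accounts created on or after the ban date, and the function name is hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True if the account was created on or after the ban and no more than
// windowDays later. Both timestamps are epoch milliseconds.
function createdWithinWindowAfterBan(
  accountCreatedMs: number,
  banMs: number,
  windowDays = 30,
): boolean {
  const delta = accountCreatedMs - banMs;
  return delta >= 0 && delta <= windowDays * MS_PER_DAY;
}
```

Accounts that predate the ban return false here; whether older sleeper accounts should also count is a policy choice this sketch leaves out.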