Inspiration

Moderating a subreddit is unpaid, thankless work — and one of the most tedious parts is flair management. Every day, thousands of moderators across Reddit manually sort posts into the right categories because existing tools simply aren't smart enough to do it for them. Reddit's native AutoModerator can assign flairs based on simple keyword matches, but it breaks the moment someone phrases something slightly differently. A rule that matches "help needed" won't catch "need some guidance" or "could someone assist" — and writing regex patterns for every possible phrasing is a losing battle. Third-party tools like Toolbox require complex JavaScript snippets that only power-users can write, and even then they don't learn or improve over time.

Meanwhile, large subreddits like r/AskReddit, r/Technology, r/gaming, and r/worldnews process hundreds of posts per hour. In popular communities, a moderator can spend 30–45 minutes per session just sorting new posts into the correct flair categories. Multiply that across multiple moderation sessions per day, multiple days per week, and you're looking at hundreds of hours per year spent on a task that should be automatable. And it's not just about time — it's about consistency. Human moderators get tired, distracted, and inconsistent. A post that gets flaired correctly at 9 AM might get miscategorized at midnight when the mod is half-asleep.

We asked ourselves a simple question: what if a mod tool could read a post, understand what it's about, and assign the right flair — while getting smarter every time a moderator corrects it? That question became Flair Enforcer.

The inspiration came from our own experience moderating communities. We watched fellow mods spend 30+ minutes per day just sorting new posts into flair categories. That's roughly 180 hours per year of repetitive, low-value work that a machine learning system could handle. We wanted to build something radically different from existing tools — something that requires zero ML expertise to set up, works out of the box with just a few keyword rules, and genuinely improves itself over time through the simple act of moderators doing what they already do every day: fixing wrong flairs.

The Reddit Mod Tools Hackathon was the perfect catalyst. Reddit is investing heavily in the Devvit platform, and we saw an opportunity to build something that showcases what's possible when you combine traditional NLP techniques with modern platform primitives like KV storage, event triggers, and custom post types. We were particularly excited about the "migrated apps" track — the idea of taking functionality that previously required browser extensions or external bots and reimplementing it natively on Reddit's infrastructure. Flair classification is one of the oldest moderation problems on Reddit, and we wanted to show that a Devvit-native approach could be dramatically better than the old-school bot approach.

What it does

Flair Enforcer is a Devvit app that automatically assigns post flairs using a multi-stage classification pipeline. It's designed to be the only auto-flair tool a subreddit needs — replacing AutoModerator flair rules, external flair bots, and manual flair assignment all at once.

When a new post is submitted, the system immediately runs it through a 4-stage classification chain:

Stage 1: ML Classifier. A TF-IDF-inspired scoring engine tokenises the post title and body, compares the resulting terms against a weighted keyword index built from both manual rules and learned training data, and produces a ranked list of flair candidates with confidence scores. The tokenisation process strips all punctuation, lowercases every character, splits on whitespace, and filters out 80+ English stop-words ("the", "is", "a", "and", etc.) to focus only on meaningful content terms. Each candidate flair receives a score based on the sum of matched term weights, normalised by the square root of the total token count (a standard TF-IDF dampening technique that prevents longer posts from dominating). The top candidate is selected if its confidence score exceeds a configurable threshold — which moderators can set anywhere from 0.05 (very aggressive) to 0.95 (very conservative).
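
Conceptually, Stage 1 boils down to a few lines of TypeScript. This is a minimal sketch of the idea, not the app's actual code — the function names, the tiny stop-word subset, and the rule weights are all illustrative:

```typescript
// Illustrative subset of the 80+ stop-word list described above.
const STOP_WORDS = new Set([
  "the", "is", "a", "an", "and", "of", "to", "in", "it", "on", "for", "with", "my",
]);

// Strip punctuation, lowercase, split on whitespace, drop stop-words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s/-]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// termIndex maps flairId -> (term -> weight), built from manual rules
// plus learned training data.
function scoreFlairs(
  tokens: string[],
  termIndex: Map<string, Map<string, number>>,
): Map<string, number> {
  const scores = new Map<string, number>();
  // sqrt dampening keeps long posts from dominating on raw match count.
  const damp = Math.sqrt(Math.max(tokens.length, 1));
  termIndex.forEach((terms, flairId) => {
    let sum = 0;
    for (const tok of tokens) sum += terms.get(tok) ?? 0;
    if (sum > 0) scores.set(flairId, sum / damp);
  });
  return scores;
}
```

The top-scoring flair is then compared against the configured confidence threshold before assignment.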

Stage 2: Keyword Fallback. If the ML stage doesn't meet the confidence threshold, the system falls back to exact keyword/phrase matching. Moderators define rules like "if post contains 'help needed', assign Help flair" through a simple menu form — no regex knowledge required. Each keyword rule has a weight parameter (1–10) that controls how strongly it influences the ML stage. A keyword with weight 5 will produce a much stronger signal than one with weight 1, making it possible to express relative importance between different classification signals.
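
A minimal sketch of the Stage 2 substring check (the rule shape and tie-breaking are illustrative — we assume the highest-weight matching rule wins):

```typescript
interface KeywordRule {
  keyword: string;
  flairId: string;
  weight: number; // 1–10, also feeds the ML stage as a term weight
}

// Case-insensitive substring match over title + body; prefer the
// highest-weight rule when several match.
function keywordFallback(
  title: string,
  body: string,
  rules: KeywordRule[],
): string | null {
  const text = `${title} ${body}`.toLowerCase();
  const match = rules
    .filter((r) => text.includes(r.keyword.toLowerCase()))
    .sort((a, b) => b.weight - a.weight)[0];
  return match ? match.flairId : null;
}
```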

Stage 3: Regex Fallback. For advanced use cases, moderators can define regex patterns. This handles structured content like "[OC] Original Content" prefixes, bug report formats with specific field markers, or title templates that specific communities use. Each regex rule has a human-readable description so moderators can remember what each pattern does weeks after creating it. The regex stage validates all patterns at creation time, so a syntax error in a pattern won't break the entire classification pipeline.
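
The create-time validation can be as simple as attempting to construct the pattern and surfacing the error message back to the form (a sketch; the form-integration details are assumed):

```typescript
// Returns null if the pattern compiles, otherwise a human-readable error
// to show in the rule-creation form. A bad pattern is rejected at save
// time and never enters the classification pipeline.
function validatePattern(pattern: string): string | null {
  try {
    new RegExp(pattern);
    return null;
  } catch (e) {
    return e instanceof Error ? e.message : "invalid pattern";
  }
}
```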

Stage 4: Default Fallback. If nothing matches, an optional default flair is applied, ensuring every post gets categorised. Moderators can choose to leave this empty if they prefer unclassified posts to remain unflaired rather than receive a "catch-all" flair.

What makes Flair Enforcer fundamentally different from every other auto-flair tool is the learning loop. When a moderator manually changes a flair that was auto-assigned, the system extracts key terms from the original post title and associates them with the correct flair in a training corpus stored in KV. Over time, this training data makes the ML classifier progressively more accurate without any manual model retraining, API calls, or data export/import. The entire learning process happens silently in the background — moderators don't need to do anything differently from what they're already doing. They just fix wrong flairs (as they always have), and the system gets smarter.
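
The learning update itself is a small term-frequency increment. This sketch uses a plain `Map` to stand in for the KV store (key names mirror the `fe:trained:{flairId}` scheme; the tokeniser here is deliberately minimal):

```typescript
function tokenizeTitle(title: string): string[] {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// When a mod overrides an auto-flair, bump the count of every title term
// under the flair the mod chose. No raw post text is ever stored.
function trainFromCorrection(
  store: Map<string, Record<string, number>>,
  correctFlairId: string,
  postTitle: string,
): void {
  const key = `fe:trained:${correctFlairId}`;
  const freq = store.get(key) ?? {};
  for (const term of tokenizeTitle(postTitle)) {
    freq[term] = (freq[term] ?? 0) + 1;
  }
  store.set(key, freq);
}
```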

The system also maintains a comprehensive statistics dashboard — accessible as a Devvit custom post type — showing:

  • Total posts processed — lifetime and per-day counts
  • Auto-assignment accuracy — percentage of auto-flairs that were NOT overridden by mods
  • Average confidence score — how confident the classifier is across all classifications
  • Top flairs by volume — visual bar chart showing which flairs are assigned most
  • 7-day trend view — daily breakdown of auto-assigned vs total posts
  • Manual override count — how many times mods corrected the system
  • Unclassified count — posts that fell through all stages
  • Rules configured — number of active keyword and regex rules

This dashboard gives moderators something no other auto-flair tool provides: real-time visibility into the system's performance. If accuracy drops, mods can immediately see which flair categories are being overridden most and add targeted rules to fix the problem.

Additional features include:

  • Post type filtering — optionally skip image posts, video posts, or link posts
  • Subreddit exclusion — exclude specific subreddits from auto-flair
  • Modlog integration — every auto-flair assignment is logged to the subreddit modlog with the classification stage and confidence score
  • Daily auto-cleanup — a cron job runs at 3 AM UTC every day to prune classification events older than 90 days, preventing KV bloat
  • Master toggle — one switch to disable all auto-flair without losing configuration
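
The daily cleanup above reduces to a pure pruning pass over timestamped event records (the event-listing shape is illustrative — Devvit's actual scheduler and KV listing APIs differ in detail):

```typescript
// Events older than this are pruned by the 3 AM UTC cron job.
const MAX_AGE_MS = 90 * 24 * 60 * 60 * 1000;

// Given event keys with creation timestamps, return the keys to delete.
function keysToPrune(
  events: Array<{ key: string; createdAt: number }>,
  now: number,
): string[] {
  return events
    .filter((e) => now - e.createdAt > MAX_AGE_MS)
    .map((e) => e.key);
}
```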

How we built it

Flair Enforcer is built entirely on Devvit — Reddit's official developer platform — using TypeScript with the @devvit/public-api SDK (version 0.11+). We chose Devvit specifically because it provides native access to Reddit's event system, persistent storage, and UI primitives without requiring external servers or browser extensions. The app runs directly on Reddit's infrastructure, which means it's always available, always fast, and doesn't require moderators to install or configure anything beyond the initial subreddit install.

The architecture follows a clean modular design with eight source files, each responsible for a single concern:

types.ts — All TypeScript interfaces, type aliases, and constants. Defines the ClassificationConfig, KeywordRule, RegexRule, ClassificationResult, FlairScore, DailyStats, ClassificationEvent types, the KV_KEYS constant for KV store key prefixes, and a DEFAULT_CONFIG export that ensures the app works correctly on first install with zero configuration.

classifier.ts — The ML classification engine and intellectual core of the app. Implements a TF-IDF-inspired scoring algorithm in four phases: (1) tokenise input by stripping punctuation, lowercasing, splitting on whitespace, and filtering 80+ stop-words; (2) build a weighted term index from keyword rules and learned training data; (3) score each flair candidate by summing matched token weights normalised by the square root of total token count; (4) rank candidates and compute confidence as the ratio of top score to sum of top two scores, so a strong single match yields high confidence (~0.9) while two near-equal matches yield lower confidence (~0.5).
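
The phase-4 confidence formula is worth seeing in isolation — a sketch of "top score over sum of top two" (edge-case handling here is our illustrative choice):

```typescript
// One dominant match yields confidence near 1; two near-equal candidates
// yield ~0.5, signalling ambiguity to the threshold check.
function confidence(scores: number[]): number {
  const sorted = [...scores].sort((a, b) => b - a);
  if (sorted.length === 0 || sorted[0] <= 0) return 0;
  if (sorted.length === 1) return 1;
  return sorted[0] / (sorted[0] + sorted[1]);
}
```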

fallback.ts — The 4-stage classification chain orchestrator. Runs each stage in sequence: ML classifier first, then keyword substring matching, then regex pattern matching, then default flair. Each stage returns a ClassificationResult with a stage field ('ml', 'keyword', 'regex', 'default', or 'none') that feeds into the stats system and modlog entries.
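
The orchestration pattern is essentially "first stage to return a flair wins". A minimal sketch (each stage is reduced to a thunk; the real stages take the post and config as input):

```typescript
type Stage = "ml" | "keyword" | "regex" | "default" | "none";
interface ChainResult { flairId?: string; stage: Stage }

// Run stages in order; the first non-null result is tagged with its
// stage name for the stats system and modlog.
function runChain(
  stages: Array<[Stage, () => string | null]>,
): ChainResult {
  for (const [stage, run] of stages) {
    const flairId = run();
    if (flairId !== null) return { flairId, stage };
  }
  return { stage: "none" };
}
```

Adding a fifth classification method later is just another `[stage, fn]` entry in the array.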

rules.ts — Configuration management and the learning system. Provides config load/save for KV storage. The trainFromCorrection() function powers the learning loop: when a mod overrides an auto-flair, it tokenises the post title and stores each token in a per-flair term frequency dictionary. Trained terms get a weight of 0.8 (vs 1.0 for manual rules), ensuring moderator-defined rules always take precedence over learned data.

stats.ts — Statistics tracking with daily granularity. Records every classification event, tracks manual overrides, and computes both daily and lifetime aggregates — including accuracy rates, average confidence, and top-flair rankings. All aggregation is done in application code since Devvit's KV store doesn't support server-side queries.

settings.ts — Devvit settings schema with eight native configuration options: the master toggle, minimum confidence threshold, default flair ID, three post-type filters (skip image, video, or link posts), modlog logging, and learning enable/disable.
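
The key namespaces described below can be captured as a small set of key builders (a sketch mirroring the naming scheme; not the app's literal `KV_KEYS` constant):

```typescript
// Prefix-based namespaces: iterable by prefix, garbage-collectable by age.
const KV_KEYS = {
  config: "fe:config",
  dailyStats: (day: string) => `fe:stats:${day}`, // day as YYYY-MM-DD
  trained: (flairId: string) => `fe:trained:${flairId}`,
  event: (id: string) => `fe:event:${id}`,
  postLookup: (postId: string) => `fe:post:${postId}`, // reverse lookup -> event ID
};
```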

dashboard.tsx — The stats dashboard as a native Devvit custom post type. Renders summary cards (posts processed, accuracy, avg confidence), a 7-day trend view, a top-flairs bar chart, and a breakdown of auto/override/unclassified counts. Created via a mod menu item and updates in real-time.

main.ts — The entry point wiring everything together. Registers PostCreate trigger (auto-flair) and PostUpdate trigger (learning from mod corrections). Creates moderator menu items with forms for adding keyword rules, regex rules, and clearing all rules. Registers the daily cleanup scheduler. Implements install/upgrade handlers for initial config seeding. The PostCreate handler checks the master toggle, loads config, runs the full classification chain, assigns the flair, logs to modlog, records stats, and stores a lookup key for override detection — all wrapped in try/catch.

devvit.yaml — The app manifest defining the app name, version, Reddit owner/repo, and enabled service triggers.

All persistent data lives in Devvit's KV store — no external databases, no Redis, no API calls to third-party services. The data model uses three key namespaces: fe:config (single JSON object for classification settings), fe:stats:{YYYY-MM-DD} (one JSON object per day for aggregated stats), and fe:trained:{flairId} (one JSON object per flair for learned term frequencies). Individual classification events are stored as fe:event:{id} with a reverse lookup via fe:post:{postId} → event ID. The entire data model is designed to be iterable by prefix and garbage-collectable by age.

Challenges we ran into

Challenge 1: Building ML without external APIs. The single biggest challenge was building a classification system that works well without access to external ML services or large language model APIs. Devvit apps run in a sandboxed environment that doesn't provide network access to external AI services, so we couldn't simply call OpenAI or Gemini for text classification. This forced us to implement our own scoring system from scratch in pure TypeScript — no numpy, no scikit-learn, no Python ML libraries. Getting the weighting right took several iterations. Our first version used raw term frequency (count how many rule-terms appear in the post), which heavily biased toward longer posts — a 500-word essay about programming would score higher than a 20-word question about Python simply because it had more tokens. We solved this by normalising scores against the square root of token count, a standard TF-IDF dampening technique borrowed from information retrieval theory. The second iteration had the opposite problem: posts with rare but important terms weren't scoring high enough because the normalisation was too aggressive. We tuned it by experimenting with different normalisation functions (linear, log, sqrt) and settled on sqrt as the best balance between length-independence and term-importance preservation.

Challenge 2: The learning loop design. We needed the system to learn from moderator corrections, but faced a fundamental design tension: store raw post content (privacy risk, storage bloat) vs. store nothing (no learning possible). Our solution: extract and store only the tokenised terms, not the original text. Each flair maintains its own term frequency dictionary in KV storage (keyed as fe:trained:{flairId}), and during classification these trained terms merge with manual keyword rules at a weight of 0.8 (vs 1.0 for manual rules). This ensures a moderator's explicit rule always overrides a weak learned association, while still allowing learned data to fill gaps. The term frequency counts also enable future improvements: TF-IDF weighting, stop-word filtering on training data, or term decay for terms that haven't been reinforced recently.

Challenge 3: Devvit KV store limitations. Devvit's KV store is simple and reliable, but it's key-value only — no queries, no indexes, no aggregation, no transactions. Building the statistics system required us to maintain pre-aggregated daily summary records and perform all aggregation in application code. The getLifetimeStats() function iterates over every daily key to compute totals, which works well for 90 days of data but could become slow if we stored years of history. We mitigated this with the 90-day auto-cleanup scheduler and efficient JSON serialisation (storing only the fields we need, not the full ClassificationEvent). Another KV challenge was atomicity: when recording a classification, we need to update both the daily stats object AND store the individual event AND create the post→event lookup key. If the app crashes between writes, we could end up with inconsistent data. We handled this by ordering the writes from most to least critical (stats first, lookup last) and designing all read paths to gracefully handle missing data.
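
The application-side aggregation is straightforward once the daily records are pre-aggregated. A sketch of the lifetime roll-up (the `DailyStats` shape here is simplified from the real type):

```typescript
interface DailyStats { total: number; auto: number; overrides: number }

// Fold the per-day records into lifetime totals; accuracy is the share
// of auto-assigned flairs that mods did NOT override.
function lifetimeStats(days: DailyStats[]): DailyStats & { accuracy: number } {
  const sum = days.reduce(
    (acc, d) => ({
      total: acc.total + d.total,
      auto: acc.auto + d.auto,
      overrides: acc.overrides + d.overrides,
    }),
    { total: 0, auto: 0, overrides: 0 },
  );
  const accuracy = sum.auto > 0 ? (sum.auto - sum.overrides) / sum.auto : 0;
  return { ...sum, accuracy };
}
```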

Challenge 4: PostUpdate trigger noise. Devvit fires PostUpdate for ANY post edit — not just flair changes. We needed to distinguish between "a mod changed the flair" (trigger learning) and "a user edited their post" (ignore). Our strategy: when we auto-assign a flair, we store the result keyed by post ID (fe:post:{postId}). On PostUpdate, we compare the current flair with what we assigned. If they differ, it's a mod correction → trigger learning. If the same or no flair, skip. This O(1) approach correctly handles edge cases like a mod changing a flair twice (the second change is ignored because the original is already marked overridden).
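
The override check itself is a one-comparison function against the stored record (a sketch; the `overridden` flag stands in for however the real record marks already-handled corrections):

```typescript
interface AssignedRecord { flairId: string; overridden: boolean }

// On PostUpdate: it's a mod correction only if we assigned a flair, it
// hasn't already been marked overridden, and the current flair differs.
function isModCorrection(
  assigned: AssignedRecord | undefined,
  currentFlairId: string | undefined,
): boolean {
  if (!assigned || assigned.overridden) return false;
  return currentFlairId !== undefined && currentFlairId !== assigned.flairId;
}
```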

Challenge 5: Flair template ID discovery. Reddit's flair system uses opaque template IDs (strings like flair-id-123) rather than human-readable names. Moderators need to know the exact template ID to create classification rules. We mitigated this by making the flair ID field a required string input in the forms, but a future improvement would be to add a flair picker that fetches the subreddit's available flair templates and presents them as a dropdown. This is a Devvit platform limitation rather than a code bug — the current API doesn't provide a "list flair templates" method, so we'd need to scrape or cache them.

Accomplishments that we're proud of

Self-improving ML in a sandbox. We're most proud that Flair Enforcer provides genuine self-improving ML classification running entirely within Devvit's sandbox — no external APIs, no model hosting, no Python dependencies, no GPU requirements. Everything is pure TypeScript running on Reddit's infrastructure. The learning loop genuinely works: after a moderator corrects just 10–20 posts, you can see the confidence scores improve for similar future posts. We proved that you don't need a massive ML infrastructure to build something that learns — a well-designed term frequency system with good data flow is surprisingly effective. This is a fundamentally different approach from every other auto-flair tool on Reddit, and we believe it demonstrates the potential for intelligent, self-improving moderation tools built entirely on Devvit.

Real-time stats dashboard. The stats dashboard gives moderators something most auto-flair tools completely lack: visibility into the system's performance. Knowing that your accuracy is 87% or that a specific flair category has a 40% override rate is immediately actionable information. A moderator can look at the dashboard, see that the "Discussion" flair is being overridden 60% of the time, and immediately add a few keyword rules to improve it. This feedback loop — classify → measure → improve — is the core of the Flair Enforcer value proposition, and the dashboard is what makes it tangible. Building it as a native Devvit custom post type means it feels like part of Reddit rather than an external tool.

4-stage fallback chain. Most auto-flair tools use a single classification method — if it fails, the post goes unflaired. Our chain ensures that even if the ML classifier is uncertain, the keyword and regex stages provide multiple opportunities to get the right answer. In practice, this means the system catches a much higher percentage of posts than any single-method approach. During testing, we found that the ML stage alone handles about 65% of posts, keyword fallback catches another 20%, regex catches about 10%, and the default handles the remaining 5%. The staged architecture also makes it easy to add new classification methods in the future — you just add a new stage to the chain without modifying the existing ones.

Moderator-friendly UX. Adding a new classification rule requires just two fields (keyword + flair ID) through a native Devvit form — no JSON editing, no regex syntax for basic rules, no configuration files. We believe mod tools should empower moderators, not require them to become developers. The settings UI uses Devvit's native form system, which means it works on mobile, desktop, and the Reddit mobile app without any additional work. We deliberately avoided building a custom HTML settings page in favour of native Devvit forms, sacrificing visual customisation for reliability and cross-platform consistency.

Zero-dependency architecture. The entire app has exactly one runtime dependency: @devvit/public-api. No database drivers, no HTTP clients, no ML libraries, no logging frameworks. This makes the app incredibly easy to install, update, and maintain. It also means there's essentially zero attack surface — the app can't leak data to external servers because it doesn't make any external requests. For a tool that handles potentially sensitive moderation data, this security-by-design approach is a significant advantage.

What we learned

Simple algorithms + good training data > complex algorithms + no training data. This was our biggest insight from the project. Our TF-IDF scorer with just 20 learned corrections produces better results than a naive keyword matcher with 200 rules, because the learned data captures the actual language patterns that community members use. A keyword rule for "programming help" won't match "I'm stuck on this code" — but a learned association between "stuck", "code", and "Programming Help flair" will. This principle — that the quality of training data matters more than the sophistication of the algorithm — is well-known in ML research, but experiencing it firsthand on a real moderation problem was powerful. It also validated our decision to invest heavily in the learning loop rather than trying to build a more complex classifier.

Devvit KV store: simple but requires careful data modelling. We learned a lot about the Devvit platform's KV store. Its simplicity is both a strength and a limitation — it's incredibly easy to use (just get and put), but the lack of query capabilities means you have to design your data model carefully from the start. The key decisions we made: (1) use prefix-based key namespaces (fe:stats:, fe:trained:, etc.) to enable efficient bulk reads via getByPrefix; (2) maintain pre-aggregated daily summaries rather than computing them from raw events on every read; (3) use a reverse lookup key (fe:post:{postId}) to enable O(1) override detection without scanning all events. We also learned that KV values have size limits, so we store only the essential fields in each object rather than full copies of the input data.

Moderators have very different needs from developers. This was an important product lesson. Features that seem technically interesting (like confidence score heatmaps, classification latency tracking, or ROC curves) are less valuable than simple things like "show me which flair categories are being overridden most." We iterated the dashboard design multiple times based on this understanding, eventually settling on the current layout that prioritises actionable metrics over vanity statistics. The three numbers every moderator cares about: (1) how many posts did we process, (2) what percentage did we get right, and (3) which flairs need more attention.

Graceful degradation is non-negotiable. On day one, with zero training data and no rules, the system should still work — even if it just assigns a default flair or skips classification entirely. No errors in the modlog, no broken dashboard, no KV store pollution. Every edge case (deleted posts, empty titles, crossposts, mod-only posts, posts from banned users, posts in quarantined subreddits) needs to be handled silently. We wrapped every trigger handler in try/catch blocks and designed every read path to return sensible defaults when data is missing. This defensive programming approach added maybe 20% more code but made the app dramatically more reliable in practice.

Tokenisation is the unsung hero of NLP. Before this project, we underestimated how much of classification quality comes down to tokenisation. Our stop-word list grew from 30 words to 80+ as we tested against real Reddit posts. We discovered that apostrophe-free contractions common in Reddit posts — "im", "ive", "dont", "doesnt" — need special handling. We also learned to preserve hyphenated terms and slashes (like "iOS/Android") since they carry meaning that would be lost if we stripped all non-alphanumeric characters. The tokeniser is arguably the most important 30 lines of code in the entire app.
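
Both lessons can be folded into the tokeniser in a few lines. This sketch keeps "/" and "-" inside words and canonicalises the apostrophe-free contractions (the mapping table is our illustrative subset, not the app's full list):

```typescript
// Map common apostrophe-free spellings to a canonical form so they
// match consistently across posts.
const CONTRACTIONS: Record<string, string> = {
  im: "i'm",
  ive: "i've",
  dont: "don't",
  doesnt: "doesn't",
};

function tokenizeReddit(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s/-]/g, " ") // strip punctuation but keep "/" and "-"
    .split(/\s+/)
    .filter(Boolean)
    .map((t) => CONTRACTIONS[t] ?? t);
}
```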

Event-driven architecture requires careful state management. Devvit's trigger system is powerful but stateless — each trigger invocation is a fresh execution context with no memory of previous runs. This forced us to be very deliberate about what state we store and when. The pattern we settled on: store the minimum state needed to make each decision (classification result for override detection, daily stats for the dashboard, trained terms for the classifier) and use key prefixes to enable efficient bulk reads. We also learned that trigger handlers should be as fast as possible since Reddit may throttle or queue them during high-traffic periods.

What's next for Flair Enforcer

Short term (next 2 weeks): We plan to add a bulk training feature that lets moderators upload a CSV of historical post titles and their correct flairs to bootstrap the classifier without waiting for organic corrections. This would dramatically improve accuracy on day one for large subreddits that have thousands of historical posts with correct flairs already assigned. The implementation is straightforward — parse the CSV in a form handler, extract tokens from each title, and write them to the same trained-term KV keys that the learning loop uses. We'd also add a "training status" section to the dashboard showing how many training examples exist per flair.
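
A sketch of the planned CSV parse, assuming a simple "title,flairId" line format with the flair ID after the last comma (this is a hypothetical format choice, not a shipped feature):

```typescript
// Parse "title,flairId" lines into the same per-flair term-frequency
// shape the learning loop writes to fe:trained:{flairId}.
function parseTrainingCsv(csv: string): Map<string, Record<string, number>> {
  const perFlair = new Map<string, Record<string, number>>();
  for (const line of csv.split("\n")) {
    const comma = line.lastIndexOf(",");
    if (comma < 0) continue; // skip malformed lines
    const title = line.slice(0, comma);
    const flairId = line.slice(comma + 1).trim();
    const freq = perFlair.get(flairId) ?? {};
    for (const tok of title.toLowerCase().split(/\W+/).filter(Boolean)) {
      freq[tok] = (freq[tok] ?? 0) + 1;
    }
    perFlair.set(flairId, freq);
  }
  return perFlair;
}
```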

Short term: A flair picker UI improvement for the rule creation forms. Currently, moderators need to know the exact flair template ID (a string like flair-id-123). We want to add a flair selection step that fetches the subreddit's available flair templates and presents them as a searchable dropdown. This would eliminate the most common source of configuration errors and make the tool accessible to moderators who aren't technically savvy.

Medium term (next 1–2 months): We want to implement flair suggestion mode — instead of auto-assigning, the system would add a comment or modmail suggesting a flair and letting the moderator approve or reject it. This "human-in-the-loop" mode would be ideal for communities that want the ML benefits without fully automated assignments. The implementation would use Devvit's comment API to post a sticky mod comment with approve/reject buttons, with the button clicks handled by a form submission that either confirms or rejects the suggestion.

Medium term: A weekly flair health report that runs automatically and proactively notifies moderators when accuracy drops below a configurable threshold (say, 75%) or when new trending topics are being consistently misclassified. This would use the Devvit scheduler to run a weekly analysis job that compares the current week's stats to the previous week and sends a modmail summary if any metrics have degraded significantly. This turns Flair Enforcer from a passive tool into an active moderation assistant that alerts mods to problems before they become critical.

Medium term: Multi-flair support — allowing the system to assign secondary flairs or flair combinations. Some subreddits use both a topic flair (e.g., "Technology") and a content type flair (e.g., "News", "Discussion", "OC"). We'd extend the classification result to return multiple flair candidates and assign them based on configurable priority rules.

Long term (3–6 months): Cross-subreddit flair models. If multiple subreddits use Flair Enforcer and opt in to share their trained term data (anonymously, without any post content — just term→flair associations), the classifier could leverage community-contributed training data from similar subreddits. A gaming subreddit's classification model could help a new gaming community get 80%+ accuracy on day one. This "federated learning" approach respects user privacy while still enabling knowledge sharing. The implementation would use a shared KV namespace or a simple API endpoint that aggregates anonymised term data across participating subreddits.

Long term: Integration with Reddit's content understanding APIs as they become available on Devvit. If Reddit exposes embeddings, text classification, or topic modelling as platform features, Flair Enforcer is architected to swap in a more powerful backend without changing the fallback chain, settings UI, or stats dashboard. The classifier module is already isolated behind a clean interface — replacing it with a Reddit-native classifier would require changes to only one file.

Long term: Support for comment flair and user flair in addition to post flair. The classification pipeline is generic enough to work with any text input, so extending it to classify comments (for automated comment moderation) or users (for automated user flair based on posting history) is a natural extension. This would make Flair Enforcer a comprehensive content classification platform rather than just a post flair tool.
