RageBaiter

Background

Humans do not process information like objective arbiters. This is well known, but most people underestimate the extent of the biases that shape human judgment. Here are a few of them:

  • Confirmation bias pushes people to search for and interpret evidence in ways that support what they already believe.
  • Anchoring makes early information “stick,” so later evidence is insufficiently weighted even when it should change someone’s view.
  • Motivated reasoning means people often use reasoning to defend an identity-consistent conclusion, not to find the most accurate one.
  • When beliefs are strong and identity-laden, mixed evidence can actually increase confidence and push people further apart (“biased assimilation” and “attitude polarization”).

This gets worse when the information is partisan, because partisan cues convert “a claim about the world” into “a claim about my team.” People's normative beliefs (how the world should be) begin to determine their positive beliefs (how the world is). Once that happens, disagreement about policy cannot be resolved, because the opposing groups are no longer arguing from the same set of facts.

Problem Statement

Over the last decade, the U.S. information environment has moved toward harder social sorting and more hostile partisanship.

  • Pew Research has found that Republicans and Democrats are more divided and partisan antipathy is deeper and more extensive than in prior decades.
  • Large majorities of Americans now say Republican and Democratic voters cannot agree on basic facts, not just policies.
  • Political polarization is often “affective” (dislike/loathing of the other side) as much as it is ideological.

As Pew has found, this division is not evenly distributed; it increasingly shows up as fractures across demographic lines. Pew’s recent party affiliation data shows a persistent, sizable gender gap in partisan alignment.

Online platforms (most notably X / Twitter) magnify this effect through selective exposure (people gravitating to agreeable sources) and algorithmic curation. Partisan confirmation bias applies even when an individual is trying to be objective. The solutions implemented so far (such as Community Notes) can be hijacked by a group of partisans and generally appear too late, after most of the damage from the misinformation has already been done.

The practical result is a vicious loop that radicalizes the individual and increases partisanship:

  1. People see content that confirms their priors.
  2. They feel more certain, more morally justified, and less charitable.
  3. They engage harder (quote-tweets, dunking, outrage).
  4. The system learns that outrage works and shows more of it.

The Solution

We built RageBaiter, a Chrome extension that transforms your social media feed from a passive consumption stream into a measurable, active decision system.

Instead of trying to "censor" misinformation, RageBaiter acts as a cognitive firewall. It uses real-time vector analysis to detect when a user is being psychologically manipulated and triggers a lightweight, Socratic intervention exactly when it matters most.

Basic Summary

The "Silent Observer" (Vector Tracking)

RageBaiter maps all the text you read. The extension assigns every tweet a coordinate on a 3D political compass (Economic, Social, Populism). Simultaneously, it tracks the user's own position based on their interaction history. This allows the system to mathematically define an "Echo Chamber": it knows when a piece of content is aligned with a user’s priors.

The Detection Engine

The system scans for "High-Voltage" content: tweets that combine extreme emotional valence (rage) with manipulative rhetoric and logical fallacies. When the user encounters a post that is both logically unsound and reinforcing of their priors, RageBaiter activates.

The Intervention (The "Cut the Line" Moment)

Rather than blocking the post, RageBaiter overlays a Context Analysis Card. This interrupts the "dopamine loop" of outrage and forces a moment of friction.

Real-World Example

Consider a viral, racially charged tweet that relies on statistical manipulation to trigger anger and exploit the user's biases. A standard user might retweet it instantly. A RageBaiter user sees this:

Matt Walsh: Young black males are violent to a widely, outrageously disproportionate degree. That's just a fact. We all know it. And it's time that we speak honestly about it, or nothing will ever change.

RageBaiter Intervention:

⚠️ LOGIC FAILURE DETECTED: Confounding Variable

The Mechanism: The system identifies "Spurious Correlation." The tweet presents a statistical disparity as an inherent racial trait, deliberately ignoring the stronger predictive variable: Socioeconomic Status.

The Data Check: RageBaiter injects the missing context: "When researchers control for poverty, education, and neighborhood stability, the 'racial gap' in violence rates shrinks or disappears entirely. Violence correlates most strongly with economic disadvantage, not race."

The Socratic Challenge: "If poverty and proximity to crime are the strongest predictors of violence regardless of race, why does this post focus exclusively on the racial identity of the subject? Is the goal to solve violence, or to assign blame?"

The Result

The user is presented with a choice: "Take the Bait" (acknowledge the bias but engage anyway) or "Cut the Line" (reject the manipulation). This simple action breaks the automated feedback loop of radicalization, turning a moment of mindless outrage into a moment of critical reflection.


System Overview

Here’s the entire process laid out:

  1. Model the user (once).
  2. Passive tweet scraping (real-time, low friction).
  3. Local political filter (cost and latency gate).
  4. LLM analysis for bias and fallacies (with caching and guardrails).
  5. Echo-chamber detection via vector distance (decision engine).
  6. Socratic interventions rendered in the feed (minimal disruption, explicit user control).
  7. Longer-term adaptive loop (not fully detailed here).

1) Model the User (Once)

Users take an 18-question quiz and get a 3D political vector:

  • Economic: redistribution/active state ↔ market/small state
  • Social: autonomy ↔ tradition/order
  • Populism: institutionalist ↔ anti-elite majoritarianism

There are 6 questions per axis, with 5-point Likert responses, normalized to roughly [-1, +1].
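As a rough illustration of the scoring step, the sketch below averages Likert responses per axis and maps them onto [-1, +1]. The `QuizAnswer` shape, the `reversed` flag for reverse-keyed questions, and the `scoreQuiz` name are assumptions made for this example; the write-up only specifies the axes, the question count, and the target range.

```typescript
// Hypothetical types: only the three axes, the 5-point scale, and the
// [-1, +1] range come from the description above.
type Axis = "economic" | "social" | "populism";

interface QuizAnswer {
  axis: Axis;
  // Likert response from 1 (strongly disagree) to 5 (strongly agree).
  response: number;
  // Assumption: some questions are reverse-keyed, so agreement points toward -1.
  reversed: boolean;
}

interface PoliticalVector {
  economic: number;
  social: number;
  populism: number;
}

function scoreQuiz(answers: QuizAnswer[]): PoliticalVector {
  const sums: Record<Axis, number> = { economic: 0, social: 0, populism: 0 };
  const counts: Record<Axis, number> = { economic: 0, social: 0, populism: 0 };

  for (const a of answers) {
    // Map 1..5 onto -1, -0.5, 0, +0.5, +1.
    const centered = (a.response - 3) / 2;
    sums[a.axis] += a.reversed ? -centered : centered;
    counts[a.axis] += 1;
  }

  // Average per axis; an axis with no answers defaults to the neutral 0.
  const mean = (axis: Axis): number =>
    counts[axis] === 0 ? 0 : sums[axis] / counts[axis];

  return {
    economic: mean("economic"),
    social: mean("social"),
    populism: mean("populism"),
  };
}
```

Averaging keeps every axis inside [-1, +1] regardless of how many questions a user answers, which is one simple way to get the "roughly" normalized vector described here.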

Optionally, users can enter their results from the Moral Foundations quiz so that the model can account for their temperament and pre-political leanings and personalize the Socratic response.

2) Passive Tweet Scraping (Real-Time, Low Friction)

Once the extension is active on Twitter/X, a content script continuously watches the feed for new tweets as the user scrolls. It uses DOM observers to detect tweet elements and only processes tweets that actually enter the viewport, reducing wasted work during fast scrolling.

When a tweet is detected, it extracts the minimum viable payload needed for analysis (tweet ID, text, author handle, timestamp, and basic engagement metrics) and deduplicates so each tweet is processed once.
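The payload and deduplication step might look roughly like the sketch below. The field names in `TweetPayload`, the specific engagement counters, the `SeenTweets` class, and the 5000-entry bound are all assumptions for illustration; viewport detection is left out because it depends on browser observer APIs.

```typescript
// Hypothetical payload shape based on the fields listed above.
interface TweetPayload {
  tweetId: string;
  text: string;
  authorHandle: string;
  timestamp: string;
  // Assumption: which engagement metrics are captured is not specified.
  engagement: { replies: number; reposts: number; likes: number };
}

// Bounded set of already-processed tweet IDs, so long scrolling sessions
// do not grow memory without limit.
class SeenTweets {
  private readonly ids = new Set<string>();
  private readonly maxSize: number;

  constructor(maxSize: number = 5000) {
    this.maxSize = maxSize;
  }

  // Returns true the first time an ID is seen, false on repeats.
  markIfNew(tweetId: string): boolean {
    if (this.ids.has(tweetId)) return false;
    this.ids.add(tweetId);
    // Sets iterate in insertion order, so the first value is the oldest ID.
    if (this.ids.size > this.maxSize) {
      const oldest = this.ids.values().next().value;
      if (oldest !== undefined) this.ids.delete(oldest);
    }
    return true;
  }
}

// Keep only payloads that have not been processed before.
function filterNew(batch: TweetPayload[], seen: SeenTweets): TweetPayload[] {
  return batch.filter((p) => seen.markIfNew(p.tweetId));
}
```

Bounding the set trades a small chance of reprocessing a very old tweet for predictable memory use, which fits the concern about leaks during rapid scrolling.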

This design is intentionally “passive”: it does not require users to change how they browse, and it avoids any heavy computation in the page context beyond detection and extraction. The extracted tweet payload is then sent to the service worker via extension messaging to keep orchestration and network calls out of the UI thread.

3) Local Political Filter (Cost and Latency Gate)

Most tweets are not political, so the system runs a local keyword filter before any LLM call. The filter uses a large political dictionary (200+ terms across elections, policy, parties, governance, activism, geopolitics), supports hashtags, normalizes common obfuscations, and exposes a sensitivity setting (low/medium/high) that controls the match threshold.

This gate is explicitly performance-critical: it is designed to run in under 1 ms per tweet and return both matched keywords and a confidence score.
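A minimal sketch of such a gate is shown below, under several assumptions: the eight-term dictionary stands in for the real 200+ term list, "high" sensitivity is taken to mean a lower match threshold, and the confidence formula, the obfuscation map, and the plural handling are illustrative choices rather than the project's actual rules.

```typescript
type Sensitivity = "low" | "medium" | "high";

interface FilterResult {
  isPolitical: boolean;
  matchedKeywords: string[];
  confidence: number; // 0..1
}

// Tiny illustrative dictionary; the real one is described as 200+ terms.
const POLITICAL_TERMS: ReadonlySet<string> = new Set([
  "election", "senate", "congress", "immigration",
  "tariff", "democrat", "republican", "ballot",
]);

// Assumed thresholds: distinct matches required before a tweet passes.
const MIN_MATCHES: Record<Sensitivity, number> = { low: 3, medium: 2, high: 1 };

// Undo a few common character substitutions (assumed obfuscation set).
const LEET: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s",
};

function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[01345@$]/g, (ch) => LEET[ch] ?? ch)
    .replace(/#/g, " ") // treat hashtags as plain words
    .replace(/[^a-z\s]/g, " "); // drop remaining punctuation and digits
}

function filterTweet(text: string, sensitivity: Sensitivity = "medium"): FilterResult {
  const matched = new Set<string>();
  for (const token of normalize(text).split(/\s+/)) {
    if (POLITICAL_TERMS.has(token)) {
      matched.add(token);
    } else if (token.endsWith("s") && POLITICAL_TERMS.has(token.slice(0, -1))) {
      // Naive plural handling: "republicans" matches "republican".
      matched.add(token.slice(0, -1));
    }
  }
  const matchedKeywords = Array.from(matched);
  // Assumed confidence: saturates at three distinct matches.
  const confidence = Math.min(1, matchedKeywords.length / 3);
  return {
    isPolitical: matchedKeywords.length >= MIN_MATCHES[sensitivity],
    matchedKeywords,
    confidence,
  };
}
```

Set lookups keep the per-token cost constant, so the work scales with tweet length rather than dictionary size, which is what makes a sub-millisecond budget plausible.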

Net effect: the extension can monitor the feed in real time without turning “scrolling Twitter” into “sending everything to an API.”

4) LLM Analysis for Bias and Fallacies (With Caching and Guardrails)

If a tweet passes the local filter, the service worker triggers analysis. Before calling any model, it checks cache in two layers: an in-memory LRU for instant repeat hits and a Supabase-backed cache keyed by tweet ID for broader reuse across users and sessions.

Cached analyses have a TTL (24 hours, configurable). On cache hits, the system can return results fast enough to feel instantaneous; on misses, it performs analysis and writes the result back to cache.
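The in-memory layer could be sketched as a small LRU map with per-entry expiry, as below. The `TtlLruCache` name and its API are assumptions for this example, and the Supabase-backed layer is not shown. The `now` parameter exists only to make expiry easy to exercise without waiting.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Minimal LRU cache with a fixed TTL, relying on Map insertion order.
class TtlLruCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      // Expired entries are dropped lazily on access.
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.entries.keys().next().value;
      if (oldestKey !== undefined) this.entries.delete(oldestKey);
    }
  }
}
```

With a 24-hour TTL this would be constructed as `new TtlLruCache<Analysis>(500, 24 * 60 * 60 * 1000)`, where the 500-entry capacity and the `Analysis` type are placeholders.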

When the system does call the internal analysis model (Gemini), the output is constrained to a structured JSON shape containing:

  • A 3D political vector
  • A list of detected logical fallacies
  • Topic classification
  • Confidence

The response is schema-validated (so malformed generations do not break the pipeline), retried with exponential backoff, and fails “open” by skipping intervention rather than blocking browsing.
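One way to read "schema-validated" and "fails open" is sketched below using hand-written type guards. The JSON field names (`vector`, `fallacies`, `topic`, `confidence`), the clamping ranges, and the `parseAnalysis` name are assumptions; a schema library could fill the same role. Returning `null` is the signal to skip the intervention.

```typescript
// Hypothetical output shape based on the four fields listed above.
interface TweetAnalysis {
  vector: { economic: number; social: number; populism: number };
  fallacies: string[];
  topic: string;
  confidence: number;
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isFiniteNumber(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v);
}

// Returns a validated analysis, or null so the caller can skip intervening.
function parseAnalysis(raw: string): TweetAnalysis | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // malformed JSON
  }
  if (!isRecord(data)) return null;

  const vector = data.vector;
  const fallacies = data.fallacies;
  const topic = data.topic;
  const confidence = data.confidence;

  if (!isRecord(vector)) return null;
  const economic = vector.economic;
  const social = vector.social;
  const populism = vector.populism;
  if (!isFiniteNumber(economic) || !isFiniteNumber(social) || !isFiniteNumber(populism)) {
    return null;
  }
  if (!Array.isArray(fallacies) || !fallacies.every((f: unknown) => typeof f === "string")) {
    return null;
  }
  if (typeof topic !== "string" || !isFiniteNumber(confidence)) return null;

  return {
    // Out-of-range numbers are clamped rather than rejected.
    vector: {
      economic: clamp(economic, -1, 1),
      social: clamp(social, -1, 1),
      populism: clamp(populism, -1, 1),
    },
    fallacies: fallacies as string[],
    topic,
    confidence: clamp(confidence, 0, 1),
  };
}
```

Clamping instead of rejecting slightly out-of-range values is a design choice here: it keeps otherwise usable analyses while still guaranteeing the vector stays inside the space the decision engine expects.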

This is also where our quantitative success targets matter: we are aiming for a cache hit rate > 60% (viral tweets) and end-to-end latency of < 3 seconds on a cache miss and < 100 ms on a cache hit.

5) Echo-Chamber Detection via Vector Distance (Decision Engine)

Each user has a political “profile vector” in a 3D space (Economic, Social, Populism). A tweet, once analyzed, has its own 3D vector. The engine computes Euclidean distance between the user vector and the tweet vector to quantify alignment.

Interventions are then decided by threshold logic:

  • Strong alignment (small distance) is treated as “echo chamber.”
  • Mid-range alignment as “mild bias.”
  • Large distance as “diverse exposure” that needs no nudge.

The PRD’s decision engine also conditions on fallacy presence and severity, not just distance. For example, very close alignment plus detected fallacies escalates to a stronger intervention, while close alignment without fallacies stays subtle.
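The distance and threshold logic can be sketched as follows. The cutoff values (0.5 and 1.2), the `Level` names, and the exact mapping from distance band plus fallacy presence to an intervention level are assumptions chosen for illustration; the write-up does not state the real thresholds.

```typescript
type Vec3 = { economic: number; social: number; populism: number };
type Level = "none" | "low" | "medium" | "critical";

// Euclidean distance: sqrt of the summed squared differences per axis.
function euclideanDistance(a: Vec3, b: Vec3): number {
  return Math.hypot(
    a.economic - b.economic,
    a.social - b.social,
    a.populism - b.populism,
  );
}

// Assumed cutoffs. With each axis in [-1, +1], the largest possible
// distance is 2 * sqrt(3), roughly 3.46.
const ECHO_MAX = 0.5;
const MILD_MAX = 1.2;

function decide(user: Vec3, tweet: Vec3, fallacyCount: number): Level {
  const d = euclideanDistance(user, tweet);
  if (d > MILD_MAX) return "none"; // diverse exposure: no nudge
  if (d > ECHO_MAX) {
    // Mild bias: assumed to nudge only when a fallacy is present.
    return fallacyCount > 0 ? "medium" : "none";
  }
  // Echo chamber: escalate when fallacies are present, stay subtle otherwise.
  return fallacyCount > 0 ? "critical" : "low";
}
```

For example, a user at the origin reading a tweet at `{ economic: 0.8, social: 0, populism: 0 }` is 0.8 away, which lands in the mid-range band under these assumed cutoffs.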

To prevent the tool from becoming an annoyance generator, the engine includes a cooldown so it cannot trigger interventions more than once per 30 seconds, and it logs a full decision trace for debugging and transparency.
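The cooldown itself is a small piece of state. A minimal sketch, assuming a single global timer (rather than per-tweet or per-author timers) and a hypothetical `InterventionCooldown` class:

```typescript
// Allows at most one intervention per cooldown window.
class InterventionCooldown {
  private lastTriggeredAt: number | null = null;
  private readonly cooldownMs: number;

  constructor(cooldownMs: number = 30000) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true and starts a new window if an intervention is allowed now.
  tryTrigger(now: number = Date.now()): boolean {
    if (this.lastTriggeredAt !== null && now - this.lastTriggeredAt < this.cooldownMs) {
      return false; // still cooling down: suppress this intervention
    }
    this.lastTriggeredAt = now;
    return true;
  }
}
```

Suppressed attempts do not reset the window in this sketch, so a burst of qualifying tweets cannot postpone the next allowed intervention indefinitely.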

6) Socratic Interventions Rendered in the Feed (Minimal Disruption, Explicit User Control)

When the decision engine says “intervene,” the content script injects UI directly into Twitter’s DOM. The design is tiered by intensity:

  • Critical: a visible highlight (yellow border) and an interactive Socratic popup.
  • Medium: an orange border plus a “Bias Check” button injected below the tweet toolbar.
  • Low: a subtle icon badge that signals “this aligns strongly with you,” without interrupting the reading flow.

The popup itself is deliberately practical: it shows the fallacy name and a plain-English explanation, then asks a Socratic question (generated via the user’s connected LLM subscription or a template fallback).

It also asks for lightweight feedback (“Good Point” vs “I Agree with Tweet”), which supports the broader system goal of refining the user profile over time (even if that adaptive loop is formally step 7 in the overview).

Because Twitter’s UI is fragile and constantly changing, the intervention UI is designed not to break layout or interaction, supports dark/light mode, and uses Shadow DOM to isolate styling so extension CSS does not leak into Twitter (or vice versa).


Tech Stack

  • Extension runtime: Chrome Extension Manifest V3.
  • Language: TypeScript (~5.6).
  • Build tooling: Vite + CRXJS (vite ^6, @crxjs/vite-plugin ^2) for extension bundling and dev workflow.
  • UI: React with Tailwind CSS and Zustand for state.
  • Core architecture: Content script (Twitter DOM scrape + keyword filter + UI injection) plus service worker (orchestration, cache checks, vector math, API calls) plus a React side panel for quiz/settings/debug.
  • Backend: Node.js + Hono (edge-deployable).
  • Database + vector storage: Supabase Postgres with pgvector.
  • Caching: Supabase cache plus in-memory LRU (service worker) with TTL and request coalescing.
  • LLMs: Internal analysis via Google Gemini; user-facing Socratic generation via user subscription (OpenAI / Anthropic / Perplexity).
  • Quality and delivery: Vitest + React Testing Library + Playwright; ESLint + Prettier; GitHub Actions CI/CD.

Challenges

1) Twitter/X DOM Volatility and Scraping Reliability

Twitter’s feed structure changes frequently, and tweets can appear as threads, quote tweets, or media-only posts. The scraper must handle dynamic updates, route changes, deleted tweets, and rapid scrolling without breaking or leaking memory.

The PRD addresses this with observer-based detection, viewport gating, deduplication, cleanup on navigation, and “defensive selectors with fallbacks.”

2) Performance Constraints Inside the Browser

A naive implementation would freeze scrolling and destroy UX. The pipeline must be non-blocking, avoid over-processing, and keep tweet handling bounded under bursty conditions.

This is why political filtering is designed to run in < 1 ms per tweet and why orchestration supports concurrent processing with queue management and error isolation.

3) Cost Control and Latency (LLMs Are the Bottleneck)

If every “maybe political” tweet hits a model, costs and rate limits explode. The PRD’s answer is multi-layer caching (LRU + Supabase), TTL invalidation, and request coalescing so viral tweets do not trigger redundant calls.
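Request coalescing can be illustrated with a map of in-flight promises keyed by tweet ID, so concurrent requests for the same tweet share one model call. The `RequestCoalescer` class and its `run` method are hypothetical names for this sketch.

```typescript
// Shares one in-flight promise per key among concurrent callers.
class RequestCoalescer<V> {
  private readonly inFlight = new Map<string, Promise<V>>();

  run(key: string, fetcher: () => Promise<V>): Promise<V> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) return existing; // join the pending call

    const cleanup = (): void => {
      this.inFlight.delete(key);
    };
    const promise = fetcher();
    // Clear the entry once the call settles, whether it succeeded or failed,
    // so later requests can go through the cache or retry.
    promise.then(cleanup, cleanup);
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

Coalescing complements the cache rather than replacing it: the cache handles repeats after a result exists, while this handles the window in which the first call is still pending.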

Even then, cache-miss latency must stay under a few seconds to make interventions feel connected to the content, with explicit success targets for cache-hit and cache-miss performance.

4) LLM Reliability, Structured Output, and Safe Failure Modes

LLMs can return malformed JSON or out-of-range values. The analyzer therefore needs schema validation, clamping, retries, and a “don’t block the user” fallback path.
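The retry and fallback path might be sketched like this. The base delay, cap, and attempt count are assumed values, and `withRetry` treats a `null` result (for example, a response that failed validation) the same as a thrown error.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a call that may throw or return null; resolves to null when all
// attempts fail so the caller can skip the intervention instead of blocking.
async function withRetry<T>(
  call: () => Promise<T | null>,
  maxAttempts: number = 3,
  baseMs: number = 500,
): Promise<T | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const result = await call();
      if (result !== null) return result;
    } catch {
      // Network or provider error: fall through and retry.
    }
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  return null; // fail open
}
```

With the assumed defaults, three attempts wait 500 ms and then 1000 ms between tries, which keeps the worst case inside a cache-miss budget of a few seconds only if each individual call is fast.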

5) Avoiding User Annoyance While Still Being Effective

Interventions that trigger too often train users to dismiss the tool. The PRD mitigates this with tiered interventions (subtle to critical), a cooldown (max once per 30 seconds), and UI requirements that preserve Twitter’s layout and interaction model.


Impact

Takeaway: The last presidential election in the United States was decided by around 230,000 votes in Michigan, Pennsylvania and Wisconsin.

The realistic promise of this project is not “ending polarization.” It is reducing the number of times people get pushed into tribal reflexes by content that is both reinforcing and poorly reasoned. We are targeting the narrow moment where intervention is most likely to work: when a post is very close to a user’s current beliefs (echo-chamber reinforcement) and the argument quality is low (fallacy flags). That combination is where people tend to over-trust.

This is not speculative. Recent controlled research shows that conversational AI can meaningfully move beliefs that are typically considered sticky. In a Science study, brief, tailored dialogues with an AI system reduced participants’ belief in a conspiracy theory they personally endorsed by about 20%, with effects that persisted for at least two months and generalized beyond the single conspiracy discussed.

In parallel, emerging experimental work suggests AI-driven persuasion can reduce polarization on specific issues, with reported reductions on the order of 10 to 20 percentage points depending on the setup and topic, and with effects that can persist for weeks.

Our extension is designed to operationalize these findings in the feed environment by scaling two mechanisms:

  1. Precision targeting (only intervene when the content is both close to the user’s priors and logically weak).
  2. Personalized dialogue (Socratic prompts, with an optional deeper conversation when the user opts in).

By the Numbers (10 Million Users)

If 10 million people install the extension, a plausible adoption funnel looks like this:

  • 6 million monthly active users (people actually using the feed with the extension on).
  • 3 million encounter at least one “critical” trigger per month (high echo reinforcement plus low-quality reasoning).
  • 1.5 million engage with at least one prompt per month (click, expand, or respond).
  • 1 million complete at least one “Deep Dive” conversation per year for high-stakes claims (a small commitment, but still only 10% of installs).

If the Deep Dive experience reaches anything close to the benchmark, then on the order of 1 million people would see a meaningful reduction in the strength of a conspiracy belief they previously endorsed, on average around 20%, with effects that can persist for months rather than minutes. Even if only a fraction of these users shift from “confident believer” to “uncertain,” that is still a large absolute number of people becoming less certain and less share-prone in the moments that matter most. For reference, the last presidential election in the United States was decided by around 230,000 votes in Michigan, Pennsylvania and Wisconsin.

For partisanship, the upside is broader than belief change. Depolarization interventions are usually modest and can decay quickly, which is why most one-shot “be nice” nudges do not scale. Our advantage is repetition plus targeting. If even 1 to 2 million users per year complete issue-focused Deep Dives and experience the kind of moderation effects seen in recent AI depolarization experiments, the result is a direct impact on behaviour: fewer hostile quote-tweets and more willingness to treat disagreement as uncertainty instead of betrayal.

The top-line claim is simple: at 10 million users, this tool can plausibly shift conspiracy thinking and partisan reflexes for a meaningful minority, and that minority is large enough to matter (as Canadians have come to know since the last presidential election).
