Rhetoriq: The Narrative Forensics Engine
Inspiration
We live in an era of Cognitive Warfare. Modern media consumption isn't just about information; it's about coercion. Algorithms prioritize high-arousal content—rage-bait, doom-scrolling, and hyper-sensationalism—that hacks our dopamine loops to keep us engaged.
We have antivirus software for our devices, but we have no protection for our minds.
We realized that Fact-Checkers are insufficient because they only analyze what is said (the content). They completely miss how it is said (the form). A video can be "factually" correct but emotionally manipulative, using ominous music, rapid-fire editing, and logical fallacies to engineer a false sense of urgency.
Rhetoriq was born from a simple question: Can we use AI to reverse-engineer these signals and give the user their cognitive liberty back?
What it does
Rhetoriq is an adversarial AI engine that performs Multimodal Narrative Forensics. It doesn't just "watch" a video; it interrogates it.
- Signal Analysis: It breaks down the video into three parallel data streams:
  - Audio: Detects BPM changes, music swells, and prosody (voice stress) used to induce anxiety.
  - Visual: Analyzes framing intensity (extreme close-ups), color grading, and cut frequency.
  - Linguistic: Identifies rhetorical devices and urgency markers.
- Logic Forensics: It acts as a real-time logic tutor, flagging Reasoning Leaks—moments where the speaker uses fallacies like Ad Hominem, Strawman, or False Causality to mask a weak argument.
- The Pressure Curve: It visualizes the "emotional pressure" of the video over time, allowing users to see exactly when they are being manipulated (see the report sketch after this list).
- Forensic Player: Users can disable the "Audio Influence" (music/SFX) to hear the naked argument, often revealing how weak it truly is without the cinematic production.
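To make these outputs concrete, here is one plausible shape for a single forensic report. Every field name and value below is illustrative, not the project's actual schema.

```python
# Hypothetical shape of one forensic report; field names and values are illustrative.
report = {
    "coercion_index": 0.95,              # 0.0 (neutral) .. 1.0 (critical threat)
    "pressure_curve": [                  # emotional pressure sampled over time (seconds)
        {"t": 0, "pressure": 0.20},
        {"t": 30, "pressure": 0.85},
    ],
    "signals": {
        "audio": {"bpm_shift": "+24", "music_swells": 3, "voice_stress": "high"},
        "visual": {"cuts_per_second": 2.1, "extreme_closeups": 5, "color_grading": "desaturated"},
        "linguistic": {"urgency_markers": ["act now", "before it's too late"]},
    },
    "reasoning_leaks": [                 # fallacy tags anchored to timestamps
        {"t": 73, "type": "strawman", "quote": "So you're saying we should do nothing?"},
    ],
}
```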
How we built it
Rhetoriq is a "Mobile-First" forensic tool powered by Google Gemini 3 (Experimental).
- The Brain: We leveraged Gemini 3 Flash Preview for its massive multimodal context window. Unlike older models that need video described to them in text, Gemini 3 can natively "watch" the frames and "listen" to the audio track simultaneously, allowing it to spot Visual Dissonance (when the video contradicts the audio).
- The Backbone: A FastAPI (Python) backend handles the media processing pipeline. It uses FFmpeg to strip the audio track and sample keyframes at 5-second intervals (minimal sketches of this preprocessing step and of the Gemini call follow this list).
- The Interface: A responsive, dark-mode dashboard built with Vanilla JS and Chart.js. We avoided heavy frameworks to ensure the tool runs smoothly on low-end mobile devices.
- The Pipeline: We engineered a custom Threaded Streaming Architecture. Since video analysis is heavy, we decoupled the processing logic from the response stream, using a heartbeat mechanism to keep the connection alive on unstable 4G/5G networks.
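Here is a minimal sketch of the FFmpeg preprocessing step, assuming ffmpeg is on the PATH and the 5-second sampling interval described above; the function name and output layout are illustrative.

```python
import subprocess
from pathlib import Path

def preprocess(video_path: str, workdir: str) -> tuple[Path, list[Path]]:
    """Split a video into an audio track plus keyframes sampled every 5 seconds."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)

    # Strip the audio track: -vn drops video, 16 kHz mono WAV keeps the upload small.
    audio_path = out / "audio.wav"
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", str(audio_path)],
        check=True,
    )

    # Sample one keyframe every 5 seconds (fps=1/5) as numbered JPEGs.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vf", "fps=1/5", str(out / "frame_%04d.jpg")],
        check=True,
    )
    return audio_path, sorted(out.glob("frame_*.jpg"))
```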
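And a sketch of the analysis call itself, assuming the google-generativeai Python SDK and PIL for frame loading. The model identifier mirrors the name used in this writeup and may not match the string exposed by the API; the prompt, timestamp format, and safety thresholds are illustrative rather than the production configuration.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
# Model name taken from this writeup; substitute whatever identifier your account exposes.
model = genai.GenerativeModel("gemini-3-flash-preview")

def analyze(audio_path, frame_paths, interval_s=5):
    parts = [
        "You are a narrative forensics analyst. Tie every observation to the "
        "timestamp anchor printed immediately before the frame it refers to."
    ]
    # Explicit timestamp anchors keep the model's insights synchronized with the clock.
    for i, path in enumerate(frame_paths):
        t = i * interval_s
        parts.append(f"[FRAME AT {t // 60:02d}:{t % 60:02d}]")
        parts.append(Image.open(path))
    # The stripped audio track is attached as its own part via the File API.
    parts.append(genai.upload_file(str(audio_path)))
    parts.append(
        "Return JSON with: coercion_index, pressure_curve, reasoning_leaks, "
        "and audio/visual/linguistic signal summaries."
    )
    return model.generate_content(
        parts,
        # Loosened thresholds so the model may describe manipulative rhetoric
        # without refusing; exact category names depend on the SDK version.
        safety_settings={"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_ONLY_HIGH"},
    ).text
```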
Challenges we ran into
- The "Hallucination" Drift: Early versions of the AI would lose track of time, analyzing the audio from minute 1:00 but attributing it to minute 0:30. We solved this by injecting Explicit Timestamp Anchors directly into the multimodal context window, forcing the model to synchronize its insights with the clock.
- The Mobile Freeze: Mobile browsers are ruthless about killing idle connections. While the server was crunching data, the browser would assume the connection had died and close the socket. We had to implement a Heartbeat Keep-Alive system and a Padding Protocol (injecting 8KB of invisible data) to force mobile carrier proxies to flush their buffers and keep the stream open (a sketch of this follows the list).
- Safety Filters: Forensic analysis of controversial political or social content often triggers AI safety rails. We had to carefully tune the safety settings to allow the AI to analyze harmful rhetoric without generating it.
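A minimal sketch of the heartbeat and padding approach, assuming an SSE-style stream from FastAPI and a per-job asyncio.Queue fed by the analysis worker; the endpoint path, the 10-second timeout, and the JOBS registry are illustrative.

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
JOBS: dict[str, asyncio.Queue] = {}  # per-job queues filled by the analysis worker

# SSE comment lines (starting with ":") are ignored by the client but keep the
# connection busy; the 8 KB of padding nudges carrier proxies to flush their buffers.
PADDING = ":" + " " * 8192 + "\n\n"

async def event_stream(queue: asyncio.Queue):
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=10)
        except asyncio.TimeoutError:
            yield ": heartbeat\n\n" + PADDING   # nothing ready yet, keep the socket alive
            continue
        if item is None:                        # sentinel: analysis finished
            break
        yield f"data: {json.dumps(item)}\n\n"   # a real analysis event

@app.get("/analyze/{job_id}")
async def analyze(job_id: str):
    return StreamingResponse(event_stream(JOBS[job_id]), media_type="text/event-stream")
```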
Accomplishments that we're proud of
- The Coercion Index: We successfully created a mathematical metric that quantifies how "dangerous" a piece of media is. Seeing a video scored as "0.95 / CRITICAL THREAT" because of its reliance on fear-mongering is incredibly validating.
- Audio Suppression: It is a simple feature, but toggling off the ominous background music in a propaganda video completely defangs it. It’s a powerful "Aha!" moment for users.
- Real-Time Fallacy Tagging: Seeing the AI correctly identify a Strawman Argument the exact second a politician uses it felt like giving the user X-Ray vision for lies.
What we learned
- Context is King: You cannot analyze a video frame-by-frame. The AI needs the entire context to understand that a joke in minute 1 becomes an insult in minute 3. Gemini's large context window was essential for this.
- Silence is Expensive: On mobile networks, silence is interpreted as death. You must always keep talking to the client, even if you have nothing to say yet.
- Form > Content: Propaganda isn't about what you say; it's about the rhythm, the music, and the cuts. AI is surprisingly good at detecting these stylistic fingerprints.
What's next for Rhetoriq
The ultimate goal is the Cognitive Defense Suite: integrating Rhetoriq with Epistemiq.
While Rhetoriq deconstructs the Form (how they are manipulating you), Epistemiq reconstructs the Fact (what is actually true).
Imagine a unified interface:
- Rhetoriq flags a segment as "High Pressure / False Causality."
- Epistemiq instantly triggers, scanning the web to surface primary sources, peer-reviewed data, and counter-evidence that debunks that specific claim.
Together, they move the user from Passive Consumption to Active Interrogation, creating a complete immune system for the information age.
