Inspiration

Short-form videos (TikTok, Instagram Reels, YouTube Shorts) have become the default way people absorb “news,” but the format is optimized for speed and emotion, not truth. With deepfakes, voice clones, and out-of-context clips, misinformation is spreading faster than ever.

A careful and intentional human could do the right thing: Google the claim, check the channel, find the original clip, and compare sources. But almost nobody has the time, energy, or incentive to do that for every video they scroll past.

Investigative journalists do this work professionally, but they have limited bandwidth and can’t cover everything that goes viral. This constraint can be overcome with AI.

So we built Sauce, please to make that investigative workflow automatic. By simulating a rigorous investigative newsroom with a team of specialized agents, Sauce, please aims to deliver WSJ-level verification in real time for every short-form video you watch.

What it does

Sauce, please is a Chrome extension that adds a real-time “investigative report” sidebar to short-form videos (currently YouTube Shorts, designed to expand to TikTok and Instagram Reels).

Every time you open a Short, Sauce automatically generates a report with two parts:

1) Video Risk: Is the video itself manipulated?

We check whether the video artifact is a truthful representation of what happened.

  • Detects signals of AI generation, deepfakes, synthetic audio, and other manipulation
  • Flags clipping/edits that may indicate selectively presented footage
  • Reports confidence (low/medium/high) and the signals we used
  • Avoids overclaiming: if a real person is speaking, we don’t call the video “fake”; we only report manipulation-risk signals

2) Context Risk: Even if real, is it misleading?

We verify whether the story being told is accurate.

  • Extracts the video’s key claims (what viewers are likely to believe)
  • Checks whether claims are false, missing key context, or misrepresent who/what/when/where
  • Provides claim -> evidence -> sources, so users can inspect the receipts and decide

At the top, Sauce shows a simple summary: No Risk / Video Risk / Context Risk / Both. Sauce will make a clear claim only when it can back it up.

Every conclusion comes with evidence and citations that users can read for themselves, and they’re free to disagree or reach a different interpretation. We’re not here to push a viewpoint. We’re here to surface receipts and context so users can make their own judgment.
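The report shape described above can be sketched as a small data model. This uses stdlib dataclasses for portability (the real FastAPI service presumably uses Pydantic models); the field names and example values are illustrative assumptions, not the shipped schema.

```python
# Sketch of the Sauce report structure: a top-level summary plus
# video-risk signals and claim -> evidence -> sources checks.
from dataclasses import dataclass
from enum import Enum

class Summary(str, Enum):
    NO_RISK = "No Risk"
    VIDEO_RISK = "Video Risk"
    CONTEXT_RISK = "Context Risk"
    BOTH = "Both"

@dataclass
class ClaimCheck:
    claim: str          # what viewers are likely to believe
    evidence: str       # what the investigation found
    sources: list[str]  # citations the user can inspect

@dataclass
class Report:
    summary: Summary
    video_risk_confidence: str        # "low" / "medium" / "high"
    video_risk_signals: list[str]     # manipulation signals detected, if any
    context_checks: list[ClaimCheck]  # claim -> evidence -> sources

# Hypothetical example: a real video telling a misleading story.
report = Report(
    summary=Summary.CONTEXT_RISK,
    video_risk_confidence="low",
    video_risk_signals=[],
    context_checks=[ClaimCheck(
        claim="City X banned all bicycles",
        evidence="The original council footage shows a one-street pilot program",
        sources=["https://example.com/council-minutes"],
    )],
)
print(report.summary.value)  # -> Context Risk
```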

How we built it

We built Sauce as a FastAPI-based analysis pipeline designed to feel like a WSJ-style investigative newsroom, except the “reporters” are specialized AI agents working in parallel. Instead of a single model guessing, we split the job into roles (reporters, researchers, background-checkers) and then use an editor agent to synthesize an evidence-backed report.

Backend flow (end-to-end)

1) The frontend sends a batch of Short URLs to the backend.
2) For each URL, we check the cache and skip anything we’ve already processed.
3) We download a lightweight clip (first 30 seconds, 144p, with audio) to minimize cost and latency.
4) For each video, we run three investigations in parallel:

  • Transcription (audio to text): Modal + vLLM (GPU-backed)
  • Channel background check: scrape channel metadata + recent uploads, then assess credibility
  • Multimodal video analysis: Gemini analyzes visual content and manipulation/propaganda signals

5) An Editor-in-Chief agent (GPT synthesis) merges all evidence into a structured report:

  • Mismatch level
  • Video Risk (AI/manipulation confidence + why)
  • Context Risk (claims + evidence + sources)
  • Presentation Risk (framing/propaganda techniques, when relevant)

6) Results are stored in cache.json, so repeated videos can return instantly.
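The per-video fan-out/fan-in in steps 4 and 5 can be sketched like this. The three investigation functions and the editor are hypothetical stand-ins for the real Modal/vLLM, channel-scraping, and Gemini/GPT calls.

```python
# Sketch: run three investigations in parallel, then hand all
# evidence to an editor step that produces the structured report.
from concurrent.futures import ThreadPoolExecutor

def transcribe(clip):
    return {"transcript": "..."}   # Modal + vLLM in the real pipeline

def channel_background(url):
    return {"credibility": "..."}  # channel metadata + recent uploads

def video_analysis(clip):
    return {"signals": []}         # Gemini multimodal pass

def editor_synthesize(evidence):
    # Editor-in-Chief agent: merges all evidence into one report.
    return {"sections": sorted(evidence)}

def investigate(url, clip):
    # Fan out the three subtasks, then fan in for synthesis.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "channel": pool.submit(channel_background, url),
            "transcript": pool.submit(transcribe, clip),
            "video": pool.submit(video_analysis, clip),
        }
        evidence = {name: f.result() for name, f in futures.items()}
    return editor_synthesize(evidence)
```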

Engineering for speed

Short-form video is fast, so the backend is built for throughput:

  • Parallel downloads (threaded)
  • Parallel analysis across many videos at once
  • Per-video parallel subtasks (transcription + channel + semantic analysis)
  • Caching to avoid re-processing the same viral clips

This is how Sauce can keep up with scroll speed: we reduce input size, parallelize aggressively, and treat evidence-gathering like a newsroom pipeline.

Challenges we ran into

Latency vs. depth: Real verification takes time, but Shorts move fast. We had to engineer a pipeline that stays thorough while still feeling immediate to the user.

Separating “fake” vs. “misleading”: Many videos are real but framed deceptively. That’s why we split the report into Video Risk (manipulation/deepfake signals) and Context Risk (claims + missing context).

Avoiding hallucinated certainty: A credibility tool can’t confidently “declare truth” without receipts. We built the system to cite evidence and to output Uncertain when the proof is weak.
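As an illustration of that principle, a verdict function can refuse to guess. The two-source threshold and inputs here are made up for the sketch, not the shipped logic.

```python
# Illustrative only: fall back to "Uncertain" when the receipts are weak.
def verdict(claim_supported: bool, n_independent_sources: int) -> str:
    if n_independent_sources < 2:
        return "Uncertain"  # refusing to guess preserves credibility
    return "Supported" if claim_supported else "Contradicted"

print(verdict(True, 1))   # -> Uncertain
print(verdict(False, 3))  # -> Contradicted
```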

Self-hosting - There were a lot of hacks, knobs, and dials I had to experiment with to get some configs and setups working, but it was super interesting to see how Modal abstracted away much of the tedium of self-hosting - Gabe

Accomplishments that we're proud of

  • A fully functional, shippable product: Sauce is a shippable Chrome extension with a real-time sidebar experience, not just a concept demo.

  • WSJ-style “newsroom” architecture: We simulated an investigative journalist workflow using a team of specialized agents (transcription, channel background checks, multimodal video analysis) plus an editor agent that synthesizes an evidence-backed report.

  • Evidence-first outputs: We don’t just label content: we provide claim -> evidence -> sources, so users can verify and reach their own conclusions. We want to earn the trust of our users.

What we learned

  • Trust requires receipts. Users don’t trust black-box “true/false” labels. They trust sources, evidence, and a clear chain of reasoning.

  • Every moment matters. Users don't want to wait 2 minutes for facts. We prefetch and cache aggressively to create a seamless user experience.

  • “Uncertain” is a feature, not a failure. In verification, refusing to guess is often the most responsible output—and it preserves credibility.

  • Most misinformation isn’t “fake,” it’s framed. The biggest problem we saw was real clips used with missing or distorted context, which is why separating Video Risk from Context Risk matters.

  • Self-hosting with Modal - I learned a ton about Modal's serverless self-hosting ecosystem and how to integrate it with vLLM and Hugging Face Transformers - Gabe

What's next for Sauce, please?

Sauce is already a fully functional, shippable Chrome extension. Next, we want to turn it into the default trust layer for short-form video:

  • Ship v1 publicly: publish to the Chrome Web Store with onboarding and a waitlist for early users.
  • Expand beyond YouTube Shorts: bring the same Video Risk / Context Risk report to TikTok and Instagram Reels.
  • Make it faster and more real-time: push streaming progress + partial results to the sidebar (the backend already supports chunked outputs).
  • Improve evidence quality: stronger source ranking, better linking to original footage, and clearer “what would change our mind” explanations.
  • Go multilingual: support verification across languages so users aren’t limited to English-only sources.
  • Close the feedback loop: allow users to flag incorrect analysis and submit better sources, improving accuracy over time.

Long term, our goal is simple: the truth layer for every short-form video.

Built With

  • fastapi
  • google-gemini
  • modal
  • openai
  • perplexity
  • react
  • threadpools
  • vllm