Inspiration

In an era of deepfakes and sensationalist clickbait, the line between reality and digital fabrication has blurred. We were inspired by the need for a "citizen's toolkit" that doesn't just watch a video but actually understands it. We wanted to move beyond simple keyword searches and provide a tool that can cross-reference what is promised in a thumbnail with what is actually delivered in the footage.

Key Pain Points Haunting the Digital Era:
1) Unstoppable Content Tsunami: Over 500 hours of video are uploaded to YouTube every minute. Manual fact-checking is no longer a human possibility; it’s an AI necessity.
2) The Trust Deficit: Deepfake content is exploding at an annual growth rate of 900%, with files surging from 500k in 2023 to over 8 million in 2025.
3) Clickbait Saturation: Up to 25% of high-traffic thumbnails use extreme sensationalism to hijack user attention. In a landscape where users decide to watch in just 1-2 seconds, misleading visuals steal millions of human hours daily.

What it does

Our browser extension acts as an intelligent overlay for YouTube. It uses Twelve Labs' Pegasus model to perform native video analysis, identifying exactly what is happening in the frames and audio. It then uses Gemini to analyze community sentiment through comments and to fact-check claims against external data. Users get a "Credibility Score," a "Clickbait Meter," and a breakdown of the key claims with their validity, sentiment, and truthfulness.

How we built it

We developed a multi-layered backend using Python and Flask.

1) Video Analysis: We used Twelve Labs' Pegasus model to handle deep video search and generative summarization.
2) Sentiment & Context: We integrated Gemini Pro to analyze hundreds of comments and cross-verify video metadata.
3) Data Persistence: We used Valkey on Vultr for high-performance caching of analysis results, so that repeated queries for the same video are served from the cache instead of being re-analyzed.
4) Frontend: A JavaScript-based browser extension injects our custom UI directly into the YouTube interface for a seamless user experience.
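The caching step can be sketched as a cache-aside lookup: check the cache first, and only run the expensive analysis on a miss. This is a minimal illustration, not our production code. `TTLCache` is a hypothetical in-memory stand-in for a Valkey client so the example runs without a server, and `get_analysis`, the `analysis:` key prefix, and the one-hour TTL are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Valkey client (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, dict]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: dict, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def get_analysis(
    video_id: str,
    cache: TTLCache,
    analyze: Callable[[str], dict],
    ttl_seconds: float = 3600.0,  # assumed TTL, not a measured value
) -> dict:
    """Cache-aside: return the cached result, or compute and store it."""
    key = f"analysis:{video_id}"  # assumed key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = analyze(video_id)  # the expensive video and comment analysis
    cache.set(key, result, ttl_seconds)
    return result
```

With a real Valkey instance, the same shape would apply with the client's get and set-with-expiry calls in place of `TTLCache`, and the result serialized (for example as JSON) before it is stored.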

Challenges we ran into

One of our biggest hurdles was handling the temporal nature of video. Most AI tools treat video as a series of static images, but we needed to understand the "flow" to detect clickbait. We also ran into Unicode encoding errors when processing global YouTube metadata on Windows servers. Finally, video indexing is asynchronous, so we had to manage it carefully to keep the user experience smooth and avoid long wait times.
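The two technical issues above can be illustrated with a short sketch. The first part writes and reads metadata with an explicit UTF-8 encoding, because on Windows `open()` can otherwise fall back to the locale code page and fail on non-Latin titles. The second part polls an indexing job with a capped exponential backoff instead of blocking on it. The function names, the `"ready"` and `"failed"` status strings, and the `get_status` callable are assumptions made for this example, not the Twelve Labs API.

```python
import json
import time
from pathlib import Path
from typing import Callable


def save_metadata(path: Path, metadata: dict) -> None:
    # Explicit UTF-8 avoids relying on the platform's default encoding.
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps titles human-readable in the file.
        json.dump(metadata, fh, ensure_ascii=False)


def load_metadata(path: Path) -> dict:
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def wait_for_indexing(
    get_status: Callable[[], str],  # hypothetical status check
    timeout_seconds: float = 120.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the job is ready; return False on failure or timeout."""
    deadline = clock() + timeout_seconds
    delay = initial_delay
    while True:
        status = get_status()
        if status == "ready":
            return True
        if status == "failed":
            return False
        if clock() + delay > deadline:
            # The next wait would overrun the deadline, so give up.
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay)  # capped exponential backoff
```

In a setup like ours, the extension could show a "still analyzing" state while a loop of this kind runs in a background worker, rather than holding the HTTP request open.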

Accomplishments that we're proud of

We are incredibly proud of achieving Video-Native Understanding. Unlike tools that rely solely on captions, our pipeline "sees" the video, allowing us to detect discrepancies between the thumbnail and the actual content. We also successfully built a high-performance cache system using Valkey that reduced our API overhead significantly.

What we learned

Building this project taught us the power of multimodal AI orchestration. We learned how to chain specialized models—using Twelve Labs for what it does best (video pixels) and Gemini for what it does best (language and reasoning). We also deepened our knowledge of building scalable APIs with FastAPI and Flask and the intricacies of browser extension development.
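The chaining idea can be sketched as a small orchestration function that passes the video model's output into the language model's prompt. This is a simplified illustration under stated assumptions: `describe_video` and `assess_text` are hypothetical callables standing in for the Twelve Labs and Gemini calls, and the `Report` fields and prompt wording are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Report:
    video_summary: str
    comment_sentiment: str
    thumbnail_matches_content: bool


def build_report(
    video_id: str,
    thumbnail_text: str,
    comments: List[str],
    describe_video: Callable[[str], str],  # stand-in for the video model
    assess_text: Callable[[str], str],     # stand-in for the language model
) -> Report:
    # Step 1: the video model describes what the footage actually shows.
    summary = describe_video(video_id)

    # Step 2: the language model summarizes community sentiment.
    sentiment = assess_text(
        "Summarize the overall sentiment of these comments:\n"
        + "\n".join(comments)
    )

    # Step 3: the language model compares the thumbnail's promise
    # against the video model's summary.
    verdict = assess_text(
        f"Thumbnail promises: {thumbnail_text}\n"
        f"Video shows: {summary}\n"
        "Answer yes or no: does the video deliver what the thumbnail promises?"
    )
    delivers = verdict.strip().lower().startswith("yes")
    return Report(summary, sentiment, delivers)
```

Passing the models in as callables keeps each stage swappable and lets the pipeline be tested with simple fakes, without network access.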

What's next for Project

We plan to expand our "Sponsor Detection" to include real-time alerts when a creator shifts into a promotional segment. We also want to implement a community-driven "Fact-Check" database, allowing users to contribute to the credibility score of viral videos, creating a decentralized shield against misinformation.
