Inspiration
In an age where digital signals define our reality, we saw a world where the quietest distortions carry the loudest consequences. We watched as simple videos were weaponized, not through obvious face-swaps, but through the surgical misalignment of what we see and what we hear. We chose the name Mendacia, the Latin word for lies and deception, because it represents exactly what we aim to unmask. We were inspired by the Siren's Call, the deceptive melody that leads the unsuspecting toward wreckage. Modern manipulation is that siren: it uses Emotional Amplification and Context Stripping to bypass our logic and strike at our fears. We built Mendacia because truth shouldn't be a luxury for experts; it should be a right for everyone. While others look for "deepfakes," we look for the "shallow-fakes."
What it does
Mendacia is a Multimodal Media Forensics Dashboard that exposes how media shapes perception. Unlike standard tools that merely label content as "fake," Mendacia performs Scene-Level Video Analysis to detect Cross-Modal Inconsistencies and produces a structured forensic breakdown of visual-speech inconsistencies, emotional amplification, framing bias, persuasion techniques, and synthetic media signals. The app guides users through three tiers:
- The home screen where users upload video for forensic scanning.
- A simplified overview for a general audience, featuring a Trust rating gauge and "red flag" cards to make propaganda risks clear.
- A robust forensic report which features a Manipulation Radar Chart, a structured Taxonomy Breakdown, and our Cross-Modal Consistency Engine which highlights the specific "Smoking Gun" mismatches between spoken claims and visual reality.
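One way to picture the data behind the overview and the forensic report is a small record per video. The sketch below is illustrative only: the field names, the 0-to-1 score scale, the red-flag threshold, and the averaging used for the trust rating are all assumptions, not Mendacia's actual schema or scoring formula.

```python
from dataclasses import dataclass, field

# The five categories named in the forensic breakdown.
CATEGORIES = (
    "visual_speech_inconsistency",
    "emotional_amplification",
    "framing_bias",
    "persuasion_techniques",
    "synthetic_media_signals",
)


@dataclass
class Mismatch:
    """One "Smoking Gun": a spoken claim the visuals do not support."""
    start_s: float
    end_s: float
    spoken_claim: str
    visual_summary: str


@dataclass
class ForensicReport:
    # Hypothetical scale: 0.0 = no sign of manipulation, 1.0 = strong sign.
    scores: dict[str, float]
    mismatches: list[Mismatch] = field(default_factory=list)

    def trust_rating(self) -> int:
        """Assumed formula: 100 minus the mean category score as a percentage."""
        mean = sum(self.scores.get(c, 0.0) for c in CATEGORIES) / len(CATEGORIES)
        return round(100 * (1.0 - mean))

    def red_flags(self, threshold: float = 0.6) -> list[str]:
        """Categories scoring at or above the (assumed) red-flag threshold."""
        return [c for c in CATEGORIES if self.scores.get(c, 0.0) >= threshold]
```

In this shape, `trust_rating()` would feed the gauge, `red_flags()` the overview cards, `scores` the radar chart, and `mismatches` the consistency view.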
How we built it
We engineered Mendacia by forging a high-speed pipeline between video intelligence and linguistic reasoning. Using the Twelve Labs API, we performed deep temporal indexing to extract scene-level metadata, identifying specific actions, objects, and visual context that usually stay hidden from standard text-scrapers. This "visual ground truth" was then fed into Gemini, which acted as our forensic auditor. By comparing the Twelve Labs visual summary against the spoken transcript, we built a custom logic engine that flags "Cross-Modal Dissonance", essentially catching the moment a video tries to tell a lie that the footage can’t support.
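The comparison step depends on knowing which spoken words belong to which scene. Here is a minimal sketch of that pairing, assuming both the scene metadata and the transcript carry start and end timestamps in seconds. `Scene`, `Utterance`, and `pair_scenes_with_speech` are hypothetical names for illustration; they are not part of the Twelve Labs or Gemini APIs, and the calls to those services are left out.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start_s: float
    end_s: float
    visual_summary: str  # text describing what is on screen (assumed input)


@dataclass
class Utterance:
    start_s: float
    end_s: float
    text: str  # spoken words from the transcript (assumed input)


def overlap_seconds(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time range shared by two intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def pair_scenes_with_speech(scenes: list[Scene], utterances: list[Utterance]) -> list[tuple[Scene, str]]:
    """Attach to each scene every utterance that overlaps it in time."""
    pairs = []
    for scene in scenes:
        spoken = [
            u.text
            for u in utterances
            if overlap_seconds(scene.start_s, scene.end_s, u.start_s, u.end_s) > 0
        ]
        pairs.append((scene, " ".join(spoken)))
    return pairs
```

An utterance that straddles a scene boundary is attached to both scenes, so a claim is never checked against only half of the footage it was spoken over.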
Challenges we ran into
The biggest hurdle was the "Cross-Modal" logic. It is easy to detect a keyword like "riot," but it is much harder to prove that the visuals don't support it. We had to refine our prompts repeatedly to ensure the AI could distinguish between routine urban activity (detected by Twelve Labs) and the "Anarchy" claimed by a narrator. We also navigated the 24-hour hackathon limit by focusing on an MVP scope that prioritizes Explainable AI over simple "True/False" labels.
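To make the prompt problem concrete, here is a rough sketch of an audit prompt that asks for an explained verdict rather than a bare label. The prompt wording, the JSON reply shape, and the `audit_scene` helper are assumptions for illustration, not the prompts we shipped. The model call is passed in as a plain function (`ask_model`) so that no particular Gemini client signature is implied.

```python
import json
from typing import Callable

# Hypothetical prompt; doubled braces are literal braces after str.format().
PROMPT_TEMPLATE = """You are a media forensics auditor.
Visual description of the scene: {visual}
What the narrator says: {spoken}
Does the footage support the narrator's claim? Routine activity such as
traffic or pedestrians does not count as evidence of unrest.
Answer with JSON only: {{"supported": true or false, "reason": "<one sentence>"}}"""


def audit_scene(visual: str, spoken: str, ask_model: Callable[[str], str]) -> dict:
    """Build the prompt, query the model, and return an explained verdict."""
    prompt = PROMPT_TEMPLATE.format(visual=visual, spoken=spoken)
    raw = ask_model(prompt)
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        verdict = None
    if not isinstance(verdict, dict):
        # Treat unparseable replies as "unknown" instead of guessing.
        return {"supported": None, "reason": "model reply was not valid JSON"}
    return {
        "supported": verdict.get("supported"),
        "reason": str(verdict.get("reason", "")),
    }
```

Keeping the `reason` field alongside the boolean is what lets the report explain a flag instead of only raising it.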
Accomplishments that we're proud of
We are most proud of the fact that we didn’t blink under the crushing pressure of the deadline. Even though the team was split into two separate pairs, one handling the complex backend logic and the other crafting the frontend experience, we succeeded in implementing all parts in parallel, merging them into a seamless whole at the final hour. Despite the looming API limits and the technical steepness of the project, we gave it our absolute all to ensure the Trust Gauge provided a truly accurate forensic report. Seeing the "Smoking Gun" feature finally highlight a specific mismatch between a claim and a visual was a massive win for the team. We didn't just build a tool; we built a shield for the digital age, proving that even in a 20-hour sprint, you can create something that stands up for the truth and empowers users to see through the "Siren's Call."
What we learned
By combining Twelve Labs’ visual data with Gemini’s reasoning, we realized that detecting lies isn't just about pixels; it's about checking whether the visuals actually support what the narrator is saying. We had to move away from asking "Is this fake?" and instead focus on finding the specific mismatches between the two. Technically, we learned how to balance speed and depth during a 20-hour sprint. We figured out how to break videos into smaller "Forensic Windows" so Twelve Labs could index them quickly while Gemini analyzed the claims. This taught us that giving users a clear explanation of how they are being manipulated is much more powerful than just giving them a simple "Yes/No" score.
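The "Forensic Windows" idea can be sketched as simple time slicing. This is a simplified illustration under stated assumptions: the 30-second default, the optional overlap, and the `forensic_windows` name are hypothetical, and the actual windowing may have followed scene boundaries rather than fixed lengths.

```python
def forensic_windows(
    duration_s: float, window_s: float = 30.0, overlap_s: float = 0.0
) -> list[tuple[float, float]]:
    """Split a video of duration_s seconds into (start, end) windows.

    Consecutive windows share overlap_s seconds; the last one is clipped
    to the end of the video.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap_s must be non-negative and smaller than window_s")

    step = window_s - overlap_s
    windows = []
    start = 0.0
    while start < duration_s:
        windows.append((start, min(start + window_s, duration_s)))
        start += step
    return windows
```

For example, a 70-second clip with the default settings yields three windows: 0 to 30, 30 to 60, and 60 to 70 seconds. A small overlap can help when a claim is spoken right at a window boundary.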
What's next for Mendacia
- Batch Processing: Scaling the engine to analyze entire social media feeds.
- Browser Extension: A real-time "Siren Warning" that flags and handles manipulative tactics as you watch videos on YouTube or X (Twitter), or when you submit any video URL.
- Source Cross-Referencing: Integrating a third API layer to check whether the footage has been recycled from older, unrelated events.