Inspiration
In industries like construction, insurance, and manufacturing, there is a critical need to verify "ground truth" against technical requirements. We identified a gap between what is written in documentation (blueprints, specs, contracts) and what is captured in visual evidence (site videos).
Standard tools often fail to bridge these modalities effectively. We were inspired to build VeraGate to act as an objective, AI-powered forensic auditor that can "see" the site and "read" the specs simultaneously, ensuring that truth is verified without the "blindness" caused by disconnected data sources.
What it does
VeraGate is a multimodal forensic audit engine that detects contradictions between video evidence and technical documentation in real time.
It operates through a streamlined workflow:
- Ingestion: Users upload a video file (evidence) and a PDF document (technical specs).
- Video Analysis: The "Watcher" agent performs OCR, transcription, and spatial analysis on the video footage.
- Forensic Audit: The "Auditor" agent cross-references the visual data against the full text of the PDF to detect discrepancies.
- Contradiction Alerts: It identifies specific issues such as Spatial (wrong position), Temporal (time mismatch), Factual (conflicting info), and Specification (technical violation) errors.
- Thinking Log: It displays the AI's real-time reasoning process, showing exactly how it reached its conclusions.
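The four alert categories above can be modeled as a small discriminated type. This is a hypothetical sketch of what such a data model might look like; the field names and the `triage` helper are illustrative, not VeraGate's actual schema:

```typescript
// Illustrative model of a contradiction alert, based on the four
// categories described above; field names are assumptions.
type ContradictionType = "spatial" | "temporal" | "factual" | "specification";

interface ContradictionAlert {
  type: ContradictionType;
  // What the Watcher saw in the video (e.g. "shadows suggest morning").
  videoEvidence: string;
  // What the Auditor found in the PDF (e.g. "work log says 4pm").
  documentClaim: string;
  // Model-reported confidence, 0..1.
  confidence: number;
}

// Keep only high-confidence alerts, most severe categories first.
function triage(
  alerts: ContradictionAlert[],
  minConfidence = 0.7
): ContradictionAlert[] {
  const order: ContradictionType[] = [
    "specification",
    "factual",
    "spatial",
    "temporal",
  ];
  return alerts
    .filter((a) => a.confidence >= minConfidence)
    .sort((a, b) => order.indexOf(a.type) - order.indexOf(b.type));
}
```

A UI could then render the triaged list directly, with the severity ordering baked into the sort.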
How we built it
We utilized a Two-Agent Architecture orchestrated within a Next.js 16 application.
- Agent 1: The Watcher (gemini-2.0-flash): We selected the Flash model for its speed and multimodal capabilities. It handles the heavy lifting of video processing, extracting text and spatial data from frames efficiently.
- Agent 2: The Auditor (gemini-2.0-pro-exp-02-05): We used the Pro model with Thinking Mode (thinkingLevel: HIGH) for the analysis. This agent ingests the entire PDF (up to 1M tokens) along with the video analysis to perform deep deductive reasoning.
- No-RAG Approach: Instead of using Retrieval-Augmented Generation (RAG), which breaks documents into chunks, we fed the full document context to the model to preserve the global context required for forensic accuracy.
- Tech Stack: The frontend is built with React, Tailwind CSS, and Framer Motion, communicating with the backend via Server-Sent Events (SSE) to stream the AI's "thinking" tokens to the UI.
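The SSE streaming mentioned above ultimately comes down to writing `data:` frames terminated by a blank line. A minimal, framework-agnostic sketch of the encoding step (the actual VeraGate route handler is not shown; the `sseFrame` helper is our illustration of the wire format defined by the SSE spec):

```typescript
// Encode one streamed "thinking" token as a Server-Sent Events frame.
// Per the SSE spec, a frame is one or more "data:" lines followed by a
// blank line; a multi-line payload becomes multiple "data:" lines.
function sseFrame(event: string, payload: string): string {
  const dataLines = payload
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${event}\n${dataLines}\n\n`;
}

// In a Next.js route handler this would feed a ReadableStream, e.g.:
//   controller.enqueue(encoder.encode(sseFrame("thinking", token)));
```

On the client, an `EventSource` listener for the `thinking` event type can then append each token to the Thinking Log as it arrives.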
Challenges we ran into
- Context Fragmentation: We initially struggled to feed large technical documents to the AI without losing nuance. We solved this by making the key technical decision to abandon RAG and rely on Gemini's massive context window for full-document ingestion.
- Transparency: In forensics, a simple "yes/no" isn't enough. We needed to show why a contradiction was flagged. We overcame this by implementing a Thinking Log that streams the model's internal decision process to the user in real time.
- Browser Hydration: We encountered hydration errors caused by browser extensions modifying the DOM, which required careful debugging and troubleshooting in the Next.js environment.
Accomplishments that we're proud of
- Multimodal Reasoning: Successfully combining video OCR/transcription with deep textual analysis to find complex contradictions (e.g., shadows indicating the wrong time of day).
- The Thinking Log: Implementing a visible "brain" for the application where users can watch the gemini-2.0-pro model reason through evidence step by step.
- Seamless Large File Handling: Integrating the Google Files API to handle large video uploads (up to 2GB) and PDF ingestion (up to 1M tokens) smoothly within the web interface.
What we learned
- The Power of Context: We learned that for audit tasks, providing the full document context is far superior to RAG, as it allows the model to understand the document as a cohesive whole.
- Specialized Agents: We discovered that separating concerns—using a fast model ("Flash") for perception and a deep model ("Pro") for reasoning—resulted in a more efficient and accurate system.
- Structured Output: We learned the importance of enforcing responseMimeType: "application/json" to ensure that the complex "thinking" process ultimately resolves into structured, actionable data for the UI.
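Even with responseMimeType: "application/json", the model's output is worth validating at runtime before it reaches the UI. A hedged sketch of such a check; the `AuditResult` shape and field names are illustrative assumptions, not VeraGate's actual schema:

```typescript
// Illustrative shape for the Auditor's final structured verdict.
interface AuditResult {
  verdict: "consistent" | "contradiction";
  issues: string[];
}

// Parse and validate raw model output; throws if the JSON does not
// match the expected shape, so malformed responses never reach the UI.
function parseAuditResult(raw: string): AuditResult {
  const data = JSON.parse(raw);
  if (data.verdict !== "consistent" && data.verdict !== "contradiction") {
    throw new Error(`unexpected verdict: ${String(data.verdict)}`);
  }
  if (
    !Array.isArray(data.issues) ||
    !data.issues.every((i: unknown) => typeof i === "string")
  ) {
    throw new Error("issues must be an array of strings");
  }
  return { verdict: data.verdict, issues: data.issues };
}
```

In a production setting a schema library (e.g. Zod) would replace the hand-rolled checks, but the principle is the same: never trust structured output blindly.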
What's next for Veragate
- Model Refinement: We aim to continue refining the gemini-2.0-pro-exp integration as the model moves from experimental to stable, potentially increasing the complexity of audits it can handle.
- Enhanced Forensic Types: We plan to expand the system to detect even more subtle contradiction types beyond the current Spatial, Temporal, Factual, and Specification categories.
- Real-World Deployment: Moving from the current prototype status to a production-ready tool that can accept live video feeds for on-site auditing.