Inspiration
Deepfakes are becoming increasingly indistinguishable from reality. Synthetic media is already being weaponized globally for political manipulation, financial fraud, and targeted harassment.
Most deepfake detection systems rely on a single AI model to make a binary decision. I wanted to explore a different approach — a system that behaves more like a digital forensic investigation team, where each specialist analyzes media from a different perspective before reaching a conclusion.
This led me to build VASTAV Agent, a multi-agent artificial intelligence system created specifically for the Gemini Live Agent Challenge.
VASTAV Agent is completely separate from my other creation, VASTAV AI, which is my production deepfake detection platform that uses its own proprietary detection models.
In contrast, VASTAV Agent is an experimental multi-agent architecture built using Google's Agent Development Kit and Gemini models.
What it does
VASTAV Agent uses six independent AI agents powered by Google Gemini and orchestrated using the Google Agent Development Kit (ADK).
Each agent analyzes media from a different forensic perspective:
Agent 1 — Forensic & Biometric Specialist
Analyzes lighting, facial structure, and biometric inconsistencies.
Agent 2 — AI Artifacts & Neural Pattern Expert
Detects diffusion artifacts, GAN fingerprints, and synthetic textures.
Agent 3 — Contextual & Semantic Evaluator
Evaluates scene logic, object relationships, and contextual anomalies.
Agent 4 — Physics, Lighting & Materials Specialist
Analyzes shadows, reflections, material behavior, and physical realism.
Agent 5 — Chief Justice (Holistic Analysis)
Aggregates the reasoning of all agents to evaluate the overall authenticity of the media.
Agent 6 — SynthID & AI Origin Specialist
Searches for AI watermark signals and indicators of synthetic origin.
The system requires at least 4 out of 6 agents to agree before producing a final authenticity verdict.
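The 4-of-6 rule can be sketched in a few lines of TypeScript. This is a minimal illustration, not the production code: the `AgentVerdict` shape and field names are assumptions for the sake of the example.

```typescript
// Illustrative types; the real agent output format may differ.
type Verdict = "REAL" | "FAKE";

interface AgentVerdict {
  agent: string;
  verdict: Verdict;
  confidence: number; // 0..1
}

interface ConsensusResult {
  verdict: Verdict | "NO_CONSENSUS";
  agreeing: number;
  confidence: number;
}

function consensus(verdicts: AgentVerdict[], quorum = 4): ConsensusResult {
  const fake = verdicts.filter(v => v.verdict === "FAKE");
  const real = verdicts.filter(v => v.verdict === "REAL");
  // The larger camp must reach the quorum, otherwise no verdict is issued.
  const majority = fake.length >= real.length ? fake : real;
  if (majority.length < quorum) {
    return { verdict: "NO_CONSENSUS", agreeing: majority.length, confidence: 0 };
  }
  const confidence =
    majority.reduce((sum, v) => sum + v.confidence, 0) / majority.length;
  return { verdict: majority[0].verdict, agreeing: majority.length, confidence };
}
```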
In addition to the verdict, VASTAV Agent generates a detailed forensic PDF report containing confidence scores and reasoning from every agent.
VASTAV Agent also features voice verdict announcement. After all 6 judges reach consensus, the system speaks the final verdict aloud, announcing whether the media is REAL or FAKE, the confidence score, and how many judges agreed. This transforms the experience from a text-based tool into a truly multimodal AI agent that sees, analyzes, and speaks.
Architecture Overview
The system follows a multi-agent ensemble architecture where independent AI agents analyze the same media input and then converge through a consensus mechanism.
Pipeline
User Upload → React Frontend → Node.js Backend → Media Processing Layer (EXIF + FFmpeg) → Parallel ADK Agents → Consensus Engine → Final Verdict → PDF Forensic Report
Each agent runs independently and evaluates different forensic signals before contributing its reasoning to the consensus engine.
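The fan-out step can be sketched as a `Promise.all` over the six specialists. The `analyzeWith` function below is a hypothetical placeholder; in the real system each call would go through the ADK to Gemini with that agent's forensic prompt.

```typescript
// Illustrative result shape; the actual agent response format may differ.
interface Analysis {
  agent: string;
  verdict: "REAL" | "FAKE";
  reasoning: string;
}

// Hypothetical per-agent analyzer. A real implementation would send `media`
// plus the agent's specialist prompt to Gemini and parse the reply.
async function analyzeWith(agent: string, media: Uint8Array): Promise<Analysis> {
  return { agent, verdict: "REAL", reasoning: "stubbed analysis" };
}

const AGENTS = [
  "forensic-biometric",
  "ai-artifacts",
  "contextual-semantic",
  "physics-lighting",
  "chief-justice",
  "synthid-origin",
];

// Fan out to all six specialists at once; each analyzes the same input
// independently before the consensus engine sees any result.
async function runAllAgents(media: Uint8Array): Promise<Analysis[]> {
  return Promise.all(AGENTS.map(agent => analyzeWith(agent, media)));
}
```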
How I built it
- Agent Framework: Google Agent Development Kit (ADK) using the ParallelAgent orchestration pattern
- AI Engine: Google Gemini 2.0 Flash for multimodal reasoning and analysis
- Backend: Node.js, Express.js, and TypeScript
- Frontend: React with Tailwind CSS, Framer Motion, and shadcn/ui
- Hosting: Google Cloud Run
- Media Processing: FFmpeg for video frame extraction and analysis
- Metadata Analysis: EXIF metadata inspection
- Reporting: PDFKit for generating forensic intelligence reports
- Database Layer: Drizzle ORM
All six agents run in parallel, analyzing the same media input independently.
Their outputs are then aggregated through a consensus engine, which determines the final authenticity verdict.
Challenges
Keeping six agents truly independent without overlapping reasoning domains was one of the biggest challenges.
Another challenge was getting Gemini to return consistently structured JSON across six parallel agent calls while keeping latency low; a single malformed response could stall the consensus step.
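One common defensive pattern for this problem is to sanitize and validate each reply before it reaches the consensus engine. The sketch below assumes a reply shape of `{verdict, confidence, reasoning}`; the actual schema used in the project is not shown here.

```typescript
// Assumed reply schema for illustration.
interface AgentReply {
  verdict: "REAL" | "FAKE";
  confidence: number; // 0..1
  reasoning: string;
}

function parseAgentReply(raw: string): AgentReply {
  // Models sometimes wrap JSON in markdown fences; strip them before parsing.
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "")
    .trim();
  const obj = JSON.parse(cleaned);
  // Validate every field so a malformed reply fails loudly, not silently.
  if (obj.verdict !== "REAL" && obj.verdict !== "FAKE") {
    throw new Error("invalid verdict: " + obj.verdict);
  }
  if (typeof obj.confidence !== "number" || obj.confidence < 0 || obj.confidence > 1) {
    throw new Error("confidence must be a number in [0, 1]");
  }
  if (typeof obj.reasoning !== "string") {
    throw new Error("missing reasoning");
  }
  return obj as AgentReply;
}
```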
For video analysis, I designed an optimized FFmpeg frame sampling pipeline so the system could analyze representative frames rather than every frame, significantly improving performance.
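The sampling idea can be illustrated with a small helper that picks evenly spaced timestamps, which an FFmpeg wrapper could then use to extract one frame per timestamp (for example via `ffmpeg -ss <t> -i input.mp4 -frames:v 1 out.jpg`). The function name and the centering choice are mine, not the project's.

```typescript
// Pick `frames` evenly spaced timestamps across a video so FFmpeg can
// extract representative frames instead of decoding every frame.
function sampleTimestamps(durationSec: number, frames: number): number[] {
  if (durationSec <= 0 || frames <= 0) return [];
  const step = durationSec / frames;
  // Center each sample in its slice to avoid black lead-in/lead-out frames.
  return Array.from({ length: frames }, (_, i) => +(step * (i + 0.5)).toFixed(3));
}
```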
What I learned
One important lesson from this project is that no single AI model should make a critical authenticity decision alone.
A multi-agent consensus architecture dramatically reduces false positives because multiple independent analyses must converge before a verdict is issued.
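The intuition can be made concrete with a back-of-the-envelope binomial calculation. Under an idealized assumption that each agent errs independently with the same rate `p` (real agents are never fully independent, so this is optimistic), the chance that at least 4 of 6 err together is:

```typescript
// Binomial coefficient C(n, k), computed iteratively to stay exact for small n.
function binom(n: number, k: number): number {
  let c = 1;
  for (let i = 0; i < k; i++) c = (c * (n - i)) / (i + 1);
  return c;
}

// Probability that at least `quorum` of `n` independent agents all err,
// given a per-agent error rate p (idealized independence assumption).
function ensembleErrorRate(n: number, quorum: number, p: number): number {
  let total = 0;
  for (let k = quorum; k <= n; k++) {
    total += binom(n, k) * p ** k * (1 - p) ** (n - k);
  }
  return total;
}
```

With a hypothetical per-agent error rate of 10%, the 4-of-6 quorum drives the ensemble error rate down to roughly 0.13% under this independence assumption, versus 10% for a single model.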
Working with the ADK ParallelAgent pattern also demonstrated how powerful agent-based AI systems can be for complex reasoning and verification tasks.
Accomplishments that I'm proud of
- Built a 6-agent forensic AI ensemble
- Implemented a consensus-based deepfake detection system
- Created automated forensic PDF intelligence reports
- Implemented parallel AI agent orchestration
- Built video frame extraction and analysis using FFmpeg
Built With
- drizzle-orm
- express.js
- ffmpeg
- framer-motion
- google-adk
- google-cloud-run
- google-gemini
- google-genai-sdk
- node.js
- pdfkit
- react
- shadcn-ui
- tailwindcss
- typescript
- vite