Inspiration
We are living in an era where seeing is no longer believing. Generative AI has eroded the boundary between real and fake: CEO fraud on video calls, fabricated evidence in courtrooms, and heart-wrenching scams in which elderly victims are tricked by AI-generated videos of their grandchildren.
Existing detection tools are failing. They rely on invisible watermarks that attackers can strip in seconds, or on black-box classifiers that output a single "fake" score with no explanation.
We wanted to build something different. We asked: "What if an AI could investigate a deepfake like a human forensic expert?" Not one that just looks at pixels, but one that reasons about logic, physics, and biometric flaws.
Enter Project Niwa.
What it does
Niwa is an agentic forensic scanner that acts as a digital truth layer. It uses a tiered architecture to balance speed and intelligence:
- The Live Sentinel (Biometric Triage): A real-time scanner for video streams. It uses Gemini 2.5 Flash to detect temporal artifacts: the subtle micro-jitters, texture washing, and lip-sync failures that show up in raw pixels. It runs continuously at low latency to protect users from live video threats.
- The Deep Forensic Auditor (Gemini 3 Reasoning): When a threat is flagged, the system stores the frames and escalates to Gemini 3 Preview. We leverage Gemini 3's advanced multimodality and reasoning to orchestrate a suite of Python forensic tools (ELA, EXIF, Noise Analysis); one such tool is sketched after this list. Unlike older models, Gemini 3 spots logical impossibilities that pixel-based detectors miss, like a receipt timestamp that contradicts the lighting.
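To make the tool orchestration concrete, here is a minimal sketch of what the ELA step might look like, assuming Pillow; the function name, signature, and scoring are illustrative rather than our exact code:

```python
# A minimal sketch of the Error Level Analysis (ELA) tool exposed to the
# Deep Audit agent via function calling. Assumes Pillow; the function
# name and scoring are illustrative placeholders.
import io
from PIL import Image, ImageChops

def error_level_analysis(image_path: str, quality: int = 90) -> float:
    """Recompress the image and measure the per-pixel error.

    Regions edited after the original save tend to recompress with a
    different error level than the untouched background."""
    original = Image.open(image_path).convert("RGB")

    # Re-save at a known JPEG quality, then reload the compressed copy.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer)

    # The difference image highlights regions with anomalous error levels.
    diff = ImageChops.difference(original, recompressed)
    max_channel_diff = max(hi for _, hi in diff.getextrema())
    return max_channel_diff / 255.0  # 0.0 = clean, closer to 1.0 = suspicious
```

The agent sees this as a registered tool declaration; when Gemini 3 decides ELA is warranted, it calls the function and folds the score into its verdict.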
How we built it

We built Niwa on a FastAPI backend coupled with a Next.js frontend, communicating via WebSockets for real-time feedback.
The Tiered Intelligence Architecture
The core innovation is routing. A WebSocket router streams incoming frames to Gemini Flash; if Flash detects a high-confidence anomaly, it triggers a request for a deep audit. The Deep Audit agent then uses Function Calling to execute local Python scripts and synthesizes a final verdict.
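A simplified sketch of that routing layer, assuming FastAPI WebSockets and the google-genai SDK; the endpoint path, prompt, 0.8 threshold, and deep_audit handoff are placeholders, not our exact implementation:

```python
# Simplified sketch of the tiered router: Flash triages every frame, and
# only high-confidence anomalies escalate to the Gemini 3 auditor.
import json
from fastapi import FastAPI, WebSocket
from google import genai
from google.genai import types

app = FastAPI()
client = genai.Client()  # reads GEMINI_API_KEY from the environment

TRIAGE_PROMPT = (
    "You are a deepfake triage API. Reply with only JSON: "
    '{"anomaly": bool, "confidence": float, "artifact": str}'
)

async def deep_audit(frame: bytes) -> None:
    """Placeholder: persist the frame and queue the Gemini 3 deep audit."""
    ...

@app.websocket("/ws/scan")
async def scan(ws: WebSocket):
    await ws.accept()
    while True:
        frame = await ws.receive_bytes()  # one JPEG frame from the client

        # Tier 1: fast biometric triage on Gemini 2.5 Flash.
        response = await client.aio.models.generate_content(
            model="gemini-2.5-flash",
            contents=[
                types.Part.from_bytes(data=frame, mime_type="image/jpeg"),
                TRIAGE_PROMPT,
            ],
            config=types.GenerateContentConfig(
                response_mime_type="application/json",
            ),
        )
        verdict = json.loads(response.text)
        await ws.send_json(verdict)  # instant feedback to the HUD

        # Tier 2: escalate only high-confidence anomalies.
        if verdict["anomaly"] and verdict["confidence"] > 0.8:
            await deep_audit(frame)
```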
The Silent JSON Protocol
To achieve low-latency performance, we prompted the models to act strictly as a backend API. By enforcing strict JSON output and negative constraints, we get the model to emit structured data immediately, significantly reducing time-to-first-token.
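Concretely, the silencing comes from a negative-constraint system instruction plus enforced JSON decoding. A sketch using the google-genai SDK, with illustrative wording:

```python
# Sketch of the "silent" generation config, assuming the google-genai SDK.
# The instruction wording here is illustrative, not our production prompt.
from google.genai import types

SILENT_SYSTEM_INSTRUCTION = (
    "You are a backend API, not an assistant. Never greet, apologize, "
    "explain, or narrate. Respond with a single JSON object and nothing else."
)

silent_config = types.GenerateContentConfig(
    system_instruction=SILENT_SYSTEM_INSTRUCTION,
    response_mime_type="application/json",  # constrain decoding to valid JSON
    temperature=0.0,  # deterministic verdicts, no creative filler
)
```

The MIME-type constraint does the heavy lifting: the model never spends tokens on conversational preamble, so the first byte the router receives is already parseable.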
Challenges we ran into
- Gemini 3 Latency: A major technical hurdle was attempting to implement the live streaming layer using Gemini 3. We initially wanted its reasoning power for live analysis, but the latency was too high for seamless real-time processing. This forced us to pivot to Gemini 2.5 Flash for the live sentinel layer, reserving Gemini 3 for the deep, asynchronous audit where its reasoning capabilities shine.
- The Verbose Model: Early versions of the Live Agent would include conversational filler text ("I have analyzed the frame..."), which added lag. We solved this by enforcing strict JSON constraints.
Accomplishments that we're proud of
- True Agentic Forensics: We didn't just build a classifier; we built an investigator. Seeing Gemini 3 autonomously decide to crop an image and run Error Level Analysis was a "magic moment."
- The Freeze-Frame UI: We designed a "Terminator-style" HUD that draws bounding boxes around specific anomalies (e.g., "Gliding Feet", "Impossible Timestamp"), making the AI's decision-making explainable and transparent; a minimal rendering sketch follows this list.
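For illustration, a minimal version of that overlay, assuming OpenCV and a bounding-box schema of our own design:

```python
# Minimal sketch of the freeze-frame HUD overlay. The anomaly schema
# ({"label": str, "box": [x, y, w, h]}) is our assumption.
import cv2

def draw_hud(frame, anomalies):
    """Draw a labeled red box around each anomaly the auditor reports."""
    for anomaly in anomalies:
        x, y, w, h = anomaly["box"]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
        cv2.putText(frame, anomaly["label"], (x, y - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return frame
```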
What's next for Project Niwa
- Browser Extension: Bringing the "Live Sentinel" directly to Chrome to scan social media feeds in real-time.
- Audio Forensics: Integrating audio spectral analysis to detect AI-generated voices in calls.
Built With
- agentic-workflow
- fastapi
- google-gemini
- next.js
- opencv
- pillow
- python
- react
- typescript
- websockets