🕵️‍♂️ SHERLOCK INVESTIGATOR: Project Story

Inspiration

Modern multimodal AI is incredibly powerful, but most applications still treat vision as passive.

You upload an image → the AI describes it.

That's where we saw the gap.

In real investigative, restoration, archival, and forensic workflows, experts don't just look; they:

  • Zoom into details
  • Enhance degraded visuals
  • Cross-reference archives
  • Verify identifiers
  • Build evidence chains

We asked a simple question:

What if AI could investigate visual evidence the way a human detective does?

This idea became Sherlock Investigator: an agentic forensic analyst that transforms passive perception into active investigation.


What it does

Sherlock Investigator is a multimodal AI system that analyzes images and video frames like a digital detective.

Instead of answering "What is this?", it answers:

"What is this, how do we know, and what evidence supports it?"

Core capabilities include:

  • 📹 Video & image forensic analysis
  • 🔍 Region-of-interest detection (plates, decals, features)
  • 🧠 Agentic reasoning with visible thought logs
  • 🌐 Search grounding for verification
  • 📊 Confidence scoring across evidence factors
  • 🗂️ Structured forensic verdict reports

Example workflow:

  1. The user uploads archival footage.
  2. The AI extracts key frames.
  3. It enhances details (contrast, OCR, zoom).
  4. It identifies vehicles, objects, or artifacts.
  5. It cross-references historical databases.
  6. It produces a grounded verdict with sources.
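Step 2 of this workflow reduces to choosing which frames to analyze. A minimal sketch of that sampling (a real build would hand the chosen indices to a decoder such as ffmpeg or OpenCV, which this deliberately leaves out):

```python
def extract_key_frames(n_frames, fps, every_seconds=2.0):
    """Pick evenly spaced key-frame indices (workflow step 2).

    Only selects *which* frames to analyze; actual decoding is out of scope.
    """
    step = max(1, int(fps * every_seconds))
    return list(range(0, n_frames, step))

# A 4-second clip at 25 fps, sampled every 2 seconds -> frames 0 and 50.
indices = extract_key_frames(100, 25)
```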

Mathematically, the confidence model aggregates evidence weights:

[ C_{final} = \sum_{i=1}^{n} w_i \cdot e_i ]

Where:

  • ( e_i ) = Evidence factor score
  • ( w_i ) = Reliability weight
  • ( C_{final} ) = Final identification confidence
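In code, the weighted sum looks like the sketch below. The factor names and weights are illustrative, not our production values, and unlike the bare formula we normalize by the total weight so the score stays in [0, 1] (the formula assumes the weights already sum to 1):

```python
def aggregate_confidence(evidence):
    """C_final = sum(w_i * e_i) over evidence factors, normalized by sum(w_i).

    `evidence` maps factor name -> (score e_i in [0, 1], reliability weight w_i).
    """
    total_weight = sum(w for _, w in evidence.values())
    return sum(e * w for e, w in evidence.values()) / total_weight

# Illustrative factors for a vehicle identification (not production weights).
evidence = {
    "plate_ocr":      (0.9, 3.0),   # strong identifier, high reliability
    "body_shape":     (0.7, 1.5),
    "decal_match":    (0.6, 1.0),
    "archive_record": (0.8, 2.0),
}
c_final = aggregate_confidence(evidence)
```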

How we built it

We built Sherlock Investigator using the Gemini 3 family via Google AI Studio, leveraging its multimodal and reasoning capabilities.

AI Stack

  • Gemini 3 Flash → Fast visual scanning & agent loops
  • Gemini 3 Pro → Deep reasoning & long-context analysis

Key API features used

  • Multimodal vision understanding
  • Code execution for image enhancement
  • Search grounding for verification
  • Structured output reasoning traces

System Architecture

Frontend

  • Investigation dashboard UI
  • Evidence viewport with overlays
  • Verdict cards & confidence panels

Backend orchestration

  • Frame extraction pipeline
  • Enhancement filters (contrast, zoom)
  • OCR & feature detection
  • Grounded search verification

Agent loop

  1. Detect features
  2. Enhance evidence
  3. Extract identifiers
  4. Search & verify
  5. Resolve discrepancies
  6. Produce verdict
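A minimal version of that loop, with the model calls replaced by injectable callables (detection, enhancement, extraction, and grounded verification are all stand-ins here, not our actual Gemini calls):

```python
from dataclasses import dataclass, field

@dataclass
class Investigation:
    frame: str                                  # frame identifier (stand-in for pixel data)
    log: list = field(default_factory=list)     # visible thought log
    identifiers: list = field(default_factory=list)
    verified: bool = False

def run_agent_loop(inv, detect, enhance, extract, verify, max_passes=3):
    """One investigation cycle per pass: detect -> enhance -> extract -> verify."""
    for step in range(max_passes):
        regions = detect(inv.frame)
        inv.log.append(f"pass {step}: {len(regions)} region(s) of interest")
        for region in regions:
            inv.identifiers += extract(enhance(region))
        if verify(inv.identifiers):
            inv.verified = True
            inv.log.append("verdict: identifiers grounded in sources")
            break
    return inv

# Stub callables: "grounded" once two independent reads agree.
inv = run_agent_loop(
    Investigation("frame_0042"),
    detect=lambda frame: ["plate_region"],
    enhance=str.upper,
    extract=lambda region: [region],
    verify=lambda ids: len(ids) >= 2,
)
```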

Challenges we ran into

1. Low-quality archival footage

Many test videos were:

  • Grainy
  • Motion blurred
  • Low resolution

Solution:

We implemented iterative enhancement:

  • Contrast boosting
  • Region zoom
  • Multi-pass OCR
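The iterative enhancement can be sketched as a contrast stretch feeding a multi-pass OCR wrapper. The OCR engine is passed in as a callable, and pixels here are plain grayscale ints rather than a real image format:

```python
def boost_contrast(pixels, factor=1.5):
    """Linear contrast stretch around mid-gray (128), clamped to [0, 255]."""
    return [[max(0, min(255, int(128 + (p - 128) * factor))) for p in row]
            for p_row in [None] for row in pixels]

def multi_pass_ocr(pixels, ocr, factors=(1.0, 1.5, 2.0)):
    """Run the OCR callable at several contrast levels; keep the longest read."""
    best = ""
    for factor in factors:
        text = ocr(boost_contrast(pixels, factor))
        if len(text) > len(best):
            best = text
    return best

# Stub OCR that only "reads" a plate once the pixel is bright enough;
# this low-contrast frame succeeds only on the third pass (factor 2.0).
fake_ocr = lambda px: "ABC123" if px[0][0] > 150 else ""
plate = multi_pass_ocr([[140]], fake_ocr)
```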

2. Hallucination risk in identification

Historical identification must be verifiable.

Solution:

We enforced grounding:

  • Claims require source matches
  • Plates & identifiers cross-checked
  • Verdict confidence tied to evidence
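A sketch of that enforcement: claims arrive as (statement, sources, score) tuples, ungrounded claims are dropped, and confidence is derived only from what survives. The tuple shape is illustrative, not our production schema:

```python
def grounded_verdict(claims):
    """Keep only claims with at least one source; confidence comes from those.

    `claims`: list of (statement, sources, score) tuples (illustrative shape).
    """
    grounded = [(stmt, srcs, score) for stmt, srcs, score in claims if srcs]
    if not grounded:
        return {"verdict": "inconclusive", "confidence": 0.0, "sources": []}
    confidence = sum(score for _, _, score in grounded) / len(grounded)
    return {
        "verdict": grounded[0][0],
        "confidence": round(confidence, 2),
        "sources": sorted({s for _, srcs, _ in grounded for s in srcs}),
    }

verdict = grounded_verdict([
    ("1965 Mustang fastback", ["registry/ford/1965"], 0.85),
    ("original factory paint", [], 0.9),   # no source match -> dropped
])
```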

3. Latency vs reasoning depth

Deep analysis slowed demos.

Solution:

Two-tier processing:

  • Flash → fast visual loops
  • Pro → deep archival reasoning
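The routing itself is a small lookup. The task names and model identifier strings below are placeholders for illustration, not official API model IDs:

```python
# Placeholder task names and model IDs; the real mapping lives in our config.
DEEP_TASKS = {"archival_reasoning", "long_context_crosscheck", "final_verdict"}

def route(task):
    """Pick a tier per task; unknown tasks default to the low-latency tier."""
    if task in DEEP_TASKS:
        return "gemini-3-pro"     # deep archival reasoning
    return "gemini-3-flash"       # fast visual loops (and the safe default)
```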

4. Making reasoning understandable

Raw model reasoning is unreadable to users.

Solution:

We designed a Thought Log UI that translates reasoning into:

  • Planning steps
  • Hypotheses
  • Cross-references
  • Conclusions
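Internally this amounts to bucketing raw reasoning entries into those four sections. A minimal sketch, where `ThoughtEntry` is a hypothetical type and the phase names simply mirror the list above:

```python
from dataclasses import dataclass

PHASES = ("planning", "hypothesis", "cross-reference", "conclusion")

@dataclass
class ThoughtEntry:
    phase: str   # one of PHASES
    text: str

def render_thought_log(entries):
    """Bucket raw reasoning entries into the four Thought Log sections."""
    sections = {phase: [] for phase in PHASES}
    for entry in entries:
        bucket = entry.phase if entry.phase in PHASES else "planning"
        sections[bucket].append(entry.text)
    return sections

log = render_thought_log([
    ThoughtEntry("planning", "zoom into rear plate region"),
    ThoughtEntry("hypothesis", "vehicle is a 1965 Mustang"),
    ThoughtEntry("cross-reference", "plate prefix matches 1965 registry"),
    ThoughtEntry("conclusion", "identification grounded, confidence 0.85"),
])
```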

Accomplishments that we're proud of

  • Built a true agentic vision investigator, not a chatbot
  • Enabled visible reasoning transparency
  • Implemented grounded verification pipelines
  • Designed a forensic evidence UX system
  • Achieved high-confidence identification from degraded media

Most importantly:

We demonstrated that multimodal AI can investigate, not just describe.


What we learned

This project taught us:

Technical

  • Agent loops dramatically improve vision accuracy
  • Grounding reduces hallucinations significantly
  • Enhancement pipelines are critical for OCR

Product

  • Users trust AI more when reasoning is visible
  • Evidence presentation matters as much as accuracy
  • Confidence scoring improves decision usability

Research insight

Passive vision is insufficient for expert workflows.

Agentic investigation is the future of multimodal AI.


What's next for SHERLOCK INVESTIGATOR

We see Sherlock evolving into a full forensic intelligence platform.

Planned expansions

📚 Deep Archive Mode

  • Upload manuals, films, registries
  • Long-context cross-verification

🔊 Audio Forensics

  • Engine sound diagnostics
  • Mechanical anomaly detection

🛰️ Geospatial Investigation

  • Location inference from footage
  • Historical map grounding

🧾 Chain-of-custody reporting

  • Court-ready forensic documentation

🛠️ Restoration assistant

  • Identify parts
  • Locate replacements
  • Verify authenticity

Vision

Sherlock Investigator represents a shift:

[ \text{Passive Vision} \rightarrow \text{Active Investigation} ]

We believe the next generation of AI systems won't just see the world…

Theyโ€™ll investigate it.

