🕵️‍♂️ SHERLOCK INVESTIGATOR – Project Story
Inspiration
Modern multimodal AI is incredibly powerful, but most applications still treat vision as passive.
You upload an image; the AI describes it.
That's where we saw the gap.
In real investigative, restoration, archival, and forensic workflows, experts don't just look. They:
- Zoom into details
- Enhance degraded visuals
- Cross-reference archives
- Verify identifiers
- Build evidence chains
We asked a simple question:
What if AI could investigate visual evidence the way a human detective does?
This idea became Sherlock Investigator: an agentic forensic analyst that transforms passive perception into active investigation.
What it does
Sherlock Investigator is a multimodal AI system that analyzes images and video frames like a digital detective.
Instead of answering "What is this?", it answers:
"What is this, how do we know, and what evidence supports it?"
Core capabilities include:
- Video & image forensic analysis
- Region-of-interest detection (plates, decals, features)
- Agentic reasoning with visible thought logs
- Search grounding for verification
- Confidence scoring across evidence factors
- Structured forensic verdict reports
Example workflow:
- User uploads archival footage.
- AI extracts key frames.
- It enhances details (contrast, OCR, zoom).
- Identifies vehicles, objects, or artifacts.
- Cross-references historical databases.
- Produces a grounded verdict with sources.
Mathematically, the confidence model aggregates evidence weights:
\[ C_{final} = \sum_{i=1}^{n} w_i \cdot e_i \]
Where:
- \( e_i \) = evidence factor score
- \( w_i \) = reliability weight
- \( C_{final} \) = final identification confidence
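The weighted sum above can be sketched in a few lines. We assume the reliability weights are normalized to sum to 1 so the result stays in [0, 1]; the write-up doesn't specify normalization, and the function and example values here are illustrative, not the project's actual API.

```python
# Minimal sketch of the evidence-weighted confidence model C_final = sum(w_i * e_i).
# Each entry pairs an evidence factor score e_i in [0, 1] with a reliability
# weight w_i; weights are assumed to be pre-normalized to sum to 1.

def final_confidence(evidence):
    """evidence: list of (e_i, w_i) pairs -> weighted confidence score."""
    return sum(e * w for e, w in evidence)

# Example: OCR plate match (strong), archival photo match (medium),
# paint-color match (weak). Weights sum to 1.0.
evidence = [(0.95, 0.5), (0.80, 0.3), (0.60, 0.2)]
score = final_confidence(evidence)  # 0.475 + 0.24 + 0.12 = 0.835
```

Because the score is a plain weighted average, adding one strong, highly reliable factor moves the verdict more than several weak ones.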
How we built it
We built Sherlock Investigator using the Gemini 3 family via Google AI Studio, leveraging its multimodal and reasoning capabilities.
AI Stack
- Gemini 3 Flash – Fast visual scanning & agent loops
- Gemini 3 Pro – Deep reasoning & long-context analysis
Key API features used
- Multimodal vision understanding
- Code execution for image enhancement
- Search grounding for verification
- Structured output reasoning traces
System Architecture
Frontend
- Investigation dashboard UI
- Evidence viewport with overlays
- Verdict cards & confidence panels
Backend orchestration
- Frame extraction pipeline
- Enhancement filters (contrast, zoom)
- OCR & feature detection
- Grounded search verification
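The first stage of the backend, frame extraction, can be sketched as simple time-based sampling. This is an illustrative assumption about how key frames are picked; the real pipeline may use scene detection or other heuristics.

```python
# Sketch of the frame-extraction stage: sample one frame every N seconds
# from a video, by index. Parameters and the sampling strategy are
# illustrative, not the project's actual pipeline.

def sample_frames(total_frames, fps, every_seconds=2.0):
    """Return the indices of frames sampled every `every_seconds` seconds."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))

# A 4-second clip at 25 fps, sampled every 2 seconds -> frames 0 and 50.
indices = sample_frames(100, 25, every_seconds=2.0)
```

Each sampled frame then flows through the enhancement, OCR, and verification stages listed above.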
Agent loop
- Detect features
- Enhance evidence
- Extract identifiers
- Search & verify
- Resolve discrepancies
- Produce verdict
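The six steps above can be sketched as a single loop over stubbed tool functions. All names here are hypothetical stand-ins for the real Gemini-backed calls, and discrepancy resolution is elided to keep the sketch short.

```python
# Illustrative sketch of the agent loop: detect -> enhance -> extract ->
# verify, repeating until some identifier is grounded, then emit a verdict.
# Tool functions are injected so the loop itself stays model-agnostic.

def run_investigation(frame, tools, max_rounds=3):
    """Run the investigation loop; returns a verdict dict with its evidence."""
    evidence = []
    for _ in range(max_rounds):
        regions = tools["detect"](frame)                    # 1. detect features
        enhanced = [tools["enhance"](r) for r in regions]   # 2. enhance evidence
        identifiers = [tools["extract"](r) for r in enhanced]  # 3. extract identifiers
        for ident in identifiers:
            match = tools["verify"](ident)                  # 4. search & verify
            if match:
                evidence.append((ident, match))
        if evidence:  # 5. (discrepancy resolution elided) stop once grounded
            break
    return {"verdict": bool(evidence), "evidence": evidence}  # 6. produce verdict

# Stubbed tools standing in for the real vision / search calls.
stub_tools = {
    "detect": lambda frame: ["plate_region"],
    "enhance": lambda region: region.upper(),
    "extract": lambda region: "ABC123",
    "verify": lambda ident: "registry-hit" if ident == "ABC123" else None,
}
result = run_investigation("frame-001", stub_tools)
```

Injecting the tools as plain callables also makes the loop easy to unit-test without any model in the loop.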
Challenges we ran into
1. Low-quality archival footage
Many test videos were:
- Grainy
- Motion blurred
- Low resolution
Solution:
We implemented iterative enhancement:
- Contrast boosting
- Region zoom
- Multi-pass OCR
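The contrast-boosting step can be illustrated with a min-max stretch on an 8-bit grayscale region, run before each OCR pass. The real pipeline uses image libraries; this pure-Python version only shows the transform.

```python
# Sketch of contrast boosting: linearly rescale a washed-out grayscale
# region so its pixel values span the full 0-255 range before OCR.

def contrast_stretch(pixels):
    """Min-max stretch a list of 8-bit grayscale values to 0..255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat region: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

# A faded region with values bunched between 100 and 140...
faded = [100, 110, 120, 130, 140]
boosted = contrast_stretch(faded)  # ...now spans the full dynamic range
```

Multi-pass OCR then reruns text extraction on each successively enhanced version of the region and keeps the highest-confidence read.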
2. Hallucination risk in identification
Historical identification must be verifiable.
Solution:
We enforced grounding:
- Claims require source matches
- Plates & identifiers cross-checked
- Verdict confidence tied to evidence
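The grounding rule can be sketched as a gate: a claim only enters the verdict if at least one source matches it, and its confidence scales with how many sources agree. The data structure and thresholds below are our own illustrative assumptions.

```python
# Sketch of enforced grounding: unverified claims are rejected outright,
# and accepted claims get confidence tied to their source-match count.

def ground_claim(claim, sources, min_matches=1):
    """Return (accepted, confidence) for a claim given a source list."""
    matches = [s for s in sources if claim in s["supports"]]
    if len(matches) < min_matches:
        return False, 0.0  # unverified claims never reach the verdict
    confidence = min(1.0, 0.5 + 0.25 * len(matches))  # illustrative scale
    return True, confidence

sources = [
    {"name": "registry-1962", "supports": {"plate:XYZ 42"}},
    {"name": "press-photo", "supports": {"plate:XYZ 42", "model:MkII"}},
]
```

Under this scheme a plate identifier confirmed by both the registry and a press photo scores higher than one seen in a single source, and an ungrounded guess scores zero.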
3. Latency vs reasoning depth
Deep analysis slowed demos.
Solution:
Two-tier processing:
- Flash – fast visual loops
- Pro – deep archival reasoning
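The two-tier split can be sketched as a small router. The routing heuristic and the model identifier strings are placeholders of our own; only the Flash/Pro division comes from the write-up.

```python
# Sketch of two-tier routing: quick per-frame scans go to the fast model,
# long-context archival reasoning goes to the deep one. Model ID strings
# and thresholds are illustrative placeholders.

FAST_MODEL = "gemini-flash"  # fast visual loops
DEEP_MODEL = "gemini-pro"    # deep archival reasoning

def pick_model(task_type, context_tokens):
    """Route a task to a model tier by task kind and context size."""
    if task_type == "archival" or context_tokens > 50_000:
        return DEEP_MODEL
    return FAST_MODEL
```

Routing at the task level keeps the interactive demo snappy while still letting long archival cross-checks run on the deeper model.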
4. Making reasoning understandable
Raw model reasoning is unreadable to users.
Solution:
We designed a Thought Log UI that translates reasoning into:
- Planning steps
- Hypotheses
- Cross-references
- Conclusions
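The Thought Log translation step can be sketched as bucketing tagged reasoning entries into those four display categories. The tag names and entry format are illustrative.

```python
# Sketch of the Thought Log UI's backing transform: raw (tag, text)
# reasoning entries are grouped into the four user-facing categories.

CATEGORIES = ("planning", "hypothesis", "cross-reference", "conclusion")

def build_thought_log(entries):
    """Group (tag, text) reasoning entries into display categories."""
    log = {c: [] for c in CATEGORIES}
    for tag, text in entries:
        if tag in log:  # drop entries the UI has no bucket for
            log[tag].append(text)
    return log

entries = [
    ("planning", "Zoom into the rear plate region."),
    ("hypothesis", "Vehicle is a 1960s MkII saloon."),
    ("cross-reference", "Plate format matches the 1962 registry."),
    ("conclusion", "High-confidence identification."),
]
log = build_thought_log(entries)
```

The frontend then renders each bucket as its own collapsible panel, so users see a plan, its hypotheses, and the evidence trail rather than raw model output.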
Accomplishments that we're proud of
- Built a true agentic vision investigator, not a chatbot
- Enabled visible reasoning transparency
- Implemented grounded verification pipelines
- Designed a forensic evidence UX system
- Achieved high-confidence identification from degraded media
Most importantly:
We demonstrated that multimodal AI can investigate, not just describe.
What we learned
This project taught us:
Technical
- Agent loops dramatically improve vision accuracy
- Grounding reduces hallucinations significantly
- Enhancement pipelines are critical for OCR
Product
- Users trust AI more when reasoning is visible
- Evidence presentation matters as much as accuracy
- Confidence scoring improves decision usability
Research insight
Passive vision is insufficient for expert workflows.
Agentic investigation is the future of multimodal AI.
What's next for SHERLOCK INVESTIGATOR
We see Sherlock evolving into a full forensic intelligence platform.
Planned expansions
Deep Archive Mode
- Upload manuals, films, registries
- Long-context cross-verification
Audio Forensics
- Engine sound diagnostics
- Mechanical anomaly detection
Geospatial Investigation
- Location inference from footage
- Historical map grounding
Chain-of-custody reporting
- Court-ready forensic documentation
Restoration assistant
- Identify parts
- Locate replacements
- Verify authenticity
Vision
Sherlock Investigator represents a shift:
\[ \text{Passive Vision} \rightarrow \text{Active Investigation} \]
We believe the next generation of AI systems won't just see the world…
They'll investigate it.