🐻 Sherlock Combs 🍯

✨ Inspiration

Massive investigative document dumps — court filings, legal records, leaked archives — are incredibly hard to explore manually. We wanted to build something that helps journalists, researchers, and curious investigators quickly answer questions like:

Who keeps showing up across documents?
When did certain connections first appear?
What evidence actually supports these relationships?

Inspired by powerful link-analysis tools (think Palantir-style investigation boards), we created a lightweight, evidence-first version that’s accessible, transparent, and built for discovery.

🚀 What It Does

Sherlock Combs transforms messy, unstructured document collections into an interactive, timeline-driven detective board.

Upload a set of PDFs, and it automatically:

Extracts people, organizations, locations, and dates
Builds an evolving relationship graph backed by real evidence
Lets you scrub through time to watch connections form and change
Shows the exact source snippet behind every single link

No black-box accusations — just grounded, traceable document intelligence.

🛠️ How We Built It

We built a full end-to-end pipeline in 24 hours:

Backend (Python)

pdfplumber for pulling clean text from PDFs
spaCy for entity recognition (PERSON, ORG, GPE, DATE)
OpenAI gpt-4o-mini for summarization + robust date extraction
Graph construction based on evidence-backed co-mentions

Frontend (Next.js + TypeScript)

React Flow for an interactive investigation graph
Custom timeline slider for time-based filtering
Evidence side panel for citation tracing
A gorgeous detective-board aesthetic
Bonus: 3D globe visualization for geographic context

Pipeline flow:
Upload → Parse → Extract Entities → Build Graph → Explore Visually

⚡ Challenges We Ran Into

Building Sherlock Combs in 24 hours came with a ton of fun (and very real) challenges, especially when dealing with messy investigative documents:

Distinguishing between different types of PDFs — scanned images, court filings, structured reports, and even email-style exchanges
Designing the UI to truly feel like an interactive detective corkboard
Parsing raw text and accurately extracting + classifying dates from inconsistent PDF formats
Fine-tuning Named Entity Recognition (NER) so we only extract the entities we care about (people + locations)
Animating Sherlock Combs himself while he’s responding to user prompts for a more immersive experience
Making sure we only load the most relevant selected nodes into the chat context to keep interactions fast and focused

🏆 Accomplishments We’re Proud Of

In just 24 hours, we managed to build something both powerful and adorable:

A cute animated bear mascot (Sherlock Combs!) that reacts while answering prompts
Beautiful flight-path visualizations that bring timeline events to life
A clean, polished detective-board UI that feels immersive and fun to explore
A full end-to-end data processing pipeline that actually works:
- Scraping documents directly from the DOJ website
- Storing raw files and metadata in Snowflake
- Parsing PDFs into clean text
- Using GPT-4o to generate summaries and extract key timeline context
- Running Named Entity Recognition (NER) to identify people + locations
- Storing everything in a Neo4j graph database for interactive relationship exploration

📚 What We Learned

This project was an amazing crash course in:

Real-world NLP edge cases
Graph construction + interactive visualization
Timeline-based UX as a powerful investigation dimension
The importance of citations + evidence in AI tools
Full-stack integration across Python + Next.js
Prompting + cost-efficient LLM usage
Performance optimization for dynamic graph interfaces
Ethical responsibility when building tools involving real people

🌟 What’s Next for Sherlock Combs

We’re just getting started. Future upgrades include:

Smarter alias merging + entity resolution
Semantic relationship extraction beyond co-mentions
Full-text + semantic search across document evidence
OCR support for scanned archives
Collaboration workspaces for investigative teams
External context integration (news + public databases)

Sherlock Combs turns unstructured documents into structured investigations — with every connection backed by proof.

Built With

d3.js
gpt-4o-mini
leaflet.js
neo4j
pdfplumber
reactflow

Submitted to

CXC 2026 - AI Hackathon

Created by

Mentored and on-boarded hackers to distributed system and database technology. Wore multiple hats from product management, managing agile sprints, backend engineering and database architecture.

Young Jun Joo
data pipeline design, frontend design and implementation

Ri Hong
Alfred Zimmer
Lin Hong

Updates

Alfred Zimmer started this project — Feb 07, 2026 08:52 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.