🐻 Sherlock Combs 🍯

✨ Inspiration

Massive investigative document dumps — court filings, legal records, leaked archives — are incredibly hard to explore manually. We wanted to build something that helps journalists, researchers, and curious investigators quickly answer questions like:

  • Who keeps showing up across documents?
  • When did certain connections first appear?
  • What evidence actually supports these relationships?

Inspired by powerful link-analysis tools (think Palantir-style investigation boards), we created a lightweight, evidence-first version that’s accessible, transparent, and built for discovery.


🚀 What It Does

Sherlock Combs transforms messy, unstructured document collections into an interactive, timeline-driven detective board.

Upload a set of PDFs, and it automatically:

  • Extracts people, organizations, locations, and dates
  • Builds an evolving relationship graph backed by real evidence
  • Lets you scrub through time to watch connections form and change
  • Shows the exact source snippet behind every single link

No black-box accusations — just grounded, traceable document intelligence.


🛠️ How We Built It

We built a full end-to-end pipeline in 24 hours:

Backend (Python)

  • pdfplumber for pulling clean text from PDFs
  • spaCy for entity recognition (PERSON, ORG, GPE, DATE)
  • OpenAI gpt-4o-mini for summarization + robust date extraction
  • Graph construction based on evidence-backed co-mentions

Frontend (Next.js + TypeScript)

  • React Flow for an interactive investigation graph
  • Custom timeline slider for time-based filtering
  • Evidence side panel for citation tracing
  • A gorgeous detective-board aesthetic
  • Bonus: 3D globe visualization for geographic context

Pipeline flow:
Upload → Parse → Extract Entities → Build Graph → Explore Visually


⚡ Challenges We Ran Into

Building Sherlock Combs in 24 hours came with a ton of fun (and very real) challenges, especially when dealing with messy investigative documents:

  • Distinguishing between different types of PDFs — scanned images, court filings, structured reports, and even email-style exchanges
  • Designing the UI to truly feel like an interactive detective corkboard
  • Parsing raw text and accurately extracting + classifying dates from inconsistent PDF formats
  • Fine-tuning Named Entity Recognition (NER) so we only extract the entities we care about (people + locations)
  • Animating Sherlock Combs himself while he’s responding to user prompts for a more immersive experience
  • Making sure we only load the most relevant selected nodes into the chat context to keep interactions fast and focused

🏆 Accomplishments We’re Proud Of

In just 24 hours, we managed to build something both powerful and adorable:

  • A cute animated bear mascot (Sherlock Combs!) that reacts while answering prompts
  • Beautiful flight-path visualizations that bring timeline events to life
  • A clean, polished detective-board UI that feels immersive and fun to explore
  • A full end-to-end data processing pipeline that actually works:
    • Scraping documents directly from the DOJ website
    • Storing raw files and metadata in Snowflake
    • Parsing PDFs into clean text
    • Using GPT-4o to generate summaries and extract key timeline context
    • Running Named Entity Recognition (NER) to identify people + locations
    • Storing everything in a Neo4j graph database for interactive relationship exploration

📚 What We Learned

This project was an amazing crash course in:

  • Real-world NLP edge cases
  • Graph construction + interactive visualization
  • Timeline-based UX as a powerful investigation dimension
  • The importance of citations + evidence in AI tools
  • Full-stack integration across Python + Next.js
  • Prompting + cost-efficient LLM usage
  • Performance optimization for dynamic graph interfaces
  • Ethical responsibility when building tools involving real people

🌟 What’s Next for Sherlock Combs

We’re just getting started. Future upgrades include:

  • Smarter alias merging + entity resolution
  • Semantic relationship extraction beyond co-mentions
  • Full-text + semantic search across document evidence
  • OCR support for scanned archives
  • Collaboration workspaces for investigative teams
  • External context integration (news + public databases)

Sherlock Combs turns unstructured documents into structured investigations — with every connection backed by proof.

Built With

Share this project:

Updates