Inspiration

Forensic investigations are slow, manual, and prone to human bias. When biological evidence is collected from a crime scene, detectives often have to sift through thousands of records by hand. We asked: what if machine learning could do that instantly, transparently, and at scale? That question inspired the Forensic Biological Evidence Analyzer.

What it does

Our system takes biological evidence collected from a crime scene (blood type, hair color, eye color, fingerprint class, and six DNA STR markers) and compares it against a database of 30,000 suspects using a trained Gradient Boosting model. It outputs a ranked list of suspects, each with a confidence score and a written explanation of the match. Rather than claiming a single correct answer, the system models uncertainty, which is how forensic evidence is weighed in practice.
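As an illustration of the "written explanation" idea, here is a minimal sketch that turns trait comparisons into a readable summary. The field names (`blood_type`, `hair_color`, `eye_color`, `fingerprint_class`), the `explain_match` function, and the sentence template are hypothetical; the writeup does not show how the project actually phrases its reasoning.

```python
# Hypothetical field names for the four physical traits.
PHYSICAL_TRAITS = ("blood_type", "hair_color", "eye_color", "fingerprint_class")


def _label(field: str) -> str:
    """Turn a field name like 'eye_color' into 'eye color'."""
    return field.replace("_", " ")


def explain_match(evidence: dict, suspect: dict, str_shared: int, str_total: int) -> str:
    """Summarize which traits agree between the evidence and one suspect.

    str_shared / str_total are STR allele counts assumed to be computed
    elsewhere (e.g. out of 12 alleles for 6 loci).
    """
    matched = [t for t in PHYSICAL_TRAITS if evidence[t] == suspect[t]]
    mismatched = [t for t in PHYSICAL_TRAITS if evidence[t] != suspect[t]]
    sentences = [f"Shares {str_shared} of {str_total} STR alleles with the evidence profile."]
    if matched:
        sentences.append("Matching traits: " + ", ".join(map(_label, matched)) + ".")
    if mismatched:
        sentences.append("Non-matching traits: " + ", ".join(map(_label, mismatched)) + ".")
    return " ".join(sentences)


evidence = {"blood_type": "O+", "hair_color": "brown", "eye_color": "blue", "fingerprint_class": "whorl"}
suspect = {"blood_type": "O+", "hair_color": "brown", "eye_color": "green", "fingerprint_class": "whorl"}
print(explain_match(evidence, suspect, str_shared=10, str_total=12))
# Shares 10 of 12 STR alleles with the evidence profile. Matching traits: blood type, hair color, fingerprint class. Non-matching traits: eye color.
```

Generating the text from the same comparisons that feed the score keeps the explanation consistent with the ranking.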

How I built it

  • Python: core language
  • Pandas & NumPy: data generation and feature engineering
  • Scikit-learn: Gradient Boosting Classifier (96.2% accuracy on the synthetic dataset)
  • Streamlit: 5-page interactive web application
  • Seaborn & Matplotlib: visualizations and forensic insights
  • Synthetic dataset: 30,000 suspect records with 19 biological features
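Since the suspect database is synthetic, a minimal sketch of how such records could be generated with NumPy and pandas may help. The category lists, the blood-type and fingerprint weights (illustrative values, not sourced population data), the generic locus names `STR1` to `STR6`, the 8 to 20 repeat range, and the `make_suspects` function are all assumptions; this schema has 17 columns rather than the project's 19 features.

```python
import numpy as np
import pandas as pd

# Illustrative categories and weights (assumptions, not sourced population data).
BLOOD_TYPES = ["O+", "O-", "A+", "A-", "B+", "B-", "AB+", "AB-"]
BLOOD_WEIGHTS = [0.37, 0.07, 0.36, 0.06, 0.08, 0.02, 0.03, 0.01]
HAIR_COLORS = ["black", "brown", "blonde", "red", "gray"]
EYE_COLORS = ["brown", "blue", "green", "hazel", "gray"]
FINGERPRINT_CLASSES = ["loop", "whorl", "arch"]
FINGERPRINT_WEIGHTS = [0.65, 0.30, 0.05]
STR_LOCI = [f"STR{i}" for i in range(1, 7)]  # generic names for six loci


def make_suspects(n: int, seed: int = 0) -> pd.DataFrame:
    """Generate n synthetic suspect records: four traits plus two alleles per locus."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "suspect_id": np.arange(n),
        "blood_type": rng.choice(BLOOD_TYPES, size=n, p=BLOOD_WEIGHTS),
        "hair_color": rng.choice(HAIR_COLORS, size=n),
        "eye_color": rng.choice(EYE_COLORS, size=n),
        "fingerprint_class": rng.choice(FINGERPRINT_CLASSES, size=n, p=FINGERPRINT_WEIGHTS),
    })
    for locus in STR_LOCI:
        # Two repeat counts per locus drawn uniformly from 8..20, stored smaller-first.
        alleles = np.sort(rng.integers(8, 21, size=(n, 2)), axis=1)
        df[f"{locus}_a"] = alleles[:, 0]
        df[f"{locus}_b"] = alleles[:, 1]
    return df


suspects = make_suspects(30_000)
print(suspects.shape)  # (30000, 17)
```

Uniform allele draws and independent traits are the simplest possible choice; a more realistic generator would use per-locus allele frequencies and correlations between traits, which this sketch omits.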

The confidence score for each suspect is calculated as:

  • 50% → ML model probability
  • 30% → DNA STR allele overlap
  • 20% → Physical trait matching
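As a worked example of this weighting, the sketch below assumes each component is already normalized to [0, 1]: the model's match probability, the fraction of evidence STR alleles shared with the suspect, and the fraction of the four physical traits that match. Those normalizations and the name `confidence_score` are assumptions; the writeup specifies only the weights.

```python
# Weights from the 50/30/20 split.
W_ML, W_DNA, W_PHYS = 0.50, 0.30, 0.20


def confidence_score(ml_prob: float, str_overlap: float, physical_match: float) -> float:
    """Weighted blend of three scores, each assumed to lie in [0, 1]."""
    for name, value in (("ml_prob", ml_prob), ("str_overlap", str_overlap),
                        ("physical_match", physical_match)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return W_ML * ml_prob + W_DNA * str_overlap + W_PHYS * physical_match


# Model probability 0.9, 10 of 12 STR alleles shared, 3 of 4 traits matching:
# 0.5 * 0.9 + 0.3 * (10/12) + 0.2 * 0.75 = 0.45 + 0.25 + 0.15
score = confidence_score(0.9, 10 / 12, 3 / 4)
print(f"{score:.2f}")  # 0.85
```

Because the weights sum to 1, the blended score also stays in [0, 1] and can be shown directly as a percentage.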

Challenges faced

  • Scoring all 30,000 suspects in real time without crashing the app required bulk vectorized operations using NumPy instead of row-by-row loops
  • Designing a synthetic dataset that was realistic enough to produce meaningful ML signal across 19 features
  • Balancing the three scoring components (ML + DNA + physical) to produce confidence scores that felt forensically meaningful
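The vectorization point can be illustrated with a minimal NumPy sketch that scores every suspect in one pass instead of looping over rows. It assumes STR profiles are stored as an integer array of shape (n_suspects, 6, 2), physical traits as integer codes of shape (n_suspects, 4), and model probabilities as a precomputed array; the array layout, the shared-allele rule (multiset intersection per locus), and the names `str_overlap` and `rank_suspects` are illustrative assumptions rather than the project's code.

```python
import numpy as np


def str_overlap(evidence: np.ndarray, suspects: np.ndarray) -> np.ndarray:
    """Fraction of evidence alleles shared, for every suspect at once.

    evidence: (loci, 2) allele repeat numbers from the crime-scene sample.
    suspects: (n, loci, 2) allele repeat numbers for all suspects.
    Per locus, 2 alleles are shared if the genotypes are identical,
    1 if they have any allele in common, otherwise 0.
    """
    e = np.sort(evidence, axis=-1)
    s = np.sort(suspects, axis=-1)
    identical = (s == e).all(axis=-1)                                   # (n, loci)
    any_shared = (s[..., :, None] == e[:, None, :]).any(axis=(-2, -1))  # (n, loci)
    shared = np.where(identical, 2, any_shared.astype(int))             # (n, loci)
    return shared.sum(axis=1) / (2 * evidence.shape[0])


def rank_suspects(ml_prob, evidence_str, suspect_str, evidence_traits, suspect_traits, top_k=10):
    """Return (indices, scores) of the top_k suspects by blended confidence."""
    physical = (suspect_traits == evidence_traits).mean(axis=1)  # fraction of traits matching
    scores = 0.5 * ml_prob + 0.3 * str_overlap(evidence_str, suspect_str) + 0.2 * physical
    order = np.argsort(scores)[::-1][:top_k]  # highest scores first
    return order, scores[order]


# Three toy suspects: identical profile, no shared alleles, one locus differing by one allele.
evidence_str = np.array([[12, 14], [8, 9], [10, 10], [15, 17], [11, 13], [20, 22]])
suspect_str = np.stack([
    evidence_str,
    evidence_str + 30,
    np.vstack([[14, 16], evidence_str[1:]]),
])
evidence_traits = np.array([0, 1, 2, 0])
suspect_traits = np.array([[0, 1, 2, 0], [3, 3, 3, 3], [0, 1, 2, 1]])
ml_prob = np.array([0.9, 0.1, 0.5])

order, top = rank_suspects(ml_prob, evidence_str, suspect_str, evidence_traits, suspect_traits, top_k=3)
print(order)  # [0 2 1]
```

Every comparison above is an array-wide operation, so the same code handles 3 or 30,000 suspects without a Python-level loop.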

What I learned

  • How forensic DNA STR profiling actually works in real investigations
  • How to build and deploy a full ML pipeline end to end in under 24 hours
  • The importance of explainability in AI — showing WHY a suspect ranked high is just as important as the ranking itself

What's next

  • Integrate real forensic databases and anonymized case data
  • Add image recognition for facial composite matching
  • Expand from 6 DNA STR markers to the 20 core loci used in CODIS
  • Add a map view showing last known suspect locations
