Inspiration
Forensic investigations are slow, manual, and prone to human bias. When biological evidence is collected from a crime scene, detectives often have to sift through thousands of records by hand. We asked: what if machine learning could do that instantly, transparently, and at scale? That question inspired the Forensic Biological Evidence Analyzer.
What it does
Our system takes biological evidence collected from a crime scene (blood type, hair color, eye color, fingerprint class, and six DNA STR markers) and runs it against a synthetic database of 30,000 suspects using a trained Gradient Boosting model. It outputs a ranked list of suspects, each with a confidence score and written reasoning for the match. Rather than claiming a single correct answer, the system models uncertainty, which mirrors how forensic evidence is weighed in practice.
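The written reasoning could be produced along the lines of the following sketch, which compares each recovered trait and STR locus and emits one sentence per comparison. The field names, the dict-based record layout, and the example locus (TH01, one of the CODIS core loci) are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical field names; the project's real schema is not published.
PHYSICAL_TRAITS = ("blood_type", "hair_color", "eye_color", "fingerprint_class")


def explain_match(evidence: dict, suspect: dict, str_loci: list) -> list:
    """Return one human-readable reason per trait comparison, plus an STR summary."""
    reasons = []
    for trait in PHYSICAL_TRAITS:
        observed = evidence.get(trait)
        if observed is None:
            continue  # trait was not recovered at the scene
        if observed == suspect.get(trait):
            reasons.append(f"{trait} matches ({observed})")
        else:
            reasons.append(
                f"{trait} differs (evidence {observed}, suspect {suspect.get(trait)})"
            )
    # Assumption: each STR genotype is stored as a pair of allele repeat numbers.
    shared = [locus for locus in str_loci if set(evidence[locus]) & set(suspect[locus])]
    reasons.append(
        f"shares at least one allele at {len(shared)}/{len(str_loci)} STR loci"
    )
    return reasons


evidence = {"blood_type": "O+", "hair_color": "brown", "eye_color": "blue",
            "fingerprint_class": "loop", "TH01": (6, 9)}
suspect = {"blood_type": "O+", "hair_color": "black", "eye_color": "blue",
           "fingerprint_class": "loop", "TH01": (9, 9.3)}
for line in explain_match(evidence, suspect, ["TH01"]):
    print(line)
```

Keeping the reasoning as plain strings tied to individual comparisons makes it easy to show next to each ranked suspect in a UI.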
How I built it
- Python — core language
- Pandas & NumPy — data generation and feature engineering
- Scikit-learn — Gradient Boosting Classifier (96.2% accuracy)
- Streamlit — 5-page interactive web application
- Seaborn & Matplotlib — visualizations and forensic insights
- Synthetic Dataset — 30,000 suspect records with 19 biological features
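As a rough illustration of the modeling step, this sketch trains scikit-learn's `GradientBoostingClassifier` on a small synthetic dataset and reads match probabilities from `predict_proba`. It assumes a pairwise framing that the write-up does not spell out: each row compares the evidence with one suspect, the features are per-trait agreement flags (4 physical traits plus 6 STR loci, rather than the project's 19 features), and the label marks whether that suspect is the true source. All data-generating rates are invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs, n_traits, n_loci = 4_000, 4, 6

# Hypothetical label: True if this suspect is the source of the evidence (~10% of pairs).
y = rng.random(n_pairs) < 0.10

# Invented agreement rates: true sources almost always agree (allowing for degraded
# evidence), while unrelated people agree only by chance.
trait_agree = np.where(y[:, None],
                       rng.random((n_pairs, n_traits)) < 0.95,
                       rng.random((n_pairs, n_traits)) < 0.30)
str_agree = np.where(y[:, None],
                     rng.random((n_pairs, n_loci)) < 0.95,
                     rng.random((n_pairs, n_loci)) < 0.50)
X = np.hstack([trait_agree, str_agree]).astype(float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Column 1 of predict_proba is the probability of the positive class (True).
ml_prob = model.predict_proba(X_test)[:, 1]
print(f"held-out accuracy: {accuracy:.3f}")
```

The fixed `random_state` values make the run reproducible; the printed accuracy reflects only this toy data and says nothing about the 96.2% figure listed above.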
The confidence score for each suspect is a weighted sum of three components:
- 50% → ML model probability
- 30% → DNA STR allele overlap
- 20% → Physical trait matching
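The weighting amounts to a linear blend. A minimal sketch, assuming each component has already been normalized to the range [0, 1] (the write-up does not say how the DNA overlap or trait match is scaled): a suspect with a model probability of 0.9, shared alleles at 5 of 6 STR loci (about 0.833), and 3 of 4 matching physical traits (0.75) would score 0.5 × 0.9 + 0.3 × 0.833 + 0.2 × 0.75 = 0.45 + 0.25 + 0.15 = 0.85.

```python
import numpy as np

# Weights from the write-up: 50% model, 30% DNA, 20% physical traits.
W_ML, W_DNA, W_PHYSICAL = 0.5, 0.3, 0.2


def confidence_score(ml_prob, dna_overlap, physical_match):
    """Blend three component scores (each assumed to lie in [0, 1]) into one confidence.

    Accepts scalars or equal-length arrays, so all suspects can be scored in one call.
    """
    ml_prob = np.asarray(ml_prob, dtype=float)
    dna_overlap = np.asarray(dna_overlap, dtype=float)
    physical_match = np.asarray(physical_match, dtype=float)
    return W_ML * ml_prob + W_DNA * dna_overlap + W_PHYSICAL * physical_match


scores = confidence_score([0.9, 0.2], [5 / 6, 2 / 6], [3 / 4, 1 / 4])
ranking = np.argsort(-scores)  # suspect indices, highest confidence first
```

Because the weights sum to 1, the blended score stays in [0, 1] whenever its inputs do, so it can be shown directly as a percentage.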
Challenges faced
- Scoring all 30,000 suspects in real time without crashing the app, which meant replacing row-by-row loops with bulk vectorized NumPy operations
- Designing a synthetic dataset realistic enough to give the model a meaningful signal across 19 features
- Balancing the three scoring components (ML + DNA + physical) to produce confidence scores that felt forensically meaningful
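Vectorized scoring can look roughly like the following sketch, which scores 30,000 randomly generated suspects with NumPy broadcasting instead of a Python loop. The genotype layout (an integer array of shape suspects × loci × 2 alleles), the allele-overlap definition (fraction of evidence alleles found at the same locus in the suspect's genotype), and the planted match at row 123 are assumptions made for illustration.

```python
import numpy as np


def str_overlap(evidence: np.ndarray, suspects: np.ndarray) -> np.ndarray:
    """Fraction of evidence alleles present in each suspect's genotype.

    evidence: shape (n_loci, 2), two alleles per STR locus.
    suspects: shape (n_suspects, n_loci, 2).
    Returns shape (n_suspects,) with values in [0, 1].
    """
    # Compare every evidence allele with both suspect alleles at the same locus
    # via broadcasting: (1, L, 2, 1) vs (N, L, 1, 2) gives (N, L, 2, 2).
    hits = evidence[None, :, :, None] == suspects[:, :, None, :]
    # An evidence allele counts as found if it equals either suspect allele.
    found = hits.any(axis=3)          # shape (N, L, 2)
    return found.mean(axis=(1, 2))    # average over loci and alleles


def physical_match(evidence_traits: np.ndarray, suspect_traits: np.ndarray) -> np.ndarray:
    """Fraction of categorical traits (e.g. blood type) that match, per suspect."""
    return (suspect_traits == evidence_traits).mean(axis=1)


rng = np.random.default_rng(42)
n_suspects, n_loci = 30_000, 6
# Hypothetical allele repeat numbers drawn from a plausible range.
suspects = rng.integers(5, 20, size=(n_suspects, n_loci, 2))
evidence = suspects[123].copy()       # plant the true source at row 123

dna = str_overlap(evidence, suspects)
top10 = np.argsort(-dna, kind="stable")[:10]
```

When only the top few suspects are needed, `np.argpartition` can replace the full `argsort` to avoid sorting all 30,000 scores.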
What I learned
- How forensic DNA STR profiling actually works in real investigations
- How to build and deploy a full ML pipeline end to end in under 24 hours
- The importance of explainability in AI — showing WHY a suspect ranked high is just as important as the ranking itself
What's next
- Integrate real forensic databases and anonymized case data
- Add image recognition for facial composite matching
- Expand the DNA profile from 6 STR markers to the 20 CODIS core loci
- Add a map view showing last known suspect locations
Built With
- matplotlib
- pandas
- python
- scikit-learn
- seaborn
- streamlit