Inspiration
LA crime data shapes real policy decisions. We wanted to know: can we actually trust it?
What it does
Flags unreliable crime timestamps using statistics and ML, then reveals what the cleaned data says about vulnerable populations in LA.
How we built it
Python, pandas, matplotlib, scikit-learn, contextily. IQR, Modified Z-score, and Local Outlier Factor for anomaly detection. NLP pipeline for crime categorization.
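The two statistical checks can be sketched as below. This is a minimal illustration, not our exact pipeline: it assumes the feature being tested is the reporting delay in days (report date minus occurrence date), and the function names, the `k=1.5` fence, and the `3.5` cutoff are conventional defaults chosen for the example.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def modified_zscore_outliers(x, threshold=3.5):
    """Flag values whose median/MAD-based z-score exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        # Score is undefined when more than half the values are identical.
        return np.zeros(x.shape, dtype=bool)
    score = 0.6745 * (x - median) / mad
    return np.abs(score) > threshold

# Hypothetical reporting delays in days; the last one is a year late.
delay_days = [0, 1, 0, 2, 1, 3, 0, 1, 2, 365]
iqr_flags = iqr_outliers(delay_days)
mz_flags = modified_zscore_outliers(delay_days)
print(iqr_flags)
print(mz_flags)
```

Both checks rely on medians and quartiles rather than the mean and standard deviation, so a handful of extreme delays does not shift the thresholds much.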
Challenges we ran into
LOF doesn't scale to 1M rows. 140 crime codes needed automated consolidation. Distinguishing placeholder timestamps from genuinely delayed reports.
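One common workaround for the LOF scaling problem is to fit on a random sample and score the remaining rows in chunks. The sketch below shows that idea with scikit-learn's `LocalOutlierFactor` in novelty mode; it is an assumption for illustration, not necessarily what we shipped, and `lof_flags_via_sample`, `sample_size`, and `chunk_size` are hypothetical names and values.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_flags_via_sample(X, sample_size=50_000, n_neighbors=20,
                         chunk_size=100_000, seed=0):
    """Fit LOF on a random sample, then flag every row as outlier or not."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=min(sample_size, n), replace=False)

    # novelty=True enables predict() on rows not seen during fit.
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, novelty=True)
    lof.fit(X[idx])

    flags = np.empty(n, dtype=bool)
    # Sampled rows: use the scores computed during fit, because predict()
    # is only meant for unseen data.
    flags[idx] = lof.negative_outlier_factor_ < lof.offset_

    # Remaining rows: score against the sample in chunks to bound memory.
    in_sample = np.zeros(n, dtype=bool)
    in_sample[idx] = True
    rest = np.flatnonzero(~in_sample)
    for start in range(0, len(rest), chunk_size):
        part = rest[start:start + chunk_size]
        flags[part] = lof.predict(X[part]) == -1
    return flags

# Synthetic demo: a Gaussian cloud plus one far-away point in the last row.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(size=(2000, 2)), [[50.0, 50.0]]])
flags = lof_flags_via_sample(X_demo, sample_size=500, chunk_size=1000)
print(flags[-1], flags.sum())
```

The trade-off is that density is estimated from the sample only, so results depend on the sample being representative of the full dataset.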
Accomplishments that we're proud of
Crime patterns stayed stable after cleaning, which suggests the dataset is robust. The least reliable records belong to the most vulnerable victims.
What we learned
Bad data and social vulnerability are the same problem. Cleaning data reveals who gets seen and who doesn't.
What's next for DatathonPD
LAPD 3-tier timestamp confidence system. Expand to other cities. Partner with advocacy organizations.
Built With
- matplotlib
- pandas
- python
- scikit-learn