Inspiration

LA crime data shapes real policy decisions. We wanted to know: can we actually trust it?

What it does

Flags unreliable crime timestamps using statistics and ML, then reveals what the cleaned data says about vulnerable populations in LA.

How we built it

Python, pandas, matplotlib, scikit-learn, contextily. IQR, Modified Z-score, and Local Outlier Factor for anomaly detection. NLP pipeline for crime categorization.
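To make the two statistical checks concrete, here is a minimal sketch of IQR fencing and the Modified Z-score applied to reporting delay (days between when a crime occurred and when it was reported). The function names, the `delays` values, and the cutoffs (1.5 for IQR, 3.5 for Modified Z) are illustrative assumptions, not the project's actual code or settings.

```python
from statistics import median, quantiles


def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def modified_z_flags(values, threshold=3.5):
    """Flag values whose Modified Z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: with zero spread around the median, flag nothing
        # rather than divide by zero.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical reporting delays in days; the last one mimics a suspect timestamp.
delays = [0, 0, 1, 1, 2, 2, 3, 4, 5, 400]

iqr_result = iqr_flags(delays)
mz_result = modified_z_flags(delays)
suspect = [d for d, a, b in zip(delays, iqr_result, mz_result) if a and b]
print(suspect)
```

Requiring both methods to agree is one conservative way to combine them; flagging on either one would catch more records at the cost of more false positives.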

Challenges we ran into

LOF doesn't scale to 1M rows. 140 crime codes needed automated consolidation. Distinguishing placeholder timestamps from genuinely delayed reports.
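One common workaround for the LOF scaling problem is to fit on a random subsample and then score the full dataset in chunks. The sketch below shows that idea with scikit-learn's `LocalOutlierFactor` in novelty mode. It is an assumption about how this could be done, not a description of what we shipped: the `lof_on_sample` helper, the sample and chunk sizes, and the synthetic features are all hypothetical.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def lof_on_sample(X, sample_size=10_000, chunk_size=50_000, n_neighbors=20, seed=0):
    """Fit LOF on a random subsample, then label every row in chunks.

    Returns an array of 1 (inlier) / -1 (outlier) with one entry per row of X.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(sample_size, n), replace=False)

    # novelty=True lets the fitted model score rows it was not trained on.
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, novelty=True)
    lof.fit(X[idx])

    labels = np.empty(n, dtype=int)
    for start in range(0, n, chunk_size):
        stop = start + chunk_size
        labels[start:stop] = lof.predict(X[start:stop])
    return labels


# Synthetic stand-in for two numeric features (e.g. hour of day, reporting delay).
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 2))
X[0] = [50.0, 50.0]  # one planted anomaly far from the main cluster

labels = lof_on_sample(X, sample_size=1000, chunk_size=2000)
print("flagged rows:", int((labels == -1).sum()))
```

The trade-off is that local density is estimated from the subsample only, so small dense clusters of anomalies can be missed if they are underrepresented in the sample.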

Accomplishments that we're proud of

Crime patterns stayed stable after cleaning, which suggests the dataset's main signals are robust. The least reliable records belong to the most vulnerable victims.

What we learned

Bad data and social vulnerability are the same problem. Cleaning data reveals who gets seen and who doesn't.

What's next for DatathonPD

LAPD 3-tier timestamp confidence system. Expand to other cities. Partner with advocacy organizations.
