Inspiration

Our team was inspired by the potential to transform raw crime data into actionable insights for safer communities. The UTA Datathon 2025’s focus on real-world impact motivated us to analyze India’s crime patterns (2001–2014), aiming to uncover trends that could guide policing and policy. The challenge of extracting meaning from 10,678 records across 34 crime types, such as MURDER and RAPE, fueled our curiosity to reveal hidden drivers, like urbanization and cultural factors, behind India’s crime landscape, and in doing so to contribute to justice and safety.

What it does

EDA Patrol_Deployables delivers a comprehensive analysis of India’s district-wise crime data, revealing:

Scale: 30,080,489 total IPC crimes, averaging 489.80 murders per district.
Hotspots: Urban areas (e.g., Delhi UT) lead, while Lakshadweep is lowest.
Gender: 18.43% of crimes target women, with Uttar Pradesh highest in dowry deaths.
Patterns: Visualizes trends via 10 plots (e.g., top states, urban-rural gaps) and an interactive dashboard.
Predictions: Uses clustering, classification, and ARIMA to forecast crimes (~2.2M/year) and identify high-risk districts.

It empowers stakeholders with data-driven policing strategies.

How we built it

We used Python in Colab with the provided dataset (Districtwise_Crime_of_India_2001_to_2014 - Sheet1.csv):

Cleaning: Standardized names (e.g., ANDHRAPRADESH), removed ‘TOTAL’ rows (~479), ensured no missing values.
EDA: Grouped by state/district, computed totals (30,080,489 crimes), correlations (murder-theft: ~0.52).
Visualization: Created 10 plots using Seaborn/Matplotlib (e.g., top states, crime distribution) and a Plotly dashboard.
Modeling: Applied K-Means (4 clusters), Random Forest (~0.90 precision), Linear Regression (Adilabad trends), and ARIMA (national forecast).
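The cleaning and grouping steps can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the notebook's actual code: the column names (STATE/UT, DISTRICT, MURDER), the helper names, and the sample numbers are all assumptions made up for the example.

```python
import re
from collections import defaultdict


def standardize_name(name):
    """Uppercase and keep letters only, e.g. 'Andhra Pradesh' -> 'ANDHRAPRADESH'."""
    return re.sub(r"[^A-Z]", "", name.upper())


def clean_rows(rows):
    """Drop aggregate rows labelled TOTAL and standardize state names."""
    cleaned = []
    for row in rows:
        # Aggregate rows would double-count every district in the state.
        if "TOTAL" in row["DISTRICT"].upper():
            continue
        cleaned.append({**row, "STATE/UT": standardize_name(row["STATE/UT"])})
    return cleaned


def totals_by_state(rows, column):
    """Sum one crime column per standardized state name."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["STATE/UT"]] += int(row[column])
    return dict(totals)


# Made-up sample rows (assumed column names, illustrative counts).
sample = [
    {"STATE/UT": "Andhra Pradesh", "DISTRICT": "ADILABAD", "MURDER": "101"},
    {"STATE/UT": "ANDHRA PRADESH", "DISTRICT": "ANANTAPUR", "MURDER": "151"},
    {"STATE/UT": "ANDHRA PRADESH", "DISTRICT": "TOTAL", "MURDER": "252"},
]
cleaned = clean_rows(sample)
murders = totals_by_state(cleaned, "MURDER")
print(murders)
```

Without the name standardization, the two spellings of the state would be grouped separately; without the TOTAL filter, each state's count would be doubled.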

Challenges we ran into

Data Gaps: No population or geospatial data limited per-capita and spatial analyses; we used urban proxies (e.g., ‘CITY’ keywords).
Urban Proxy: Misclassifications (e.g., RAILWAY as urban) required careful keyword tuning.
Output Clarity: The notebook didn’t print all results (e.g., top states, exact dowry deaths), so some had to be inferred from trends.
Time Crunch: Balancing EDA, modeling, and visualization within the given time frame was intense, so we prioritized key insights (e.g., women crimes).
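One way the keyword-based urban proxy could look is sketched below. Only the ‘CITY’ keyword and the RAILWAY misclassification come from our work as described above; the other keywords, the function name, and the example district labels are assumptions for illustration.

```python
import re

# 'CITY' is the keyword mentioned above; 'URBAN' is an assumed addition.
URBAN_KEYWORDS = ("CITY", "URBAN")
# Railway police units are not districts, so they are excluded first.
# 'RLY' is an assumed abbreviation.
EXCLUDE_KEYWORDS = ("RAILWAY", "RLY")


def _compile(words):
    """Match any of the words as a whole word, not as a substring."""
    return re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")


URBAN_RE = _compile(URBAN_KEYWORDS)
EXCLUDE_RE = _compile(EXCLUDE_KEYWORDS)


def is_urban_proxy(district_name):
    """Return True if the district label looks urban under the keyword proxy."""
    name = district_name.upper()
    if EXCLUDE_RE.search(name):
        return False
    return bool(URBAN_RE.search(name))


# Illustrative labels only.
for label in ("HYDERABAD CITY", "EXAMPLE CITY RAILWAY", "EXAMPLE RURAL"):
    print(label, is_urban_proxy(label))
```

Checking the exclusion list before the urban keywords, and matching whole words only, are the two choices that keep labels such as a railway unit inside a city from being counted as urban.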

Accomplishments that we're proud of

Scale: Analyzed 30,080,489 crimes across ~900 districts, delivering robust insights.
Gender Focus: Quantified 18.43% women crimes, highlighting systemic issues like dowry deaths.
Visuals: Produced 10 clear plots (e.g., urban-rural, clusters) and a user-friendly dashboard.
Modeling: Built K-Means, Random Forest, and ARIMA models to identify high-risk areas.
Teamwork: Collaborated seamlessly under pressure, blending coding and storytelling.

What we learned

Data Prep: Cleaning (e.g., regex for names) is critical for reliable aggregations.
EDA Power: Simple metrics (e.g., 489.80 murders/district) reveal big trends.
Urban Dynamics: Density appears to drive crime; rural lows may reflect underreporting.
Gender Insights: The share of crimes against women (18.43%) signals cultural challenges that need policy focus.
Modeling: Clustering and forecasting can support predictive policing, but need richer data.
Time Management: Prioritizing key analyses (e.g., top states, women crimes) was essential to finishing in the given time.

What's next for EDA Patrol_Deployables

Data Enrichment: Integrate population, geospatial, and socio-economic data for per-capita and spatial insights.
Monthly Trends: Add a MONTH column to explore seasonal patterns (e.g., festival spikes).
Advanced Models: Test deep learning for crime prediction to move beyond ARIMA’s linear assumptions.
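As a rough sketch of what the per-capita step could look like once population data is available: the district names, counts, and population figures below are invented placeholders, and the function names are hypothetical.

```python
def rate_per_100k(crimes, population):
    """Crimes per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crimes * 100_000 / population


def per_capita_rates(crime_totals, populations):
    """Join crime totals with a population lookup; None marks missing population."""
    rates = {}
    for district, crimes in crime_totals.items():
        population = populations.get(district)
        rates[district] = (
            None if population is None else rate_per_100k(crimes, population)
        )
    return rates


# Placeholder figures, not taken from the dataset.
crime_totals = {"DISTRICT_A": 1200, "DISTRICT_B": 300, "DISTRICT_C": 50}
populations = {"DISTRICT_A": 2_400_000, "DISTRICT_B": 150_000}

rates = per_capita_rates(crime_totals, populations)
print(rates)
```

In this toy example the district with the larger raw count ends up with the lower rate, which is the kind of reversal that raw totals cannot show.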
