Inspiration

… patterns, predict future crime rates, and develop a crime risk index to identify high-risk districts across India.

Understanding crime is crucial not just for policymakers and law enforcement but also for citizens who deserve transparency and safety in their communities.

What I Learned

- Cleaning and preprocessing real-world government datasets.
- Applying exploratory data analysis (EDA) techniques to uncover trends.
- Implementing regression models (linear and multivariate) to predict future crime trends.
- Using K-Means clustering to group districts with similar crime patterns.
- Building a crime risk index from normalized crime-rate data.
- Simulating seasonality and city-wise trends despite limited temporal granularity.
- Visualizing results with matplotlib, seaborn, and geospatial maps.
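The K-Means step can be illustrated with a small, dependency-free sketch of Lloyd's algorithm. It assumes each district is already described by a vector of standardized crime rates (K-Means uses Euclidean distance, so unscaled features would let the largest crime category dominate). The function names, the deterministic farthest-point seeding (a simplification, not k-means++ itself), and the toy feature values are all hypothetical; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (hypothetical helper): start from the first point,
    then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on lists of floats. Returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each district joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member districts.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up standardized features per district: [violent-crime rate, property-crime rate]
district_features = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
labels, centroids = kmeans(district_features, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The resulting labels can then be joined back onto district names to colour a map or to compare cluster profiles.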

How I Built It

Data Cleaning:

- Merged Telangana into Andhra Pradesh for consistency (the data only extends to 2014, the year Telangana was formed).
- Standardized state and district names to avoid mismatches.
- Removed non-district rows such as "TOTAL" and "RAILWAYS".
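The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, assuming each record is a dict with `state` and `district` keys; the set of non-district rows, the 0.85 similarity cutoff, and all function names are hypothetical. Fuzzy snapping with `difflib.get_close_matches` only proposes a canonical spelling, so matches should still be reviewed by hand, and districts that end up duplicated after the Telangana merge would need their counts summed afterwards.

```python
import difflib
import re

# Hypothetical set of aggregate / non-district rows; extend with whatever else the raw file contains.
NON_DISTRICT_ROWS = {"TOTAL", "RAILWAYS"}


def normalize_name(name):
    """Upper-case, trim, spell out '&' and collapse repeated whitespace."""
    name = name.strip().upper().replace("&", "AND")
    return re.sub(r"\s+", " ", name)


def clean_records(records, canonical_districts):
    """Standardize names, merge Telangana into Andhra Pradesh and drop aggregate rows.

    records: list of dicts with at least 'state' and 'district' keys.
    canonical_districts: reference spellings (for example, taken from a shapefile).
    """
    canonical = [normalize_name(d) for d in canonical_districts]
    cleaned = []
    for row in records:
        state = normalize_name(row["state"])
        district = normalize_name(row["district"])
        if district in NON_DISTRICT_ROWS:
            continue  # skip state totals and special units that are not districts
        if state == "TELANGANA":
            state = "ANDHRA PRADESH"  # keep pre-2014 state boundaries
        # Snap near-miss spellings to the closest reference name, if one is close enough.
        match = difflib.get_close_matches(district, canonical, n=1, cutoff=0.85)
        if match:
            district = match[0]
        cleaned.append({**row, "state": state, "district": district})
    return cleaned


# Made-up rows for illustration only.
raw = [
    {"state": "Andhra Pradesh", "district": "Vishakhapatnam", "murder": 5},
    {"state": "Telangana", "district": "Hyderabad", "murder": 10},
    {"state": "Telangana", "district": "Total", "murder": 15},
    {"state": "Maharashtra", "district": "Railways", "murder": 1},
    {"state": "maharashtra ", "district": "Pune", "murder": 7},
]
for r in clean_records(raw, ["Visakhapatnam", "Hyderabad", "Pune"]):
    print(r["state"], "/", r["district"])
```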

Analysis & Modeling:

EDA Questions:

- Total crimes and average murders per district
- Crime distribution by state and district
- Correlation matrix among crime types
- Most common crime type per district

Visualization Questions:

- Heatmaps, bar charts, and geospatial crime-density maps
- Interactive filter dashboard in Colab (via dropdowns)
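Two of the EDA questions, the correlation matrix among crime types and the most common crime type per district, reduce to a few lines of arithmetic. The sketch below assumes one dict per district with a numeric count per crime type; the column names and numbers are made up, and the function names are hypothetical. Because raw counts scale with district size, correlating per-capita rates instead of counts usually gives a more meaningful matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(rows, crime_types):
    """Pairwise correlations between crime-type columns across districts."""
    columns = {t: [row[t] for row in rows] for t in crime_types}
    return {(a, b): pearson(columns[a], columns[b]) for a in crime_types for b in crime_types}


def most_common_crime(row, crime_types):
    """Crime type with the highest count in one district row (first one wins on ties)."""
    return max(crime_types, key=lambda t: row[t])


# Made-up district rows for illustration only.
districts = [
    {"district": "A", "murder": 10, "theft": 100, "kidnapping": 5},
    {"district": "B", "murder": 20, "theft": 200, "kidnapping": 3},
    {"district": "C", "murder": 30, "theft": 300, "kidnapping": 1},
]
types = ["murder", "theft", "kidnapping"]
corr = correlation_matrix(districts, types)
print(round(corr[("murder", "theft")], 6))  # 1.0
print({d["district"]: most_common_crime(d, types) for d in districts})
```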

Advanced Questions:

- Clustering districts using K-Means and PCA
- Time-series forecasting of crime trends for 2015–2020
- Crime Risk Index per district
- Classification model to label districts as high- or low-crime
- Identifying the state with the highest number of dowry deaths
- Analysis of crimes against women
- Simulation of seasonal crime trends from yearly totals
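A straight-line trend fitted by ordinary least squares is the simplest version of the 2015–2020 forecast: fit the yearly totals available up to 2014, then extrapolate. This is a baseline sketch, not necessarily the model used in the project; the function names and the synthetic totals (a made-up increase of 50 crimes per year) are hypothetical. With only annual observations, such a trend captures direction and rate of change but nothing finer.

```python
def fit_linear_trend(years, totals):
    """Ordinary least squares fit of totals = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(totals) / n
    # Centre the years before summing to keep the arithmetic well conditioned.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, totals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, totals, future_years):
    """Extrapolate the fitted straight line to each future year."""
    slope, intercept = fit_linear_trend(years, totals)
    return {year: intercept + slope * year for year in future_years}


# Synthetic yearly totals for 2001-2014 (made up: +50 crimes per year).
history_years = list(range(2001, 2015))
history_totals = [1000 + 50 * (y - 2001) for y in history_years]
predictions = forecast(history_years, history_totals, range(2015, 2021))
print(predictions[2015])  # 1700.0
```

The same fit can be run per district or per crime type by grouping the yearly totals first.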

Challenges Faced

- Data granularity: with no month-wise data, real seasonality could not be analyzed.
- Missing population data: population figures had to be mapped manually from the 2001 and 2011 Censuses.
- Spelling variations and duplicates: many districts were spelled differently across rows.
- Geo-mapping: district names in the shapefiles did not match those in the dataset and had to be matched manually.
- Telangana split: adjusting for the new state while preserving meaningful insights.
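Once population is mapped in, crime counts can be turned into rates per 100,000 people and then scaled into a risk index. The sketch below assumes straight-line interpolation between the 2001 and 2011 Census figures (years after 2011 simply extend that trend) and min-max scaling of the rates to a 0–100 index; both choices, the function names, and all numbers are hypothetical illustrations rather than the project's confirmed method.

```python
def interpolate_population(year, pop_2001, pop_2011):
    """Linearly interpolate (or extrapolate) population from two census counts."""
    annual_change = (pop_2011 - pop_2001) / 10
    return pop_2001 + annual_change * (year - 2001)


def crime_rate(crimes, population):
    """Crimes per 100,000 residents."""
    return crimes / population * 100_000


def risk_index(rates):
    """Min-max scale {district: rate} to a 0-100 index (all zeros if rates are equal)."""
    lo, hi = min(rates.values()), max(rates.values())
    if hi == lo:
        return {d: 0.0 for d in rates}
    return {d: 100 * (r - lo) / (hi - lo) for d, r in rates.items()}


# Made-up figures for illustration only.
pop_2006 = interpolate_population(2006, 1_000_000, 1_200_000)
print(pop_2006)  # 1100000.0
print(round(crime_rate(2200, pop_2006), 6))  # 200.0
print(risk_index({"A": 100.0, "B": 200.0, "C": 300.0}))  # {'A': 0.0, 'B': 50.0, 'C': 100.0}
```

A threshold on this index (for example, the top quartile) is one simple way to derive the high- versus low-crime labels used for classification.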

What's next for Forecasting crime trends across India

- Incorporate recent crime records (2015–2024) to improve trend analysis and prediction accuracy.
- Move from state/district-level to city/ward-level analysis using local datasets and smart city feeds.
- Try LSTMs/RNNs for time-series forecasting.
- Try Random Forests and Gradient Boosting for classification.
- Use spatial regression to account for geographic dependencies.

