Safety is one of the most basic needs. Safe communities thrive, while unsafe ones eventually disintegrate.
The Champaign Urbana police provide a lot of publicly available data: over 450 thousand incidents and over 200 thousand arrest records going back more than 20 years.
This data is valuable and could help improve the awareness and safety of the community, yet the majority of the public is not aware of it.
We want to create visually appealing graphs from the data and make it extremely easy to understand. We believe that is the way to make this data useful to the general public.
What it does
We created 2 types of dashboards from the data: CU Crime Watch and CU Criminal Trend.
The CU Crime Watch dashboard allows anyone to quickly understand the crime statistics for the Champaign Urbana area. Users can select a date range, e.g. the last 3 months, the last year, or the last 20 years. The dashboard is broken down into 3 main sections: Overview, Location Analysis and Time Analysis.
The Overview section lets CU residents see the distribution of criminal incidents across the neighborhoods of Champaign Urbana via a heat map. It also displays the top 10 crime categories by incident count, along with the trends in the number of incidents, the number of arrests, and the arrest rate for each crime category.
Location Analysis helps CU residents understand where crime usually takes place: the type of location (parking lot, street, apartment, school, public park, etc.), the streets that have a high crime rate, and the crime types they should watch out for on the troublesome streets.
Time Analysis lets residents know at which hour of the day, day of the month, month of the year, etc. crime is more likely to happen, and how this changes over time. Based on this data, residents can choose times that are safer for certain activities. E.g. knowing that criminal activity peaks around noon and 8pm, people can choose to arrange outdoor activities at a different hour.
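The hour-of-day breakdown behind Time Analysis can be sketched in a few lines of Python. This is only an illustration of the idea, not the Looker definition used for the dashboard; the function names and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Count incidents for each hour of the day (index 0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def peak_hours(hourly_counts, top_n=2):
    """Return the top_n hours with the most incidents, busiest first."""
    # Ties are broken by the earlier hour so the result is deterministic.
    ranked = sorted(range(24), key=lambda hour: (-hourly_counts[hour], hour))
    return ranked[:top_n]


# Made-up timestamps, only to show the shape of the input.
sample_timestamps = [
    datetime(2018, 3, 1, 12, 5),
    datetime(2018, 3, 2, 12, 40),
    datetime(2018, 3, 4, 12, 15),
    datetime(2018, 3, 2, 20, 10),
    datetime(2018, 3, 5, 20, 55),
    datetime(2018, 3, 3, 3, 30),
]

hourly = incidents_by_hour(sample_timestamps)
busiest = peak_hours(hourly)
```

The same grouping applies to day of month or month of year by swapping `ts.hour` for `ts.day` or `ts.month` and adjusting the range.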
These dashboards could also be useful to the CU police, helping them distribute their task force and resources to the locations and periods that need more attention.
We have 2 versions of CU Crime Watch: one with yearly intervals for longer-term analysis and one with weekly intervals for more recent trends.
The CU Criminal Trend dashboard allows us to understand the criminals in CU through arrest records. Via this dashboard, the CU police and the public can view and break down the demographic distribution of criminals by age, gender, and race. We can also examine other factors that may be correlated with or lead to crime, such as employment status. We can also see the follow-up actions after an arrest and how they change over time, i.e. whether the individual was eventually put into jail, released on bail, etc.
How I built it
First, we collected the publicly available data on CU incidents and arrests over the last 20 years. We then imported the data into Hive and ran analyses to understand it. We created over 100 distribution graphs, one for each field in the datasets. From these we learned the quality of each data field and implemented the appropriate data cleanup. E.g. we found that a small portion of the data had an incorrect “year” field with values such as 2 or 101. Similarly, some rows had an “age” field with values below 0. We added a filter to remove those rows.
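A filter of this kind can be sketched as follows. We did the actual cleanup in Hive, so this Python version is only an illustration of the rule; the field names `year` and `age` follow the description above, while the lower bound of 1990 and the function names are assumptions chosen for the example.

```python
from datetime import date


def is_valid_record(record, min_year=1990, max_year=None):
    """Return True if the record's year and age look plausible.

    min_year is an assumed lower bound; adjust it to the real data range.
    """
    if max_year is None:
        max_year = date.today().year
    year = record.get("year")
    if year is None or not (min_year <= year <= max_year):
        return False
    # Age may be missing on some rows; only reject it when it is negative.
    age = record.get("age")
    if age is not None and age < 0:
        return False
    return True


def clean_records(records, min_year=1990, max_year=None):
    """Keep only the records that pass the year and age checks."""
    return [r for r in records if is_valid_record(r, min_year, max_year)]


# Made-up rows that mirror the bad values described above.
sample = [
    {"year": 2015, "age": 34},
    {"year": 2, "age": 27},      # bad year
    {"year": 101, "age": 41},    # bad year
    {"year": 2010, "age": -3},   # negative age
    {"year": 2008, "age": None},
]

cleaned = clean_records(sample)
```

In this sketch only the first and last sample rows survive, because a missing age is tolerated but an out-of-range year or a negative age is not.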
We also created other necessary data. The first is a geo TopoJSON file for the Champaign Urbana neighborhood divisions: 20 neighborhoods in Champaign, 11 in Urbana and 1 in Savoy. The second is a mapping file from crime code to crime level (1-5), with 1 being the most serious (such as burglary or breaking and entering) and 5 being the least serious “crimes” (such as a parking ticket).
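To show how the crime-level mapping is used, here is a minimal Python sketch of joining incidents to their level. The crime codes, the `crime_code` field name, and the function names are hypothetical placeholders; the real mapping file uses the codes from the police dataset, and the real join is defined in Looker.

```python
from collections import Counter

# Hypothetical codes for illustration; 1 is most serious, 5 is least serious.
CRIME_LEVEL = {
    "BURGLARY": 1,
    "PARKING_VIOLATION": 5,
}


def attach_crime_level(incidents, level_by_code):
    """Return copies of the incidents with a "crime_level" field added.

    Codes missing from the mapping get a level of None.
    """
    return [
        {**incident, "crime_level": level_by_code.get(incident["crime_code"])}
        for incident in incidents
    ]


def count_by_level(incidents_with_level):
    """Count incidents per crime level."""
    return Counter(incident["crime_level"] for incident in incidents_with_level)


# Made-up incidents, only to show the shape of the data.
sample_incidents = [
    {"id": 1, "crime_code": "BURGLARY"},
    {"id": 2, "crime_code": "PARKING_VIOLATION"},
    {"id": 3, "crime_code": "PARKING_VIOLATION"},
    {"id": 4, "crime_code": "UNKNOWN_CODE"},
]

leveled = attach_crime_level(sample_incidents, CRIME_LEVEL)
level_counts = count_by_level(leveled)
```

Keeping unmapped codes with a level of `None` instead of dropping them makes gaps in the mapping file easy to spot.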
We then created the Looker view, explore and model definitions for this data, and built the necessary joins and dashboards.
What's next for ICU
We want to try applying machine learning to detect anomalies and trends in criminal activity. We could also classify and detect related events, and from there discover hidden patterns in the data that might help the police identify and catch criminals more quickly.
Right now the dashboards are created locally on our machines. We also want to implement a flow that automatically downloads new data when it is released, updates the dashboards, and makes them available to the public.