Inspiration
The Montreal Crime Data records crimes in Montreal reported to the Service de police de la Ville de Montréal (SPVM). The dataset is large but sparse. It has spatial and temporal components, but it is not in a standard time-series form, so we were interested in exploring the space-time relationship in a multivariate setting.
What it does
We performed exploratory analysis with maps and facet plots to look at the general trend and at differences between subsets of the data.
We wondered whether there is a significant association between the crime category and the time of day the crime is reported, so we constructed a contingency table and performed a chi-squared test of independence. The chi-squared statistic is above 8000 with 10 degrees of freedom, and the p-value is extremely small. We therefore reject the null hypothesis of independence and conclude that there is a statistically significant association between crime category and time of day.
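As a rough illustration of the test above, the sketch below computes Pearson's chi-squared statistic and the degrees of freedom from raw (category, time of day) pairs using only the standard library. The function name and the toy labels are assumptions made for this example, not our project code; in practice `scipy.stats.chi2_contingency` does the same computation and also returns the p-value.

```python
from collections import Counter


def chi_squared_independence(pairs):
    """Return (statistic, dof) for Pearson's chi-squared test of independence.

    `pairs` is an iterable of (category, time_of_day) tuples, one per record.
    Hypothetical helper for illustration only.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)                        # cell counts
    row_totals = Counter(cat for cat, _ in pairs)    # counts per category
    col_totals = Counter(tod for _, tod in pairs)    # counts per time of day

    statistic = 0.0
    for cat, row_total in row_totals.items():
        for tod, col_total in col_totals.items():
            # Expected count if category and time of day were independent.
            expected = row_total * col_total / n
            statistic += (observed[(cat, tod)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Toy data (made up): two categories, two times of day.
toy = (
    [("theft", "day")] * 30 + [("theft", "night")] * 10
    + [("break-in", "day")] * 10 + [("break-in", "night")] * 30
)
stat, dof = chi_squared_independence(toy)
```

The degrees of freedom are (rows - 1) x (columns - 1), so a table with 6 categories and 3 times of day, for example, would give 10.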
Then, with these variables shown to be associated, we explored two questions.
The first tries to use location data to predict the type of crime. Viewed statically this is a spatial model, but since the relation between location and crime type may shift over time, we built multiple models on smaller subsets, each sampling 7 consecutive days, with the subsets 100 days apart.
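The sampling scheme can be sketched as follows. This is a minimal example, assuming that "100 days apart" means the distance between the start dates of consecutive windows; the function name and the date range are hypothetical.

```python
from datetime import date, timedelta


def sample_windows(start, end, window_days=7, stride_days=100):
    """Yield (first_day, last_day) pairs, inclusive on both ends.

    Assumption: stride_days is measured between window start dates.
    """
    span = timedelta(days=window_days - 1)
    current = start
    # Stop as soon as a full window no longer fits before `end`.
    while current + span <= end:
        yield current, current + span
        current += timedelta(days=stride_days)


# Example over one (made-up) calendar year.
windows = list(sample_windows(date(2015, 1, 1), date(2015, 12, 31)))
```

Each window would then be used to filter the records and fit one model, so that the fitted models can be compared across time.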
The second tries to predict the crime frequency at a particular location from past frequencies by location. We wrote a conversion function that aggregates counts within regions for a given time window. As with the first question, we repeatedly fit models on several temporal subsets and compared their performance, this time using a longer sequence (40 instead of 7). We first fit a vector autoregression (VAR) model. Because the AIC keeps decreasing as we increase the order, we think a more complex model is feasible. We then ran the augmented Dickey-Fuller (ADF) test. Most p-values are small, which suggests that most of the series are stationary, so we moved on to a transformer model for multivariate time series, using the last 10 observations in each short series as input, and we report the results on a validation set.
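The conversion step can be sketched as below. This is a simplified illustration rather than our actual function: it assumes each record is a (day, latitude, longitude) tuple and that regions are square grid cells, and the names `aggregate_counts`, `to_series`, and `cell_deg` are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def aggregate_counts(records, origin, window_days=1, cell_deg=0.01):
    """Count records per (time-window index, grid cell).

    records: iterable of (day, latitude, longitude) tuples.
    Assumption: regions are square cells of `cell_deg` degrees.
    """
    counts = Counter()
    for day, lat, lon in records:
        window = (day - origin).days // window_days
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[(window, cell)] += 1
    return counts


def to_series(counts, n_windows):
    """Turn the counts into one row per time window, one column per cell."""
    cells = sorted({cell for _, cell in counts})
    rows = [[counts[(w, cell)] for cell in cells] for w in range(n_windows)]
    return cells, rows


# Made-up records: three days, two grid cells.
records = [
    (date(2020, 1, 1), 45.505, -73.565),
    (date(2020, 1, 1), 45.507, -73.567),
    (date(2020, 1, 2), 45.515, -73.565),
    (date(2020, 1, 3), 45.505, -73.565),
]
counts = aggregate_counts(records, origin=date(2020, 1, 1))
cells, rows = to_series(counts, n_windows=3)
```

The resulting rows form a multivariate series (one column per region) of the kind a VAR or transformer model takes as input. The single pass over the records keeps the aggregation linear in the number of records.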
How to Improve
- Write the sampling and conversion code with lower time complexity.
- Since the guiding questions ask "why," we could compare more inference and causal inference models.
- Does this actually answer the question about the pattern? We need to know whether police strategy influences what is observed (more police, more crimes caught).
