Subway Ridership and Crime

Inspiration

We want to contribute to a safer, more effective way for metro riders to travel throughout NYC. With our model, we hope to provide insights that can help the MTA staff its stations more effectively and keep New Yorkers better informed about their safety while traveling.

What it does

Our model uses New York City transit data to predict the number of crimes per day that occur in each neighborhood of the city.

How we built it

We trained a CatBoost regressor with default parameters to predict the number of crimes per day in each neighborhood (distinct zip code) of New York City.
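The regression target described above can be pictured as a table of crime counts keyed by (zip code, day). The following is a minimal sketch, not the team's actual pipeline. It assumes each crime record has already been reduced to a hypothetical `(zip_code, day)` pair, and the function name `daily_crime_counts` is illustrative. It also fills days with no recorded crime with an explicit 0, so quiet days appear in the training data instead of silently dropping out.

```python
from collections import Counter
from datetime import date, timedelta


def daily_crime_counts(records, zip_codes, start, end):
    """Build {(zip_code, day): crime_count} for every zip code and day.

    records: iterable of (zip_code, day) tuples, one per reported crime
    (a hypothetical, pre-cleaned format). Days in [start, end] with no
    recorded crime get an explicit count of 0.
    """
    observed = Counter(records)  # counts identical (zip_code, day) tuples
    table = {}
    day = start
    while day <= end:
        for zip_code in zip_codes:
            # Counter returns 0 for a missing key without inserting it
            table[(zip_code, day)] = observed[(zip_code, day)]
        day += timedelta(days=1)
    return table


# Example with made-up records for two zip codes over two days.
records = [
    ("10001", date(2023, 1, 1)),
    ("10001", date(2023, 1, 1)),
    ("11201", date(2023, 1, 1)),
    ("10001", date(2023, 1, 2)),
]
counts = daily_crime_counts(records, ["10001", "11201"], date(2023, 1, 1), date(2023, 1, 2))
print(counts[("11201", date(2023, 1, 2))])  # 0: no crime recorded, but the row still exists
```

Each entry of this table could then be joined with transit features for the same zip code and day and used as the target for `catboost.CatBoostRegressor`, which uses its default parameters when constructed with no arguments.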

Challenges we ran into

The biggest challenge we ran into was processing and merging the data. Because we combined data from several different sources, we had to ensure that the keys we joined on were consistent across sources and that no necessary data was lost during each merge.
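One way to guard against these key mismatches is to normalize join keys before merging and to report keys that fail to match, rather than letting an inner join drop them silently. This sketch is hypothetical. It assumes the shared key is a zip code, and it assumes the cleaning rules shown (ZIP+4 suffixes, float-formatted codes, lost leading zeros) match the kinds of inconsistencies in the data. The field names `zip`, `crimes`, and `entries` are illustrative, and the team's actual sources and rules are not described here.

```python
def normalize_zip(raw):
    """Normalize a zip code to a 5-digit string.

    Hypothetical cleaning rules: strip whitespace, drop a ZIP+4 suffix
    ("10001-1234"), drop a float suffix ("10001.0"), and restore leading
    zeros lost when a code was stored as an integer.
    """
    text = str(raw).strip().split("-")[0]
    if text.endswith(".0"):
        text = text[:-2]
    return text.zfill(5)


def left_join(left_rows, right_by_zip, key="zip"):
    """Left-join dict rows onto a {zip: features} lookup.

    right_by_zip is assumed to be keyed by already-normalized zip codes.
    Every left row is kept; keys with no match are collected so they can be
    inspected instead of being silently dropped, as an inner join would do.
    """
    joined, unmatched = [], set()
    for row in left_rows:
        zip_code = normalize_zip(row[key])
        features = right_by_zip.get(zip_code)
        if features is None:
            unmatched.add(zip_code)
            features = {}
        joined.append({**row, **features, key: zip_code})
    return joined, unmatched


crime_rows = [
    {"zip": "10001-1234", "crimes": 3},
    {"zip": 11201, "crimes": 1},
    {"zip": "99999", "crimes": 2},
]
transit_by_zip = {"10001": {"entries": 500}, "11201": {"entries": 300}}
joined, unmatched = left_join(crime_rows, transit_by_zip)
print(unmatched)  # {'99999'}
```

In pandas, `DataFrame.merge(..., how="left", indicator=True)` gives a similar check: the added `_merge` column marks rows that found no match on the right side.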

Accomplishments that we're proud of

We're proud that our model achieved an R^2 score of over 0.7 using the features we engineered. This is especially encouraging because few prior studies have tackled the problem of predicting crime from transit data.

What we learned

We learned that transit data and crime in the city are likely correlated in some form: the features we engineered from the HRT data set explain roughly 70% of the variance in daily crime counts (an R^2 of about 0.7) on both our training and test sets.
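To make the "roughly 70%" reading concrete, the sketch below computes R^2 from its standard definition. R^2 compares the model's squared errors with those of a baseline that always predicts the mean. In the toy example, the residual sum of squares is 3 and the total sum of squares is 10, so R^2 = 1 - 3/10 = 0.7. That means the predictions remove 70% of the squared error left by always guessing the mean. The values are invented for illustration and are not project data. scikit-learn's `sklearn.metrics.r2_score` implements the same formula, but whether the team used it is not stated.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    SS_res is the sum of squared prediction errors; SS_tot is the sum of
    squared deviations from the mean, i.e. the error of always predicting
    the mean. R^2 is the fraction of that baseline error the model removes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when every target value is identical")
    return 1 - ss_res / ss_tot


# Toy daily crime counts (illustrative, not project data).
actual = [1, 2, 3, 4, 5]
predicted = [2, 3, 3, 4, 4]
print(r_squared(actual, predicted))
```

A high R^2 reflects how well the features track the target. On its own, it does not show that ridership causes crime.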

What's next for Subway Ridership and Crime

Possible next steps include tuning the model's hyperparameters, clustering neighborhoods by geographic location, and forecasting crime from previous years' data. These steps may improve our model's performance.