Project Report

Urban vehicle break-ins are a persistent and growing concern in dense cities, particularly in high-traffic and tourist-heavy areas. Drivers often lack reliable, data-driven guidance when choosing where and when to park, leading to avoidable losses and increased anxiety. To address this problem, we developed ParkWise, a predictive machine learning system designed to estimate the risk of vehicle break-ins at specific locations and times, enabling safer parking decisions.

ParkWise is built using historical incident data from the San Francisco Police Department, accessed through the Socrata REST API. The dataset spans multiple years (2018–2024) and includes detailed records of reported vehicle burglaries, with associated timestamps and geographic coordinates. Raw data was cleaned to remove duplicates and incomplete entries, and all temporal and spatial fields were standardized to ensure consistency. After preprocessing, the final dataset formed a substantial corpus of labeled samples, and all records from 2024 were reserved as a strict hold-out test set to prevent data leakage.
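The retrieval and cleaning steps described above might look roughly like the sketch below. It only builds a SoQL query URL and cleans rows that have already been downloaded. The domain, dataset ID, and column names (`incident_id`, `incident_datetime`, `latitude`, `longitude`) are assumptions for illustration and are not stated in this report. A filter for vehicle burglary categories would also be added to `$where`, but the exact category values are not given here, so it is omitted.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumption: endpoint and dataset ID are illustrative, not confirmed by the report.
BASE_URL = "https://data.sfgov.org/resource/wg3w-h783.json"


def build_query_url(start_year, end_year, limit=50000, offset=0):
    """Build a SoQL query URL for incidents from start_year to end_year inclusive."""
    where = (
        f"incident_datetime >= '{start_year}-01-01T00:00:00' "
        f"AND incident_datetime < '{end_year + 1}-01-01T00:00:00' "
        "AND latitude IS NOT NULL AND longitude IS NOT NULL"
    )
    params = {
        "$select": "incident_id,incident_datetime,latitude,longitude",
        "$where": where,
        "$order": "incident_datetime",
        "$limit": limit,    # page size
        "$offset": offset,  # advance by `limit` to fetch the next page
    }
    return f"{BASE_URL}?{urlencode(params)}"


def clean_records(records):
    """Drop duplicate IDs and incomplete rows, and standardize field types."""
    seen = set()
    cleaned = []
    for row in records:
        key = row.get("incident_id")
        ts = row.get("incident_datetime")
        lat, lon = row.get("latitude"), row.get("longitude")
        if key is None or key in seen or not ts or lat is None or lon is None:
            continue  # duplicate or incomplete entry
        seen.add(key)
        cleaned.append({
            "incident_id": key,
            "timestamp": datetime.fromisoformat(ts),
            "lat": float(lat),
            "lon": float(lon),
        })
    return cleaned
```

Each URL can be passed to any HTTP client, and the decoded JSON list handed to `clean_records`.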

Feature engineering played a critical role in model performance. Temporal features included cyclical encodings for hour of day and day of week, along with weekend indicators and seasonal patterns. Spatial features incorporated latitude, longitude, proximity to major tourist landmarks, and crime density measures. To better capture geographic structure, incidents were aggregated using the H3 hexagonal spatial indexing system, allowing the model to learn localized hotspot behavior at a 175-meter resolution. Because vehicle break-ins are relatively rare compared to safe parking events, negative sampling was applied to balance the dataset by introducing spatially and temporally shifted non-incident examples.
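As a rough illustration of two of the ideas above, the sketch below shows sine/cosine cyclical encoding for time fields and one simple way to generate shifted negative examples. It is not the project's actual code. The shift ranges, retry count, and the `grid_key` helper are assumptions. In particular, `grid_key` uses a plain latitude/longitude grid as a stand-in for an H3 cell so that the example needs no third-party library.

```python
import math
import random
from datetime import datetime, timedelta


def cyclical(value, period):
    """Map a periodic value onto the unit circle so that 23:00 sits next to 00:00."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def temporal_features(ts):
    """Cyclical hour and day-of-week encodings plus a weekend indicator."""
    hour_sin, hour_cos = cyclical(ts.hour + ts.minute / 60.0, 24.0)
    dow_sin, dow_cos = cyclical(ts.weekday(), 7.0)
    return {
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "dow_sin": dow_sin, "dow_cos": dow_cos,
        "is_weekend": 1.0 if ts.weekday() >= 5 else 0.0,
    }


def grid_key(lat, lon, ts, cell_deg=0.002):
    """Hypothetical stand-in for an (H3 cell, hour) key; the report uses H3 instead."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg), hour)


def sample_negatives(incidents, rng, max_shift_hours=72, max_shift_deg=0.01, max_tries=20):
    """Create one non-incident example per incident by shifting it in space and time.

    A shifted point is rejected if it lands in a cell and hour with a known incident.
    """
    occupied = {grid_key(lat, lon, ts) for lat, lon, ts in incidents}
    negatives = []
    for lat, lon, ts in incidents:
        for _ in range(max_tries):
            new_ts = ts + timedelta(hours=rng.randint(-max_shift_hours, max_shift_hours))
            new_lat = lat + rng.uniform(-max_shift_deg, max_shift_deg)
            new_lon = lon + rng.uniform(-max_shift_deg, max_shift_deg)
            if grid_key(new_lat, new_lon, new_ts) not in occupied:
                negatives.append((new_lat, new_lon, new_ts))
                break
    return negatives
```

With this encoding, hour 0 maps to (0, 1) and hour 6 to approximately (1, 0), so the distance between late-night and early-morning hours stays small, which a raw 0–23 integer would not preserve.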

For prediction, we trained an XGBoost gradient-boosted decision tree model with 200 trees and a maximum depth of six. Early stopping on a validation set was used to prevent overfitting and improve generalization to unseen data. The model was trained on data from 2018–2023 to capture long-term trends while remaining responsive to short-term fluctuations.
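A minimal sketch of this setup follows, assuming rows are dictionaries with a `timestamp` field. The 200 trees and depth of six come from the report. The learning rate, early-stopping patience, and evaluation metric are assumptions, as is the choice to draw the validation set from the training years so that 2024 stays untouched. Passing `early_stopping_rounds` to the constructor assumes xgboost 1.6 or later.

```python
from datetime import datetime


def temporal_split(rows, test_year=2024):
    """Split by year so that no test-period information reaches training."""
    train = [r for r in rows if r["timestamp"].year < test_year]
    test = [r for r in rows if r["timestamp"].year == test_year]
    return train, test


def train_model(X_train, y_train, X_val, y_val):
    """Fit a gradient-boosted classifier with early stopping on a validation set."""
    # Imported here so the split helper above works without xgboost installed.
    import xgboost as xgb

    model = xgb.XGBClassifier(
        n_estimators=200,          # from the report
        max_depth=6,               # from the report
        learning_rate=0.1,         # assumption: not stated in the report
        early_stopping_rounds=20,  # assumption: patience not stated
        eval_metric="aucpr",       # assumption: metric not stated
    )
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    return model


# Tiny illustration of the split with made-up rows.
rows = [
    {"timestamp": datetime(2019, 3, 1, 18), "label": 1},
    {"timestamp": datetime(2023, 11, 5, 2), "label": 0},
    {"timestamp": datetime(2024, 2, 14, 21), "label": 1},
]
train_rows, test_rows = temporal_split(rows)
```

Splitting by time rather than at random matters here because features such as recent incident trends would otherwise let the model see the future of the period it is tested on.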

The final model achieved strong performance across multiple evaluation metrics. It reached an AUC-ROC of 83.2 percent and a Precision-Recall AUC (PR-AUC) of 91.7 percent. Most notably, precision reached 95.1 percent, meaning that roughly 95 of every 100 high-risk predictions corresponded to an actual incident and false positives were rare. Calibration analysis confirmed that predicted risk scores closely matched observed outcomes. Feature importance analysis revealed that recent incident trends and time-of-day variables were the most influential predictors, reinforcing the presence of predictable crime patterns rather than random behavior.
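For readers less familiar with these metrics, the sketch below shows how precision, AUC-ROC, and a basic calibration table can be computed from labels and predicted scores. It is a plain-Python illustration, not the evaluation code used for the figures above. The 0.5 decision threshold and the number of bins are assumptions, and the pairwise AUC is quadratic in sample count, so it suits small examples only.

```python
def precision(y_true, scores, threshold=0.5):
    """Share of predicted positives (score >= threshold) that are true positives."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_table(y_true, scores, n_bins=10):
    """Per bin of predicted risk: (mean predicted score, observed incident rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[idx].append((s, y))
    table = []
    for members in bins:
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_score, observed, len(members)))
    return table
```

A well-calibrated model produces rows where the mean predicted score and the observed incident rate are close to each other in every bin.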

ParkWise is deployed through a simple web-based interface that allows users to drop a pin on a map, select a time, and instantly receive a risk score. Beyond individual use, the system has broader implications for urban safety, including supporting proactive resource allocation for law enforcement. While the model is limited by reliance on reported crime data, ParkWise demonstrates how machine learning can transform historical public safety data into actionable, real-world decision support for safer cities.
