Inspiration

New York City faces a growing infrastructure sustainability challenge: aging water and sewer systems, combined with increasing rainfall due to climate change, place mounting stress on the ground beneath the city. These conditions increase the risk of sinkholes, which can disrupt transportation, damage property, and threaten public safety. More broadly, these failures reveal a larger sustainability issue: critical urban infrastructure is aging faster than it is being proactively maintained. When cities respond only after roads collapse or utilities fail, repairs become more expensive and less sustainable over time.

This project approaches sinkhole risk as an issue of infrastructure resilience and preventative planning across New York City. By using publicly available civic, environmental, and infrastructure-related data, we aim to identify where conditions suggest elevated risk before major failures occur. A predictive, data-driven approach can help shift city management from reactive emergency repair to earlier, smarter intervention. In the long term, that supports more sustainable urban systems by reducing maintenance costs, improving public safety, and helping the city allocate resources more efficiently.

What it does

Our model uses a Random Forest machine learning algorithm to predict where sinkholes are most likely to occur across New York City. The city is divided into a grid of 500 m by 500 m cells, and for each cell we calculate a set of features that capture environmental conditions and infrastructure stress. These features include:

- Month
- Complaints in the prior month
- Complaints in the last 3 months
- Water main breaks in the last month
- Water main breaks in the last 3 months
- Pavement quality rating
- Elevation
- Depth
- Neighboring sinkhole incidents
- Neighboring water main breaks

The model learns patterns from historical data to estimate the probability that a sinkhole will occur in each location. Because several of these inputs are time-based, the model can also be updated automatically each month as new 311, water main break, and precipitation data become available.

A Random Forest works by combining many decision trees, each of which makes a prediction based on a different subset of the data. By averaging across these trees, the model produces more accurate and stable predictions than any single tree would.

The final output is a probability score for each grid cell, which is visualized on the interactive map. Higher values indicate areas where underlying conditions suggest greater risk, allowing planners to prioritize inspection and preventative maintenance.
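To make the grid and the time-based features more concrete, here is a minimal sketch of how a report could be assigned to a 500 m cell and how prior-month, 3-month, and neighboring-cell counts could be derived. It is an illustration, not our actual pipeline: the reference corner (ORIGIN_LAT, ORIGIN_LON), the flat-earth distance approximation, the (lat, lon, year, month) event format, the function names, and the choice of the 8 surrounding cells as "neighbors" are all assumptions made for this example.

```python
import math
from collections import Counter

CELL_SIZE_M = 500.0
# Assumed reference corner south-west of the city (hypothetical values).
ORIGIN_LAT, ORIGIN_LON = 40.49, -74.27
M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude


def cell_for(lat, lon):
    """Map a latitude/longitude to a (row, col) index on a 500 m grid."""
    # Simple flat-earth approximation; adequate over a city-sized area.
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(ORIGIN_LAT))
    row = int((lat - ORIGIN_LAT) * M_PER_DEG_LAT // CELL_SIZE_M)
    col = int((lon - ORIGIN_LON) * m_per_deg_lon // CELL_SIZE_M)
    return row, col


def month_index(year, month):
    """Turn a calendar month into a single integer so lags cross year ends."""
    return year * 12 + (month - 1)


def count_events(events):
    """Count events per (cell, month). Each event is (lat, lon, year, month)."""
    counts = Counter()
    for lat, lon, year, month in events:
        counts[(cell_for(lat, lon), month_index(year, month))] += 1
    return counts


def lag_features(counts, cell, year, month):
    """Build time-lagged and neighboring-cell counts for one cell and month."""
    t = month_index(year, month)
    row, col = cell
    neighbors = sum(
        counts[((row + dr, col + dc), t - 1)]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)  # skip the cell itself
    )
    return {
        "month": month,
        "prior_month": counts[(cell, t - 1)],
        "last_3_months": sum(counts[(cell, t - k)] for k in (1, 2, 3)),
        "neighbors_prior_month": neighbors,
    }
```

The same two helpers could be run once over complaints and once over water main breaks, and the resulting dictionaries joined with static attributes such as pavement rating and elevation to form one feature row per cell per month.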

Challenges we ran into

One key limitation that kept us from achieving higher prediction accuracy was noise in the 311 report data. Many of the cave-in reports may be misclassified as sinkholes, since some could actually be complaints about potholes or other miscellaneous damage. Additionally, we were missing one essential piece of information in our analysis: the construction date of the sewers. This data is unfortunately not publicly available, so the closest proxy we could find was the construction date of nearby buildings. To improve our prediction accuracy, we would need more descriptive labels for the cave-ins and access to sewer construction records.

Accomplishments that we're proud of

- Built a city-wide sinkhole prediction model using real NYC infrastructure and environmental data
- Engineered spatiotemporal features (500 m grid plus time-based signals)
- Achieved 0.85 accuracy and strong AUC (~0.77)
- Created an interactive risk dashboard/map for decision-making
- Designed for monthly automated updates and real-world deployment
- Framed the problem as proactive infrastructure sustainability, not just prediction
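For readers less familiar with the two metrics above, the sketch below shows how accuracy and AUC can each be computed from per-cell probability scores. It is a plain-Python illustration with hypothetical function names and a 0.5 decision threshold chosen for the example; it is not the evaluation code behind the reported numbers. Accuracy depends on the chosen threshold, while AUC measures how often a cell that had a sinkhole is ranked above one that did not.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cells whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because sinkholes are rare events, a model can score high on accuracy simply by predicting "no sinkhole" almost everywhere, which is one reason to read the two numbers together.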

What we learned

We learned that combining public infrastructure, environmental, and civic data can reveal meaningful early warning signs of sinkhole risk and support more proactive, sustainable city maintenance.

What's next for Resilient NYC

As part of a Phase II implementation, this model is designed to support automated monthly updates as new 311, water main break, and precipitation data become available, with the refreshed predictions displayed on this page.
