Inspiration
The ever-increasing challenge of Urban Heat Islands (UHI) in cities worldwide served as our primary inspiration. We recognized the urgent need for accessible tools that not only help visualize and understand the complex drivers of UHI but also offer actionable, data-driven strategies for mitigation. We envisioned a solution that could empower urban planners, policymakers, and even residents to make informed decisions towards creating cooler, more resilient, and more livable urban environments for all. The potential to leverage cutting-edge technologies like geospatial analysis, machine learning, and Generative AI to tackle this critical environmental issue was a powerful motivator.
What it does
GenAi UHI Mitigation is a decision-support tool designed to analyze and guide the mitigation of Urban Heat Island effects. While initially demonstrated using data for specific boroughs (Bronx & Manhattan, NYC), its core framework is designed to be adaptable to other urban areas. The tool:
- Integrates Multi-Source Data: Combines ground-based UHI Index measurements (or similar local heat indicators), local weather station data, satellite imagery (e.g., Sentinel-2), and elevation data.
- Calculates Key UHI Drivers: Computes critical spectral indices like NDVI (vegetation), NDBI (built-up areas), and Albedo (surface reflectivity) from satellite data.
- Provides Interactive Visualization: Features 2D and 3D maps allowing users to visually explore heat distribution and its correlation with various environmental parameters.
- Offers AI-Powered Localized Insights: For a user-selected location, it retrieves relevant data and uses Generative AI (e.g., Google Gemini) to:
  - Explain the local heat situation in simple terms, referencing the underlying data drivers.
  - Suggest context-specific, actionable mitigation strategies (e.g., increasing green cover, using cool materials) based on the local conditions.
- Builds on ML Interpretability: The approach is informed by machine learning models (such as TabPFN, XGBoost, and Random Forest) trained to predict the UHI Index, and by SHAP analysis of the non-linear impact of different environmental factors.
Essentially, it translates complex environmental data into understandable insights and actionable guidance for UHI mitigation.
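As an illustration of the index calculations listed above, here is a minimal sketch of NDVI and NDBI computed as normalized differences. It assumes the standard Sentinel-2 band mapping (B04 red, B08 near-infrared, B11 shortwave infrared). The function names and the toy reflectance values are hypothetical, and the project's own pipeline may differ. Albedo is left out because the formula used is not stated here.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    denom = a + b
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    """Vegetation index: high values indicate dense, healthy vegetation."""
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    """Built-up index: high values indicate impervious, built-up surfaces."""
    return normalized_difference(swir1, nir)


# Toy 2x2 surface reflectance patches (illustrative values, not project data).
red = np.array([[0.05, 0.20], [0.0, 0.10]])    # assumed Sentinel-2 B04
nir = np.array([[0.45, 0.25], [0.0, 0.30]])    # assumed Sentinel-2 B08
swir1 = np.array([[0.15, 0.35], [0.0, 0.30]])  # assumed Sentinel-2 B11

ndvi_patch = ndvi(nir, red)
ndbi_patch = ndbi(swir1, nir)
print(ndvi_patch)
print(ndbi_patch)
```

The zero-denominator guard matters for nodata pixels, where every band is zero and a plain division would emit warnings or infinities.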
How we built it
Our development process involved several key stages:
- Data Acquisition & Preprocessing: We started by sourcing diverse datasets relevant to UHI analysis. For our initial case study, this included UHI Index data, NYSMesonet weather records, Sentinel-2 L2A imagery, and NASADEM elevation data. We leveraged the Microsoft Planetary Computer for efficient access to satellite and elevation data. A robust pipeline was built using Python libraries like pystac-client, planetary-computer, geopandas, rioxarray, and xarray to handle data fetching, cleaning, alignment, CRS transformations, and raster sampling.
- Feature Engineering: Key spectral indices (NDVI, NDBI, Albedo, MNDWI, BSI) were calculated from the Sentinel-2 bands to create meaningful features for UHI analysis.
- Machine Learning (Underlying Research): We developed and benchmarked several machine learning models (Linear Regression, Random Forest, XGBoost, TabPFN) to predict the UHI Index, achieving promising accuracy. SHAP (SHapley Additive exPlanations) was used extensively to interpret these models, understand feature importance, and uncover non-linear relationships.
- Generative AI Integration: We integrated Google's Gemini API using the google-generativeai Python library. This involved designing effective prompts that feed localized data to the AI, enabling it to generate tailored explanations of the heat situation and relevant mitigation advice.
- Web Application Development: The interactive user interface was built using Streamlit, chosen for its ease of use and rapid development capabilities. Visualization components were created using PyDeck (for 3D maps) and Folium (for 2D maps).
- Iterative Refinement: Throughout the process, we focused on modular design, data validation, and iterative improvements to both the data pipeline and the user-facing application.
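The Generative AI step above amounts to turning a location's measurements into a structured prompt. The following is a rough sketch of what such a prompt builder might look like. The function name, the prompt wording, and the sample values are all hypothetical and are not taken from the project.

```python
def build_uhi_prompt(location_name, features):
    """Format local measurements into a prompt for a text-generation model.

    `features` maps a measurement name to a numeric value. Hypothetical helper:
    the project's actual prompt structure is not shown in this write-up.
    """
    # Sort by name so the same inputs always produce the same prompt.
    measurement_lines = [
        f"- {name}: {value:.3f}" for name, value in sorted(features.items())
    ]
    return "\n".join(
        [
            f"You are advising on urban heat mitigation for {location_name}.",
            "Local measurements:",
            *measurement_lines,
            "1. Explain the local heat situation in simple terms, citing the measurements.",
            "2. Suggest three actionable mitigation strategies suited to these conditions.",
        ]
    )


# Illustrative values only, not real project data.
sample_features = {"UHI Index": 1.032, "NDVI": 0.12, "NDBI": 0.08, "Albedo": 0.14}
prompt = build_uhi_prompt("a sample block in the Bronx", sample_features)
print(prompt)
```

The resulting string would then be sent to the model, for example through the `generate_content` method of a `GenerativeModel` in google-generativeai. That call is omitted here so the sketch runs without an API key.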
Challenges we ran into
Building this project presented several challenges:
- Data Integration Complexity: Merging data from diverse sources with varying spatial and temporal resolutions, and different formats, required careful handling of projections, timestamps, and robust matching/interpolation methods. Ensuring data quality and consistency across these sources was a significant effort.
- Computational Performance: Processing large geospatial datasets, especially sampling satellite rasters at numerous points and training complex ML models, can be computationally intensive. Optimizing the data pipeline and subsampling the data for interpretability analyses such as SHAP were crucial. Rendering thousands of points on interactive maps also required attention to responsiveness.
- API Interactions & Reliability: Relying on external APIs (Planetary Computer, Google Gemini) meant handling potential connectivity issues and API rate limits, and building in robust error handling. Authentication and secure API key management were also important.
- Prompt Engineering for Generative AI: Crafting effective prompts for the Gemini API to elicit accurate, relevant, and easily understandable explanations and mitigation suggestions was an iterative process. We had to carefully structure the input data and instructions.
- Generalizability vs. Specificity: While aiming for a generalizable framework, the initial demonstration relied on specific datasets for NYC. Ensuring the tool's components (data ingestion, feature calculation, AI prompting) are adaptable to data from other cities requires further design considerations.
- User Experience (UX) Design: Balancing the display of complex geospatial information with an intuitive user interface and clear, actionable AI-generated outputs was a continuous design challenge.
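One common way to cope with the rate limits and connectivity issues mentioned above is to retry failed calls with exponential backoff. The sketch below illustrates that general technique using only the standard library. It is not the project's actual error handling, and `call_with_retries` and `flaky_request` are hypothetical names.

```python
import random
import time


def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry.

    Re-raises the last exception once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # real code should catch only the client's transient errors
            if attempt == max_attempts - 1:
                raise
            jitter = random.uniform(0, 0.1 * base_delay)
            sleep(base_delay * (2 ** attempt) + jitter)


# Demo: a stand-in for an API call that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


# Pass a no-op sleep so the demo finishes instantly.
result = call_with_retries(flaky_request, sleep=lambda seconds: None)
print(result, calls["count"])
```

Injecting `sleep` as a parameter keeps the helper easy to test, and the small random jitter helps avoid many clients retrying at exactly the same moment.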
Accomplishments that we're proud of
- Successful Multi-Source Data Integration: We successfully built a pipeline to integrate and process complex geospatial and meteorological data from various sources, making it ready for analysis and visualization.
- Effective AI-Powered Insights: The integration of Generative AI to provide localized, easy-to-understand explanations and actionable mitigation advice based on specific data is a key achievement. It bridges the gap between raw data and practical application.
- High-Performing Predictive Models (Underlying Research): Our research demonstrated the ability of advanced ML models like TabPFN to accurately predict fine-scale UHI Index variations, significantly outperforming simpler baselines.
- Deep Interpretability with SHAP: Applying SHAP allowed us to move beyond basic feature importance, uncovering non-linear effects and interactions between UHI drivers, providing richer scientific understanding.
- Interactive & Intuitive Visualization Tool: We developed a user-friendly Streamlit application that makes complex UHI data accessible and explorable through interactive 2D and 3D maps.
- End-to-End Workflow Demonstration: The project successfully showcases a complete workflow from cloud-based data acquisition to interpretable AI-driven decision support.
What we learned
This project was a significant learning experience across multiple domains:
- Geospatial Data Science: We deepened our understanding of handling, processing, and analyzing various types of geospatial data (vector, raster), including working with cloud-based platforms like the Microsoft Planetary Computer and libraries like GeoPandas, rioxarray, and xarray.
- Machine Learning & Interpretability: We gained practical experience in applying and evaluating different ML models for environmental prediction tasks and, crucially, learned the importance and techniques of model interpretability (especially SHAP) to build trust and extract scientific insights.
- Generative AI Application: We learned how to effectively integrate large language models (LLMs) like Gemini into an analytical workflow, particularly in prompt engineering to guide the AI in generating useful and contextually relevant outputs.
- Full-Stack Data Application Development: Building the Streamlit application provided insights into creating interactive data-driven web tools, from backend data processing to frontend visualization and user interaction.
- The Complexity of UHI: We gained a greater appreciation for the multifaceted nature of the Urban Heat Island effect and the interplay of various environmental factors that contribute to it.
- Importance of Interdisciplinary Approaches: Successfully tackling a problem like UHI requires combining knowledge from environmental science, data science, AI, and urban planning.
What's next for GenAi UHI Mitigation
While the current project provides a strong foundation, we see several exciting avenues for future development, aiming to enhance its capabilities and broaden its impact beyond the initial Bronx & Manhattan case study:
- Enhanced Generalizability: Refine the data ingestion and processing pipelines to more easily accommodate datasets from different cities and regions, allowing for wider applicability. This includes developing standardized input formats or more flexible data connectors.
- Integration of More Diverse Data: Incorporate additional relevant data sources such as Land Surface Temperature (LST) from thermal satellite bands, detailed 3D urban morphology data (e.g., building heights, Sky View Factor from LiDAR), and proxies for anthropogenic heat (e.g., night-time lights, traffic data) to create a more comprehensive model of UHI drivers.
- Advanced Spatial Modeling: Explore explicitly spatial machine learning techniques or Geographically Weighted Regression (GWR) to better account for spatial autocorrelation and non-stationarity in the relationships between UHI drivers.
- Quantitative Mitigation Impact Assessment: Extend the AI's capabilities to not just suggest mitigation strategies, but to also provide (even rough) quantitative estimates of their potential cooling impact based on the local conditions and established research.
- User Customization & Scenario Modeling: Allow users to upload their own local data (where feasible and privacy-permitting) or to define "what-if" scenarios (e.g., "what if we increase green cover by 20% in this area?") to see potential impacts.
- Long-Term Monitoring & Trend Analysis: Adapt the tool for multi-temporal analysis to track UHI changes over time and evaluate the effectiveness of implemented mitigation strategies.
- Community Engagement Features: Explore ways to make the tool more engaging for local communities, perhaps by allowing citizen-sourced data inputs or feedback on suggested mitigation strategies.