Inspiration

We were inspired by the challenges of maintaining high-quality geographic data in large urban areas. While working with POIs and road networks in Mexico City, we discovered spatial mismatches that could lead to critical errors in applications like mapping services, navigation systems, and public resource planning. This motivated us to build a system capable of detecting and flagging such issues automatically.

What it does

Our project identifies and classifies spatial errors in Points of Interest (POIs) by analyzing their alignment with the road network. Specifically, it:

  • Verifies whether a POI exists in OpenStreetMap using the Overpass API.
  • Detects if a POI is placed on the incorrect side of the road.
  • Identifies incorrect MULTIDIGIT assignments for POIs.

The output is a dataset enriched with diagnostics that help urban planners or analysts quickly spot and correct invalid entries.
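
As an illustration of the side-of-road check, here is a minimal sketch rather than the project's actual code. It assumes a road segment is a directed pair of (x, y) points and that each POI record carries a recorded side of "L" or "R"; that field and the function names `side_of_segment` and `flag_wrong_side` are hypothetical. The sign of a 2D cross product tells which side of the directed segment the point falls on. Treating lon/lat as planar coordinates is a simplification that is usually acceptable for short urban segments.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


def side_of_segment(a: Point, b: Point, p: Point) -> str:
    """Return 'L', 'R', or 'ON' for point p relative to the directed segment a -> b."""
    # 2D cross product of (b - a) and (p - a): positive means p is to the left.
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross > 0:
        return "L"
    if cross < 0:
        return "R"
    return "ON"


def flag_wrong_side(a: Point, b: Point, poi: Point, recorded_side: str) -> bool:
    """True when the geometric side disagrees with the recorded side (hypothetical field)."""
    actual = side_of_segment(a, b, poi)
    # A point exactly on the line gives no evidence either way, so it is not flagged.
    return actual != "ON" and actual != recorded_side


if __name__ == "__main__":
    start, end = (0.0, 0.0), (1.0, 0.0)  # segment heading east
    print(side_of_segment(start, end, (0.5, 1.0)))       # point north of the segment
    print(flag_wrong_side(start, end, (0.5, 1.0), "R"))  # recorded side disagrees
```

For a multi-vertex road, the same test would be applied to the segment nearest the POI.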

How we built it

We used Python with libraries like pandas, GeoPandas, Shapely, and requests to process and analyze geospatial data.
Our steps included:

  • Parsing road geometries (LINESTRING) and interpolating coordinates at a target percentage along each segment.
  • Querying OpenStreetMap via the Overpass API to check for nearby POIs.
  • Merging and comparing datasets to detect inconsistencies based on spatial logic.
  • Creating modular scripts for each validation case.
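
The first two steps can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the project's code: the helper names (`parse_linestring`, `interpolate_fraction`, `build_overpass_query`), the 50 m search radius, and the 25 s timeout are all hypothetical. With Shapely, `LineString.interpolate(fraction, normalized=True)` performs the same interpolation. Distances here are computed in coordinate units, which is only an approximation for lon/lat data.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (x, y); for WKT in lon/lat this is (lon, lat)


def parse_linestring(wkt: str) -> List[Coord]:
    """Parse 'LINESTRING (x1 y1, x2 y2, ...)' into a list of (x, y) tuples."""
    text = wkt.strip()
    if not text.upper().startswith("LINESTRING"):
        raise ValueError("expected a LINESTRING")
    inner = text[text.index("(") + 1 : text.rindex(")")]
    coords = []
    for pair in inner.split(","):
        x, y = pair.split()[:2]  # ignore an optional Z value
        coords.append((float(x), float(y)))
    return coords


def interpolate_fraction(coords: List[Coord], fraction: float) -> Coord:
    """Return the point located `fraction` (0..1) of the way along the polyline."""
    if len(coords) < 2:
        raise ValueError("need at least two vertices")
    fraction = min(max(fraction, 0.0), 1.0)
    pairs = list(zip(coords, coords[1:]))
    lengths = [math.dist(p, q) for p, q in pairs]
    remaining = fraction * sum(lengths)
    # Walk the segments, consuming length until the target falls inside one.
    for (p, q), seg_len in zip(pairs, lengths):
        if seg_len > 0 and remaining <= seg_len:
            t = remaining / seg_len
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        remaining -= seg_len
    return coords[-1]


def build_overpass_query(lat: float, lon: float, radius_m: int = 50) -> str:
    """Overpass QL for named features within radius_m metres of (lat, lon)."""
    # Note the order: Overpass 'around' takes lat, lon, while WKT stores x=lon, y=lat.
    return (
        "[out:json][timeout:25];"
        f'nwr["name"](around:{radius_m},{lat},{lon});'
        "out center;"
    )


if __name__ == "__main__":
    road = parse_linestring("LINESTRING (-99.1340 19.4320, -99.1330 19.4326, -99.1320 19.4330)")
    lon, lat = interpolate_fraction(road, 0.5)
    print(build_overpass_query(lat, lon))
```

The query string would then be sent to an Overpass endpoint with `requests`, and the returned names compared against the POI being validated.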

Because of limited computational resources, we processed a 300-row subset of the original datasets and worked on one tile of the city at a time.

Challenges we ran into

  • Computational limits: We couldn't analyze all 20 tiles at once, so we had to subset and optimize our workflow.
  • API rate limits and timeouts: The Overpass API occasionally failed or timed out, requiring caching and retry logic.
  • Ambiguous POI names: Repeated POIs (like Oxxo) required logic to identify the closest matching location.
  • Complex geometry parsing: Interpolating coordinates along LINESTRING geometries was more complex than expected.
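
The caching, retry, and closest-match ideas above could look roughly like the sketch below. It is a hedged illustration, not our exact implementation: the function names, the in-memory dict cache, the exponential backoff schedule, and the candidate dictionaries with "lat" and "lon" keys are assumptions made for the example. The network call is passed in as `fetch` so the logic can be exercised without contacting the API.

```python
import math
import time
from typing import Any, Callable, Dict, List, Optional


def fetch_with_retry(
    fetch: Callable[[str], Any],
    key: str,
    cache: Dict[str, Any],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Return cache[key] if present; otherwise call fetch(key) with exponential backoff."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    if key in cache:
        return cache[key]
    for attempt in range(retries):
        try:
            result = fetch(key)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x, ... base_delay
        else:
            cache[key] = result
            return result


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth (radius 6,371 km)."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_candidate(lat: float, lon: float, candidates: List[dict]) -> Optional[dict]:
    """Pick the candidate nearest to (lat, lon); None if there are no candidates."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: haversine_m(lat, lon, c["lat"], c["lon"]))


if __name__ == "__main__":
    stores = [
        {"name": "Oxxo", "lat": 19.4330, "lon": -99.1330},
        {"name": "Oxxo", "lat": 19.4400, "lon": -99.1400},
    ]
    print(closest_candidate(19.4326, -99.1332, stores))
```

Keying the cache on the query string means a repeated lookup for the same location never reaches the API, which also helps stay within rate limits.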

Accomplishments that we're proud of

  • Successfully developed a modular, automated pipeline for spatial validation.
  • Validated and visualized real mismatches between POIs and road segments.
  • Built a framework that works one tile at a time, so it can scale to the full city with better infrastructure.
  • Enhanced our geospatial and data engineering skills.

What we learned

  • How to manipulate and analyze spatial data with GeoPandas and Shapely.
  • How to design Overpass API queries and handle real-world data inconsistencies.
  • How to build reproducible and interpretable logic-based validation systems.
  • The real-world importance of data quality in urban planning and logistics.

What's next for Automatically Correcting Spatial Validations

  • Scale up the system to include all 20 tiles and the complete POI database.
  • Add a visual validation interface for interactive correction and review.
  • Integrate contextual features (like road type or land use) to improve accuracy.
  • Automate classification of spatial errors for batch validation.
  • Explore deployment as a tool for city governments or open data platforms.
