Inspiration
Municipal sidewalk surveys are slow and expensive. Field crews audit 8–12 blocks per hour — a mid-sized city can take years to complete a single pass. Mapillary already has the footage. I wanted to see how much of that manual labor could be automated away with a well-engineered AI pipeline.
What It Does
SideSight ingests Mapillary street-level video for any city bounding box, runs it through TwelveLabs Pegasus 1.2 (via AWS Bedrock), and produces a confidence-scored map of sidewalk conditions — presence, width, curb ramp status, surface defects, obstructions, and hazards — cross-referenced against Overture Maps with GERS IDs attached.
On a demo run over 7.25 km² of San Francisco, it produced 252 detections (mean confidence 86.59%) including 45 actionable infrastructure issues — in under 30 minutes. A field crew would take the better part of a day to cover the same ground. That's an ~85% reduction in survey time, and the AI flags only the locations worth sending someone to.
At city scale, that could translate to millions saved annually in labor costs, plus a continuously updatable inventory instead of a survey that's stale the moment the crew leaves.
How I Built It
Four-stage pipeline:
- Ingest — Mapillary Graph API with a recursive 8×8 tile grid (handles rate limits gracefully); Overture transportation + Places data pulled from S3 via DuckDB.
- Clip — JPEGs encoded to MP4 via ffmpeg. A duplicate-frame "baseline" clip is generated alongside each real clip as the single-frame control condition.
- Describe — Clips uploaded to S3, submitted to Pegasus 1.2 on Bedrock. Prompt extracts 14 structured JSON fields (sidewalk width, curb ramp compliance, surface defects, etc.) from the streaming response.
- Analyze — Detections spatially joined to Overture segments (GERS IDs attached), classified into 14 types, then scored: F1 per type, RMSE, and a video vs. baseline temporal advantage comparison.
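To make the Ingest stage's recursive tile grid concrete, here is a minimal sketch of one plausible reading: split the bounding box into an 8×8 grid, and split any tile again if it still holds too many images for one request. The names `split_bbox`, `tile_recursively`, and `count_images`, along with the `max_per_tile` and `max_depth` defaults, are illustrative assumptions and not SideSight's actual code. `count_images` stands in for whatever Mapillary query reports how many images fall in a tile.

```python
from typing import Callable, List, Tuple

# (west, south, east, north) in decimal degrees
BBox = Tuple[float, float, float, float]


def split_bbox(bbox: BBox, n: int = 8) -> List[BBox]:
    """Split a bounding box into an n x n grid of equal tiles."""
    west, south, east, north = bbox
    dx = (east - west) / n
    dy = (north - south) / n
    return [
        (west + i * dx, south + j * dy, west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(n)
        for i in range(n)
    ]


def tile_recursively(
    bbox: BBox,
    count_images: Callable[[BBox], float],  # hypothetical stand-in for an API call
    max_per_tile: int = 2000,               # assumed per-request budget
    n: int = 8,
    max_depth: int = 3,                     # assumed guard against runaway recursion
) -> List[BBox]:
    """Return tiles small enough to query, subdividing dense tiles further."""
    if max_depth == 0 or count_images(bbox) <= max_per_tile:
        return [bbox]
    tiles: List[BBox] = []
    for tile in split_bbox(bbox, n):
        tiles.extend(
            tile_recursively(tile, count_images, max_per_tile, n, max_depth - 1)
        )
    return tiles
```

Sparse areas stay as one large tile and dense corridors get finer tiles, which keeps the number of requests close to what the data actually needs.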
The results are served through a FastAPI + Leaflet.js dashboard with filter controls, video playback, a human review workflow, and a metrics tab.
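The Clip stage described above can be sketched as two ffmpeg invocations, one for the real frame sequence and one for the duplicate-frame baseline. The flags are standard ffmpeg options, but the frame rate, the even-dimension scaling filter, and the helper names `sequence_clip_cmd` and `baseline_clip_cmd` are assumptions for illustration. Whether these exact settings match what Bedrock accepts is not something the write-up confirms.

```python
from typing import List

# Shared encoding settings (assumed): H.264, widely compatible pixel format,
# even frame dimensions, and the moov atom at the front of the file.
_ENCODE_ARGS: List[str] = [
    "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "-movflags", "+faststart",
]


def sequence_clip_cmd(frame_pattern: str, out_path: str, fps: int = 1) -> List[str]:
    """Build the ffmpeg command that turns numbered JPEGs into an MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input rate: how long each JPEG stays on screen
        "-i", frame_pattern,      # e.g. "frames/%04d.jpg"
        *_ENCODE_ARGS,
        out_path,
    ]


def baseline_clip_cmd(frame_path: str, out_path: str, duration_s: float) -> List[str]:
    """Build the ffmpeg command that repeats one JPEG for the whole clip."""
    return [
        "ffmpeg", "-y",
        "-loop", "1",             # loop the single input image
        "-i", frame_path,
        "-t", str(duration_s),    # stop after the same duration as the real clip
        *_ENCODE_ARGS,
        out_path,
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Keeping the encoding settings identical for both clips means the only difference the model sees is whether the frames change over time.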
Challenges I Ran Into
Getting Pegasus to return clean, parseable JSON every time was the hardest part. It required heavy prompt engineering with domain-specific reference scales (e.g., "a standard wheelchair is ~0.7m wide") and regex-based fallback parsing. Building valid Bedrock-compatible MP4s from sparse Mapillary JPEGs via ffmpeg also took significant tuning; early attempts were rejected by the API entirely. And with no labeled ground-truth dataset available in 24 hours, I had to design a confidence-proxy metrics framework that's honest about its limitations while still being rigorous enough to evaluate the pipeline.
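As an illustration of the regex-based fallback parsing mentioned above, here is a minimal sketch that tries progressively looser ways of extracting a JSON object from model output. The function name `parse_model_json` and the specific fallbacks (fenced block, outermost braces, trailing-comma cleanup) are assumptions about what such a parser might do, not the project's actual implementation.

```python
import json
import re
from typing import Any, Dict, Optional

_FENCE = "`" * 3  # markdown code fence, built indirectly to keep this block readable
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def parse_model_json(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object recoverable from model output, else None."""
    candidates = [text.strip()]

    # Fallback 1: the model wrapped its answer in a fenced code block.
    fenced = _FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Fallback 2: prose around the object; keep the outermost braces only.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start : end + 1])

    for candidate in candidates:
        # Try the text as-is first, then with trailing commas removed.
        for variant in (candidate, _TRAILING_COMMA_RE.sub(r"\1", candidate)):
            try:
                parsed = json.loads(variant)
            except json.JSONDecodeError:
                continue
            if isinstance(parsed, dict):
                return parsed
    return None
```

Returning `None` instead of raising lets the caller decide whether to retry the clip or route it to human review. Strict parsing is always attempted first, so well-formed responses are never altered by the cleanup regex.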
Accomplishments I'm Proud Of
- 252 detections across 7.25 km² with F1 scores from 0.80–0.93 (confidence-proxy estimates, not measured against labeled ground truth)
- Built-in video vs. single-frame baseline benchmarking — most projects skip this entirely
- GeoParquet output conforming to Overture's transportation schema, GERS IDs included, ready to feed back into the open data ecosystem
- Human review workflow built into the dashboard from day one
- 79 passing tests in a 24-hour hackathon
What I Learned
Prompt engineering for structured geospatial extraction is genuinely hard — domain knowledge has to be baked into the prompt, not assumed. I also learned that Marengo wasn't the right tool here: Mapillary clips are 10–12 seconds, and Marengo's value is on long continuous footage. Switching to Pegasus-only removed an entire indexing round-trip and improved throughput with no downside.
What's Next for SideSight
- Scale to full city corridors and transit routes
- Temporal change detection — diff the same area across months to catch new damage or completed repairs
- Direct export to city 311 APIs as pre-filled service requests
- Batch contribution of validated detections back to Overture Maps
Built With
- amazon-web-services
- aws-bedrock
- boto3
- duckdb
- fastapi
- ffmpeg
- geopandas
- geoparquet
- html/css
- httpx
- javascript
- leaflet.js-1.9.4
- mapillary-graph-api
- overture-maps-(s3/parquet)
- pandas
- plotly.js
- pyarrow
- pyproj
- pytest
- python-3.12
- shapely
- twelvelabs-pegasus-1.2
- uvicorn