Inspiration
Our work was inspired by one of the biggest challenges in emergency medicine: slow response times in EMS systems.
We were motivated by how predictive modeling and real-time optimization have transformed industries like logistics (UPS Route Optimization), ride-sharing (Uber Surge Pricing), and supply chain management (Walmart Demand Forecasting), and puzzled that emergency response systems remain so far behind the curve.
We thought, if Uber can position drivers before a crowd leaves a stadium, why are ambulances still waiting for a call to come in?
We built FirstWave to answer this question with real data.
What it does
FirstWave analyzes millions of rows of EMS incident dispatch data to predict optimal positioning zones for ambulances, helping paramedics lower response times and raising the percentage of incidents reached within the critical 8-minute target, which saves lives and improves patient outcomes.
What separates FirstWave
- Surge predictions that respond to time of day, weather, and simulations of user-submitted events.
- Interactive dashboard that lets dispatchers visualize predicted demand hotspots and adjust staging recommendations in real time.
- Counterfactual impact engine that quantifies before-and-after response times and 8-minute coverage improvements using historical incidents.
How we built it
FirstWave is split into four components, which work together to predict ambulance demand and surface actionable staging recommendations to dispatchers in real time:
Data Pipeline
- DuckDB processing of 28.7M NYC EMS incidents with a 9-layer quality filter, removing mismatched borough/zone codes, invalid response time flags, transfers, standbys, reopens, and the entire COVID-distorted 2020 year
- Produced 5.6M clean training incidents (2019, 2021, 2022) and 1.5M holdout incidents (2023)
- Weather merged from the Open-Meteo historical API on hourly timestamps
- CDC Social Vulnerability Index scores aggregated from census tract level to dispatch zone level
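The layered filtering idea can be sketched as a single pass of predicate checks. This is an illustrative Python version with hypothetical field names and a toy borough/zone map (the real pipeline applies nine checks in one DuckDB SQL pass over 28.7M rows):

```python
# Sketch of the quality filter; field names and VALID_PAIRS are assumptions,
# not the actual dataset schema.
from datetime import datetime

VALID_PAIRS = {("BROOKLYN", "B1"), ("QUEENS", "Q1")}  # hypothetical borough/zone map

def passes_quality_filter(row: dict) -> bool:
    # Borough and dispatch zone must form a known, consistent pair.
    if (row["borough"], row["zone"]) not in VALID_PAIRS:
        return False
    # Drop incidents flagged as having invalid response times.
    if not row["valid_response_time"]:
        return False
    # Exclude transfers, standbys, and reopened incidents.
    if row["incident_type"] in {"TRANSFER", "STANDBY", "REOPEN"}:
        return False
    # Exclude the COVID-distorted 2020 year entirely.
    if row["datetime"].year == 2020:
        return False
    return True

rows = [
    {"borough": "BROOKLYN", "zone": "B1", "valid_response_time": True,
     "incident_type": "INITIAL", "datetime": datetime(2019, 6, 1)},
    {"borough": "BROOKLYN", "zone": "Q1", "valid_response_time": True,
     "incident_type": "INITIAL", "datetime": datetime(2019, 6, 1)},  # mismatched pair
    {"borough": "QUEENS", "zone": "Q1", "valid_response_time": True,
     "incident_type": "INITIAL", "datetime": datetime(2020, 6, 1)},  # COVID year
]
clean = [r for r in rows if passes_quality_filter(r)]
print(len(clean))  # → 1
```

Expressing each check as an independent predicate is what made it possible to apply all nine in one pass.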
Machine Learning
- XGBoost demand forecaster trained on 20 features: cyclical time encodings (sine/cosine for hour, day-of-week, month), weather conditions, SVI scores, zone-level historical baselines, heat emergency flags, and infrastructure disruption indices
- Weighted K-Means staging optimizer placing units at centers of predicted demand, with a borough-fairness constraint guaranteeing at least one staging point per borough before allocating extras proportionally
- OSMnx road network routing computing actual shortest-path travel times across the NYC road graph for all 1,891 zone pairs
- Counterfactual engine replaying every historical high-acuity incident with weather-adjusted travel times from staged positions
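The borough-fairness constraint is the distinctive part of the staging optimizer, and its allocation step can be sketched on its own: every borough gets one unit before extras are split by demand share. A minimal Python version, with illustrative demand numbers (the real optimizer then runs weighted K-Means within each borough's allocation):

```python
# Fairness-floor allocation sketch: one staging point per borough guaranteed,
# extras distributed by predicted demand via the largest-remainder method.

def allocate_units(demand: dict, total_units: int) -> dict:
    boroughs = list(demand)
    assert total_units >= len(boroughs), "need at least one unit per borough"
    # Step 1: fairness floor of one unit per borough.
    alloc = {b: 1 for b in boroughs}
    extras = total_units - len(boroughs)
    # Step 2: distribute extras proportionally to each borough's demand share.
    total_demand = sum(demand.values())
    shares = {b: extras * demand[b] / total_demand for b in boroughs}
    for b in boroughs:
        alloc[b] += int(shares[b])
    # Step 3: hand out any remaining units by largest fractional remainder.
    leftover = total_units - sum(alloc.values())
    by_remainder = sorted(boroughs, key=lambda b: shares[b] - int(shares[b]),
                          reverse=True)
    for b in by_remainder[:leftover]:
        alloc[b] += 1
    return alloc

demand = {"Manhattan": 40, "Brooklyn": 30, "Queens": 20, "Bronx": 8,
          "Staten Island": 2}
alloc = allocate_units(demand, 12)
print(alloc)  # Staten Island keeps a unit despite only 2% of demand
```

A purely proportional split would give Staten Island zero units here; the floor in step 1 is what guarantees coverage.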
Backend
- FastAPI serving 7 endpoints with sub-400ms response budgets; demand model and staging optimizer run inference in under 300ms
- Artifact-based architecture loading the XGBoost model, drive-time matrix, zone statistics, and precomputed counterfactuals from Parquet and pickle files at startup
- PostGIS storing NYC dispatch zone boundary geometries for spatial queries
- AI dispatcher endpoint forwarding context-enriched prompts to GPT-4o-mini and parsing structured control changes from the response
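The artifact-based pattern boils down to loading heavy files once at startup and sharing them across requests. A minimal sketch with stand-in objects (the real service reads the XGBoost model, drive-time matrix, and precomputed counterfactuals from Parquet and pickle files):

```python
# Load-once artifact registry sketch; the payloads here are placeholders,
# not the actual FirstWave artifacts.
from functools import lru_cache

@lru_cache(maxsize=1)
def load_artifacts() -> dict:
    # In the real backend these would be Parquet/pickle reads at startup,
    # e.g. the XGBoost booster and the 1,891-pair drive-time matrix.
    return {
        "demand_model": object(),       # stand-in for the XGBoost model
        "drive_time_matrix": object(),  # stand-in for the precomputed matrix
        "zone_stats": object(),
    }

# Every request handler sees the same loaded objects; nothing is re-read
# from disk, which is what keeps inference under the response budget.
a, b = load_artifacts(), load_artifacts()
print(a is b)  # → True
```

Keeping all inference inputs in memory is what makes sub-400ms endpoint budgets feasible.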
Dashboard
- React 19 with Mapbox GL JS rendering a dark-themed interactive map across 31 dispatch zones; TanStack React Query managing all data fetching with 300ms debounced parameter changes
- Choropleth layer coloring zones by predicted demand intensity; staging pins with 3,500m coverage circles updating on every query
- Equity overlay rendering ZIP-level SVI scores in a purple gradient, with impact panel breaking down response time savings by SVI quartile
- AI panel supporting auto-briefing on every forecast change and interactive chat with undo for map control modifications
- Watch the Wave animating the hour slider at 1.5-second intervals across a full 24-hour demand cycle
Datasets Used
NYC EMS Incident Dispatch Data: https://data.cityofnewyork.us/Public-Safety/EMS-Incident-Dispatch-Data/76xm-jjuj
Open-Meteo Historical Weather API: https://open-meteo.com/en/docs/historical-weather-api
CDC Social Vulnerability Index: https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html
NYC Permitted Event Information Historical: https://data.cityofnewyork.us/City-Government/NYC-Permitted-Event-Information-Historical/bkfu-528j
NYC ZIP Code Boundaries: https://data.beta.nyc/dataset/nyc-zip-code-tabulation-areas
OSMnx NYC Road Network: https://osmnx.readthedocs.io
Challenges we ran into
Data Quality at Scale
- Filtering 28.7M rows down to a reliable training set required identifying and removing mismatched borough-zone codes, invalid response time flags, transfers, standbys, and reopened incidents: nine separate quality checks applied in a single DuckDB pass
- 2020 had to be excluded entirely after live queries confirmed COVID lockdowns caused a ~100,000 incident drop that would have taught the model a false seasonal pattern
The Zone Baseline Feature
- Early model runs with RMSE above 6 traced back to a failed merge on zone_baseline_avg, the single most important predictor, carrying 47% of feature importance. Without it, predictions were nearly random.
- Debugging the join keys across aggregated and raw incident tables was the most consequential hour of the hackathon.
Modeling Demand at Zone Granularity
- Cyclical time features (sin/cos encoding of hour, day of week, and month) were critical to prevent the model from treating 11PM and 1AM as far apart
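The 11PM/1AM problem is easy to demonstrate: on a raw hour axis the two are 22 units apart, but mapped onto the unit circle they are neighbors. A minimal sketch of the encoding:

```python
# Sine/cosine encoding places hours on the unit circle, so midnight-adjacent
# hours end up geometrically close.
import math

def encode_hour(hour: int) -> tuple:
    angle = 2 * math.pi * hour / 24
    return (math.sin(angle), math.cos(angle))

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

late, early, noon = encode_hour(23), encode_hour(1), encode_hour(12)
# 11PM and 1AM are close; 11PM and noon are far, matching intuition.
print(dist(late, early) < dist(late, noon))  # → True
```

The same construction applies to day-of-week (period 7) and month (period 12), which is how the model's cyclical features were built.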
Drive-Time Computation
- Computing a full zone-to-zone drive-time matrix across NYC's road network via OSMnx required downloading ~400MB of road graph data and running single-source Dijkstra from every origin.
- Databricks connection failures mid-hackathon forced a full pivot to a local DuckDB + pandas pipeline under time pressure, rewriting all 8 scripts without changing a single artifact schema or API contract
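The matrix computation itself is conceptually simple: one single-source Dijkstra run per origin. A toy sketch over a hand-made three-zone graph (the real build runs this over the ~400MB OSMnx NYC road graph for all 1,891 zone pairs; weights here are illustrative minutes):

```python
# Single-source Dijkstra over a toy zone graph; the full drive-time matrix
# is just one run per origin zone.
import heapq

graph = {  # adjacency list: zone -> [(neighbor, travel_minutes), ...]
    "A": [("B", 4), ("C", 10)],
    "B": [("A", 4), ("C", 3)],
    "C": [("A", 10), ("B", 3)],
}

def dijkstra(source: str) -> dict:
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Full zone-to-zone matrix: one single-source run per origin.
matrix = {origin: dijkstra(origin) for origin in graph}
print(matrix["A"]["C"])  # → 7.0 (A -> B -> C beats the direct 10-minute edge)
```

Precomputing this matrix once is what let the backend serve routing-aware staging recommendations without touching the road graph at request time.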
Accomplishments that we're proud of
Within 36 hours, our team built and delivered a production-ready EMS staging platform that includes:
- 86% 8-minute coverage, up from 64.7% baseline. That's approximately 340,000 additional incidents per year arriving within the clinical survival window.
- 3 minutes 19 seconds saved: median response time reduction across all zones, weighted by predicted demand. In cardiac arrest terms, a meaningful shift in survival probability.
- Equity by design. The staging algorithm guarantees every borough gets coverage before allocating extras by demand. The most vulnerable communities (SVI Q4) save the most: 299 seconds per incident.
- A working AI Dispatcher. Natural language scenario exploration that actually modifies the dashboard → dispatchers ask questions in plain English and the map responds.
- Full end-to-end in 36 hours: 28.7M raw rows through 8 pipeline scripts, 20-feature model training, 7 API endpoints, and a full interactive dashboard with AI chat, equity overlay, animated playback, and live staging recommendations.
What we learned
Through FirstWave, we gained hands-on experience building an end-to-end predictive allocation system, from large-scale data preprocessing to gradient-boosted demand forecasting and real arrival-time computation. We had to translate model predictions into concrete ambulance staging decisions, a step with no single right answer. This process taught us to balance statistical performance, infrastructure constraints, and equitable service access for higher-vulnerability communities. Rigorous counterfactual evaluation ensured improvements were clinically meaningful, not just statistically significant.
What's next for FirstWave
Our next major milestone would be integrating FirstWave directly into live EMS dispatch systems. This means moving from simulation-based recommendations to real-time operational deployment, where ambulance staging decisions update dynamically as new 911 calls, traffic conditions, and weather events occur.
From there, we want to optimize the full response cycle rather than just arrival time. Right now the model treats each incident independently, but in practice a unit that responds to a call becomes unavailable, hospital turnaround takes 20 to 40 minutes, and crew shift changes affect coverage. Incorporating those constraints, along with congestion-aware routing, would make staging recommendations grounded in actual unit availability rather than theoretical demand alone.
Two longer-term directions we are actively thinking about: first, a reinforcement learning agent that continuously re-positions idle units as demand shifts throughout a shift, learning from how the city actually behaves rather than relying solely on historical patterns. Second, multi-city expansion. The pipeline is already parameterized by zone geometry, centroids, and SVI data, so onboarding a new city means swapping a configuration layer rather than rebuilding the system. Any municipality with open EMS dispatch data is a potential deployment target.
The thread connecting all of it is the same goal as the original build: a system that reduces structural response time inequities and gets resources to the communities that need them most, faster.
Built With
- duckdb
- fastapi
- javascript
- k-means
- mapbox
- osmnx
- python
- react
- scikit-learn
- tailwind
- vite
- xgboost