PulseMap Agent — Community & Hazard Intelligence Map
A live, community-driven map that blends official hazard feeds with user reports and fast semantic search.
Inspiration
During storms and local incidents, information is scattered: government feeds, social posts, and what neighbors see on the ground. I wanted a single, trustworthy map where people can:
- Report what they see (flooded underpass, fire smoke, downed lines).
- Discover what’s nearby (official alerts + community context).
- Search similar past cases to make better decisions, fast.
This hackathon pushed me to make the map not just “visual,” but searchable and intelligent using TiDB Serverless for vector/semantic search.
What I built
PulseMap is a full‑stack web map that:
- Shows official hazards (NWS alerts, EONET events, FIRMS fires) and community reports.
- Lets anyone drop a report with location and notes.
- Implements two hackathon features with TiDB:
  1) Ingest & Index Data: store reports + embeddings in TiDB Serverless with an HNSW vector index.
  2) Search Your Data: query with semantic vector search (KNN by cosine distance) to retrieve similar reports.
- Uses a runtime config endpoint to load the Google Maps API key and Map ID, so they stay out of the build and can be rotated without a rebuild.
- Persists data correctly on Hugging Face Spaces using the /data mount (avoiding the read-only /app directory).
If you’re new to maps: a marker is a point at a latitude/longitude, a layer is a group of visual features, and a basemap is the underlying road/satellite tiles. We render markers for hazards/reports and add overlays for special data.
How it works (simple flow)
- User reports an incident on the map (lat/lon + text + optional props).
- Backend (FastAPI) creates a record and generates an embedding for the text (OpenAI text-embedding-3-small).
- The report + embedding are inserted into TiDB (VECTOR(1536) column) and indexed with HNSW (cosine).
- When someone searches (“wildfire smoke near school”), the backend:
  - Embeds the query,
  - Runs a KNN vector search in TiDB (cosine distance),
  - Returns the most similar reports as GeoJSON features.
- The frontend (React + @vis.gl/react-google-maps) renders the results as markers and lists a short summary.
Vector search, in plain words: we turn text into a numeric vector (embedding). Similar meanings land near each other in vector space. KNN finds the closest points (reports) to your query’s vector.
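To make “closest” concrete, here is a minimal brute-force sketch of cosine distance (1 minus cosine similarity) and exact KNN in plain Python. It is only an illustration: TiDB’s HNSW index finds approximate neighbors without scoring every row, and the toy 3-d vectors below stand in for real 1536-d embeddings.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(query: list[float], reports: dict[int, list[float]], k: int) -> list[tuple[int, float]]:
    """Exact (brute-force) KNN: score every report, keep the k smallest distances."""
    scored = [(rid, cosine_distance(query, vec)) for rid, vec in reports.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy example: reports 1 and 3 point roughly the same way as the query.
reports = {1: [0.9, 0.1, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.8, 0.3, 0.1]}
top = knn([1.0, 0.2, 0.0], reports, k=2)
```

Here `top` holds reports 1 and 3 with small distances; report 2, which “means” something different, is left out.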
Tech stack (just what matters)
- Frontend: React + TypeScript, @vis.gl/react-google-maps, Vite
- Backend: FastAPI, LangGraph (optional summarizer node)
- DB & Search: TiDB Serverless (MySQL‑compatible) with VECTOR(1536) + HNSW index
- Embeddings: OpenAI text-embedding-3-small (1536‑d)
- Geo tooling: GeoJSON; optional GeoPandas (with pyogrio engine) for preprocessing
- Infra: Hugging Face Spaces (/data mount for persistence), runtime config endpoint for Maps API keys
- Assets: Large shapefiles are tracked with Git LFS (no >10MB blobs in git history)
Key implementation details
1) TiDB: Ingest & Index (feature #1)
Table (simplified):
```sql
CREATE TABLE IF NOT EXISTS reports (
  id BIGINT PRIMARY KEY,
  lat DOUBLE,
  lon DOUBLE,
  text TEXT,
  props JSON,
  created_at TIMESTAMP NULL,
  embedding VECTOR(1536)
);

CREATE VECTOR INDEX IF NOT EXISTS idx_reports_embed
  ON reports ((VEC_COSINE_DISTANCE(embedding))) USING HNSW;
```

Ingestion path (on report add):
- Compute the embedding for the report text.
- Insert the row with embedding as a JSON‑like vector literal: "[0.12,-0.34,...]".
- TiDB’s HNSW index accelerates KNN by cosine distance.
Seeder script migrates old SQLite reports → TiDB (idempotent, batch embeddings).
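A minimal sketch of the insert step, assuming a DB-API driver such as PyMySQL (which uses %s placeholders) and the reports table above. The helper names vector_literal and insert_params are hypothetical; the dimension check is one way to catch a mismatch between the model output and the VECTOR(1536) column before TiDB rejects the row.

```python
import json
import math
from typing import Any

EMBED_DIM = 1536  # must match the VECTOR(1536) column

INSERT_SQL = (
    "INSERT INTO reports (id, lat, lon, text, props, created_at, embedding) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def vector_literal(embedding: list[float], dim: int = EMBED_DIM) -> str:
    """Serialize an embedding as the '[x,y,...]' text form used for VECTOR values."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    values = [float(x) for x in embedding]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinity")
    # Compact separators give "[0.12,-0.34,...]" with no spaces
    return json.dumps(values, separators=(",", ":"))


def insert_params(report: dict[str, Any], embedding: list[float], dim: int = EMBED_DIM) -> tuple:
    """Build the parameter tuple for INSERT_SQL, in column order."""
    return (
        report["id"],
        report["lat"],
        report["lon"],
        report["text"],
        json.dumps(report.get("props") or {}),  # JSON column receives a JSON string
        report.get("created_at"),
        vector_literal(embedding, dim),
    )
```

With a live connection, the write would be `cursor.execute(INSERT_SQL, insert_params(report, embedding))` followed by `conn.commit()`.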
2) TiDB: Search Your Data (feature #2)
- Endpoint: GET /search?q=...&k=10
- Backend embeds the query and runs:
```sql
SELECT id, lat, lon, text, props, created_at,
       VEC_COSINE_DISTANCE(embedding, :qvec) AS dist
FROM reports
ORDER BY VEC_COSINE_DISTANCE(embedding, :qvec)
LIMIT :k;
```
- Results are returned as GeoJSON features for the map.
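A sketch of the last step, turning result rows into a GeoJSON FeatureCollection. It assumes rows arrive as dicts (for example via PyMySQL’s DictCursor) with the columns selected above; the property names, such as distance, are illustrative and not necessarily what PulseMap returns.

```python
import json
from typing import Any, Iterable


def rows_to_feature_collection(rows: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Convert search result rows into a GeoJSON FeatureCollection of Points."""
    features = []
    for row in rows:
        props = row.get("props") or {}
        if isinstance(props, str):  # JSON columns may come back as text
            props = json.loads(props)
        created = row.get("created_at")
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude], the reverse of "lat/lon"
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {
                **props,
                "id": row["id"],
                "text": row["text"],
                "created_at": created.isoformat() if hasattr(created, "isoformat") else created,
                "distance": row["dist"],
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

The coordinate order is the easy thing to get wrong: a marker built from [lat, lon] lands in the wrong place on the map.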
3) Runtime Google Maps config
- Backend exposes:
```py
from fastapi import APIRouter

router = APIRouter()

# `settings` is the app's Settings instance (defined elsewhere in the backend).
@router.get("/config/runtime")
def runtime_config():
    return {
        "VITE_GOOGLE_MAPS_API_KEY": settings.google_maps_api_key,
        "VITE_GOOGLE_MAPS_MAP_ID": settings.google_maps_map_id,
    }
```
- Frontend fetches this once and feeds it to <APIProvider>; this keeps keys out of the bundle and supports rotation without rebuilds.
- Env aliases allow both VITE_GOOGLE_MAPS_MAP_ID and the older VITE_GOOGLE_MAPS_MAP_IDY names, so my local, dev, and Space environments stay in sync.
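One way the alias fallback could work, sketched under the assumption that the backend reads plain environment variables. first_env is a hypothetical helper, and the precedence (the newer name wins, blank values fall through) is a guess rather than the project’s confirmed behavior.

```python
import os
from typing import Mapping, Optional

# Hypothetical alias list: current name first, then the older spelling.
MAP_ID_ALIASES = ("VITE_GOOGLE_MAPS_MAP_ID", "VITE_GOOGLE_MAPS_MAP_IDY")


def first_env(names: tuple[str, ...], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty value among the given env var names, else None."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

A settings loader could then call `first_env(MAP_ID_ALIASES)` once at startup, so whichever name a given environment uses ends up in the same field.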
4) HF Spaces data persistence
- /app is read-only; /data is writable and persisted.
- My Settings picks the first writable dir from: DATA_DIR env → /data → <repo>/data → temp fallback.
- On startup, it tests a write, then creates uploads/ and the DB paths. This eliminated the earlier PermissionError: '/app/data' crash.
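A minimal sketch of that chooser, following the candidate order listed above. The function names and the .write_test probe filename are hypothetical; PermissionError is a subclass of OSError, so a read-only mount simply falls through to the next candidate.

```python
import tempfile
from pathlib import Path
from typing import Optional


def is_writable(path: Path) -> bool:
    """Create the dir if needed, then prove it is writable with a tiny probe file."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        probe = path / ".write_test"
        probe.write_text("ok")
        probe.unlink()
        return True
    except OSError:
        return False


def choose_data_dir(repo_root: Path, env_value: Optional[str] = None) -> Path:
    """Return the first writable candidate: DATA_DIR, /data, <repo>/data, then a temp dir."""
    candidates = []
    if env_value:
        candidates.append(Path(env_value))
    candidates += [Path("/data"), repo_root / "data"]
    for candidate in candidates:
        if is_writable(candidate):
            return candidate
    # Last resort: a fresh temp dir (writable, but not persisted across restarts)
    return Path(tempfile.mkdtemp(prefix="pulsemap-"))
```

At startup this might be called as `choose_data_dir(repo_root, os.environ.get("DATA_DIR"))`, after which the uploads/ and DB paths are created under the returned directory.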
5) Large geospatial files via Git LFS
- Shapefile sidecars (.shp/.dbf/.shx/.prj/.cpg) are tracked in Git LFS and migrated from history.
- This bypasses the Space’s >10MB pre‑receive hook and keeps the repo lean.
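Before migrating, it helps to know which files actually trip the limit. This is a small sketch with hypothetical helper names: it lists oversized files and prints the .gitattributes patterns that `git lfs track "*.ext"` adds. The history rewrite itself is still done with `git lfs migrate import --include=...`. Whether the Space counts 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

LIMIT_BYTES = 10 * 1024 * 1024  # assumed reading of the ">10MB" hook threshold
SIDECARS = (".shp", ".dbf", ".shx", ".prj", ".cpg")


def oversized_files(root: Path, limit: int = LIMIT_BYTES) -> list[Path]:
    """List files under root larger than limit, skipping anything inside .git."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and p.stat().st_size > limit
    )


def lfs_attribute_lines(extensions: tuple[str, ...] = SIDECARS) -> list[str]:
    """The .gitattributes lines `git lfs track "*.ext"` writes for each extension."""
    return [f"*{ext} filter=lfs diff=lfs merge=lfs -text" for ext in extensions]
```

Tracking every sidecar extension, not just the files that happen to be large, keeps a shapefile’s parts together and matches the fix described under Challenges.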
Challenges & how I solved them
- HF push rejected (large files): The pre‑receive hook blocks binaries >10MB. I fixed it by migrating all shapefile sidecars to Git LFS (not just .shp), then force‑pushing the rewritten history.
- Read‑only filesystem: The app tried to create /app/data. I implemented a writable dir chooser that prefers /data and verifies write access with a tiny test file.
- Map keys leak risk: Instead of bundling keys at build time, I serve them via a runtime endpoint. That kept them out of the compiled bundle, made rotation easier, and removed rebuild friction; the browser still receives the key at runtime, as with any Maps JavaScript key.
- Embedding/Index setup: Made sure the TiDB VECTOR(1536) column matched the model’s 1536‑d output, and built an HNSW index with cosine distance to keep queries fast.
- SQLite → TiDB migration: Wrote a seed script with batching, retries, and ON DUPLICATE KEY UPDATE so it’s safe to re‑run.
- Geo deps on Spaces: Pinned geopandas + pyogrio + shapely + pyproj so wheels install cleanly without GDAL compile pain; default to engine="pyogrio".
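The re-runnable seeder could look roughly like this. It is a sketch, not the actual backend.scripts.seed_tidb_from_sqlite: it assumes the reports table above, PyMySQL-style %s placeholders, and caller-supplied embed_batch and write_batch callables (hypothetical names), e.g. `write_batch = lambda p: cursor.executemany(UPSERT_SQL, p)`. Because every write is an upsert keyed on id, a second run overwrites rows instead of failing on duplicates.

```python
import json
import time
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

# Upsert keyed on the primary key, so re-running the seeder is idempotent
UPSERT_SQL = (
    "INSERT INTO reports (id, lat, lon, text, props, created_at, embedding) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE lat = VALUES(lat), lon = VALUES(lon), text = VALUES(text), "
    "props = VALUES(props), created_at = VALUES(created_at), embedding = VALUES(embedding)"
)


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


def seed(
    rows: Iterable[dict[str, Any]],
    embed_batch: Callable[[list[str]], list[list[float]]],
    write_batch: Callable[[list[tuple]], Any],
    batch_size: int = 64,
) -> int:
    """Embed and upsert rows batch by batch; returns the number of rows written."""
    written = 0
    for batch in batched(rows, batch_size):
        texts = [r["text"] for r in batch]
        vectors = with_retries(lambda: embed_batch(texts))  # one embedding call per batch
        params = [
            (r["id"], r["lat"], r["lon"], r["text"], json.dumps(r.get("props") or {}),
             r.get("created_at"), json.dumps(vec, separators=(",", ":")))
            for r, vec in zip(batch, vectors)
        ]
        with_retries(lambda: write_batch(params))
        written += len(params)
    return written
```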
What I learned
- TiDB Serverless makes vector search feel “SQL‑native.” Once the VECTOR column and HNSW index are in place, KNN is just an ORDER BY on cosine distance plus a LIMIT.
- Runtime config is a clean pattern for map keys and other front‑end toggles: no rebuild required.
- Storage rules matter on PaaS: choosing the right writable mount (/data) avoids fragile hacks.
- Versioning geo assets with LFS prevents painful pushes and keeps deploys stable.
- How to explain vector search to non‑ML users: “search by meaning,” not exact words.
Next steps
- Hybrid search: blend full‑text (BM25) + vector KNN for even better relevance.
- Explainability: highlight matched phrases + show distance/confidence badges.
- Reverse geocoding: attach street/POI names to reports via the Maps Geocoding API.
- Verification workflows: thumbs‑up/down, escalation, and authority verification.
- Notifications: Slack/Email/WebPush for nearby critical alerts.
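For the hybrid search item, one common way to blend a full-text ranking with a vector ranking is reciprocal rank fusion (RRF). This is a swapped-in technique, not something PulseMap implements yet; k = 60 is the conventional default, and the inputs are assumed to be lists of report ids, best first.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_ids = [3, 1, 2]    # hypothetical full-text results
vector_ids = [1, 4, 3]  # hypothetical KNN results
merged = reciprocal_rank_fusion([bm25_ids, vector_ids])
```

RRF only uses ranks, so it sidesteps the problem that BM25 scores and cosine distances live on different scales.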
Minimal setup notes (for judges)
- Secrets: TIDB_URL, OPENAI_API_KEY, VITE_GOOGLE_MAPS_API_KEY, VITE_GOOGLE_MAPS_MAP_ID.
- Endpoints: /add_report, /search, /config/runtime.
- Run: Hugging Face Space (FastAPI backend + React frontend) with data persisted under /data.
- Seeder: python -m backend.scripts.seed_tidb_from_sqlite (idempotent).
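When reproducing the Space, a quick pre-flight check can catch a missing secret before the app starts. This is a hypothetical helper, not part of the repo; it only checks that the four variables listed above are non-empty, not that their values are valid.

```python
import os
from typing import Mapping

REQUIRED_SECRETS = (
    "TIDB_URL",
    "OPENAI_API_KEY",
    "VITE_GOOGLE_MAPS_API_KEY",
    "VITE_GOOGLE_MAPS_MAP_ID",
)


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required secret names that are unset or blank, in listed order."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` at startup and failing fast on a non-empty result turns a confusing runtime error into a clear message.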
Why this matters
When the weather turns or emergencies hit, minutes matter. PulseMap merges official signal with human eyes on the ground, and makes it searchable by meaning—so communities can discover, verify, and act faster.
Built With
- fastapi
- geopandas
- git
- google-maps
- langchain
- langgraph
- newsapi
- openai
- pymysql
- pyproj
- python
- react
- shapely
- sql
- tidb
- typescript
- vite
- weatherapi