INSPIRATION

The day no one was warned

On January 29, 2025, the sacred confluence at Prayagraj was bathed in predawn darkness when a stampede erupted at the Maha Kumbh Mela. Over 30 pilgrims were killed — crushed in a crowd of millions who had come seeking spiritual renewal. Survivors described the same thing: no warning. No rerouting. No signal that the crowd behind them had become a wall of human pressure with nowhere to go.

The AI surveillance cameras saw it. The 2,300 CCTV feeds tracked it. The density data existed somewhere on a server. But it never reached the pilgrim 200 metres away deciding whether to walk toward the ghat or turn back. It never reached the volunteer 500 metres back who could have slowed the flow. The intelligence existed — but the loop was broken.

We read the Kumbhathon Innovation Foundation's field report on Prayagraj 2025. Forty-seven pages of what went wrong, what almost went wrong, and what will go wrong again unless something changes. One sentence stopped us cold:

"In the absence of signage and sufficient navigation tools, the police personnel deployed became the sole reliable point of contact to guide our way through the expansive mela shetra."

A police officer. The human equivalent of a 404 page. That was the state of the art.

Nashik 2027 is not Prayagraj 2025 — it is harder

The next Kumbh is at Nashik. June through September 2027. During the monsoon. At two cities 30 km apart. And at the centre of it all: the Kushavart Kund in Trimbakeshwar, a sacred pond 75 feet by 75 feet (roughly the area of two tennis courts) that 7.5 million pilgrims are expected to reach on a single Amrit Snan day.

We did the math. At safe crowd density, Kushavart Kund holds 1,900 persons. Seven and a half million people will try to get there on that one day. The approach lanes, the temple precinct, the mountain road from Nashik: all of it becomes a pressure cooker with no safety valve unless someone builds one.
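
As a rough check on these figures, the short calculation below converts the 75 ft side to metres and derives the density implied by the 1,900-person cap. It assumes the whole 75 × 75 ft footprint counts as usable space, which is a simplification made here and not something the field report states.

```python
FEET_TO_METRES = 0.3048          # exact conversion factor
KUND_SIDE_FT = 75                # from the text
SAFE_CAPACITY = 1_900            # persons, from the text
EXPECTED_PILGRIMS = 7_500_000    # single Amrit Snan day, from the text

side_m = KUND_SIDE_FT * FEET_TO_METRES
area_m2 = side_m ** 2
# Assumption: the full footprint is usable standing/bathing space.
implied_density = SAFE_CAPACITY / area_m2
# How many times the kund would have to fill and empty in one day.
turnovers_needed = EXPECTED_PILGRIMS / SAFE_CAPACITY

print(f"area: {area_m2:.1f} m^2")
print(f"implied density at the cap: {implied_density:.2f} people per m^2")
print(f"full turnovers needed in one day: {turnovers_needed:.0f}")
```

Under that assumption the kund is about 523 square metres, the cap works out to roughly 3.6 people per square metre, and the kund would need to turn over nearly 4,000 times in a day.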

That someone is us.


WHAT IT DOES

KumbhSafe is a five-layer intelligence platform: Sense → Understand → Decide → Act → Command.

Sense

IoT crowd pressure sensors and CCTV cameras feed into two AWS Kinesis Data Streams (kumbhsafe-iot-stream and kumbhsafe-cctv-stream). AWS Rekognition processes camera frames every 30 seconds, computing crowd density per zone in people per square metre. Godavari river water level gauges and IMD weather feeds enter the same pipeline.

Understand

Six AI agents built with the Strands Agent Framework and deployed on AWS Bedrock AgentCore analyse the incoming data:

  • CrowdSentinel — monitors all 12 zones every 30 seconds. Enforces the Kushavart hard cap: if the kund crosses 1,900 persons, it triggers an entry hold with no human approval required.
  • FloodWatch — polls the Godavari level every 5 minutes during monsoon months. If the river rises more than 0.5m per hour during active bathing, ghat closure alerts go out immediately.
  • RouteOracle — when any zone hits RED or BLACK density, it calculates safe alternative paths and pushes rerouting SMS to every pilgrim registered in the danger radius — in their own language (Marathi, Hindi, Gujarati, Tamil, Kannada, or English).
  • MedEvac — handles SOS triggers. Identifies the nearest available ambulance, dispatches it, and creates a green corridor request to the Police API — all within 8 seconds.
  • LostConnect — multilingual voice bot that creates lost-person cases and uses Rekognition facial recognition to match them against registered pilgrims.
  • CommandBridge — the orchestrating agent. Receives inputs from all specialists via an invoke_child_agent tool, coordinates cross-city responses, and manages the 30 km Nashik-Trimbakeshwar corridor during simultaneous peak events.

Decide

The agents are not passive dashboards. They write directly to DynamoDB, create alerts, activate zone holds, dispatch vehicles, and send pilgrim SMS. Safety-critical rules are additionally enforced at the stream processor level, independent of AI availability, so the Kushavart cap and NDRF notification can never be accidentally disabled by a misconfigured agent.

Act

Outbound actions flow through five SNS topics: critical-alerts (ICCC + Police), pilgrim-sms (bulk Pinpoint dispatch in 6 languages), ndrf-webhook (signed external API to National Disaster Response Force), medical-alerts (Medical Staff + ICCC), and admin-notifications (Super Admin watchdog). All agent failures during active hours (0600–2200 IST) trigger CloudWatch alarms within 90 seconds.

Command

The ICCC Command Dashboard — built with Vercel v0 using the Cloudscape Design System — gives every authorized operator a live view: zone density heatmap, alert feed with SOP checklists, agent status board, ghat conditions, and a dual-city split view. AppSync WebSocket subscriptions push zone updates to every connected dashboard in real time, without polling.


HOW WE BUILT IT

The database architecture decision

The hardest call we made was splitting the data layer into two purpose-built stores.

AWS DynamoDB handles everything real-time and operational. Zone density snapshots arrive at 200,000 writes per hour during peak days — one record per zone per 30 seconds, plus a LATEST item that always holds the current state. DynamoDB's on-demand capacity means we never provision for peak and then watch credits burn on quiet days. Eight tables, each with GSIs tuned for our specific query patterns: list all RED zones in Nashik, find a pilgrim by phone number, get the 20 most recent critical alerts. DynamoDB Streams are the nervous system — every zone status change triggers downstream actions automatically.

AWS Aurora DSQL handles everything relational and configuration-driven. Platform config, zone thresholds (each of the 12 zones has custom density limits — Kushavart's thresholds are significantly lower than the platform defaults), user management, audit logs, notification templates in 6 languages, SOP step sequences, API keys. We chose Aurora DSQL specifically for its active-active multi-region setup across ap-south-1 (Mumbai) and ap-southeast-1 (Singapore). During Amrit Snan day, if Mumbai has any issue, Singapore takes over with zero data loss. That zero-RPO guarantee matters when the alternative is ICCC operators losing their config during the most dangerous hours of the event.
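
The per-zone threshold override can be pictured as a small lookup that layers zone-specific limits over platform defaults. The sketch below is illustrative only: the threshold numbers, the zone identifiers, and the NORMAL level name are hypothetical, and in the platform these values would come from the zone config in Aurora, not from constants.

```python
# Hypothetical values in people per square metre; the real limits live in zone config.
PLATFORM_DEFAULTS = {"RED": 4.0, "BLACK": 5.0}
ZONE_OVERRIDES = {
    "kushavart-kund": {"RED": 3.0, "BLACK": 3.6},  # hypothetical, lower than the defaults
}


def classify(zone_id: str, density: float) -> str:
    """Classify a density reading using zone overrides layered on platform defaults."""
    thresholds = {**PLATFORM_DEFAULTS, **ZONE_OVERRIDES.get(zone_id, {})}
    if density >= thresholds["BLACK"]:
        return "BLACK"
    if density >= thresholds["RED"]:
        return "RED"
    return "NORMAL"  # hypothetical name for anything below RED


print(classify("kushavart-kund", 3.2))  # stricter zone-specific limits apply
print(classify("ramkund", 3.2))         # falls back to platform defaults
```

The same reading is classified differently depending on the zone, which is the point of keeping thresholds as configuration.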

The frontend: v0 and Cloudscape

We built the entire ICCC dashboard using Vercel v0 with a detailed prompt that referenced the Cloudscape Design System throughout. Cloudscape is AWS's own operational UI framework — it's built for exactly this kind of command-centre dashboard with <Table> multi-select + bulk actions, <LineChart> for live density trends, <Flashbar> for streaming critical alerts, and <AppLayout> for the operational shell. The v0 prompt specified every screen: zone heatmap, alert manager with SOP checklist, agent monitor with raw AgentCore config view, and the pilgrim services tab with lost-and-found case management.

The result deploys to Vercel's edge network, giving sub-100ms load times for operators in both Nashik and Trimbakeshwar on mobile networks.

The agent architecture: Strands on Bedrock AgentCore

Each Strands agent is a Python class with a curated tool set. All agents share a common library, shared/dynamo_tools.py, which provides read_zone_density, write_alert, hold_zone_entry, and send_pilgrim_sms. The tools have complete docstrings because those docstrings become the tool descriptions the model sees during inference. Tool description quality directly affects agent decision quality.

CommandBridge has one tool no other agent has: invoke_child_agent. This prevents specialist agents from accidentally cross-invoking each other and creates a clear orchestration boundary. Every agent runs at temperature=0.1 — this is safety infrastructure, not creative writing. We want deterministic, cautious decisions.
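
The orchestration boundary can be stated as a tool-allocation rule. The sketch below is plain Python, not Strands code, and it assumes for simplicity that every specialist receives all four shared tools; the actual curated sets per agent are not described here.

```python
SHARED_TOOLS = ("read_zone_density", "write_alert", "hold_zone_entry", "send_pilgrim_sms")
SPECIALISTS = ("CrowdSentinel", "FloodWatch", "RouteOracle", "MedEvac", "LostConnect")
ORCHESTRATOR = "CommandBridge"


def tools_for(agent_name: str) -> tuple:
    """Return the tool names an agent may call; only the orchestrator can invoke children."""
    if agent_name == ORCHESTRATOR:
        return SHARED_TOOLS + ("invoke_child_agent",)
    if agent_name in SPECIALISTS:
        return SHARED_TOOLS  # assumption: every specialist gets the full shared set
    raise ValueError(f"unknown agent: {agent_name}")


for name in SPECIALISTS + (ORCHESTRATOR,):
    print(name, "invoke_child_agent" in tools_for(name))
```

Because the specialists never hold invoke_child_agent, a specialist cannot start another agent no matter what its prompt says.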

The agents are deployed as managed runtimes on Bedrock AgentCore, invoked by EventBridge schedules (CrowdSentinel every 30 seconds, FloodWatch every 5 minutes during monsoon months) or by zone status change events. Hard monsoon rules are enforced in the CDK EventBridge stack — the FloodWatch interval is 5 minutes regardless of the agent_config table in Aurora.

Multi-user platform: 7 roles, super admin control

The platform supports seven user roles from SUPER_ADMIN down to VIEWER, each with a precise permission matrix covering 22 distinct permissions across zones, alerts, pilgrims, medical, agents, and platform config. Role is stored as a Cognito custom attribute (custom:role) alongside custom:orgId and custom:city_access. The city access attribute means a field officer in Nashik can never accidentally see or modify Trimbakeshwar zone data — enforced at the Lambda handler level, not just the UI.
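
A handler-level city check might look like the sketch below. It is written in Python for consistency with the other examples (the API handlers themselves run on Node.js), and it assumes that custom:city_access holds a comma-separated list of city names, which is a guess about the attribute format. The role name in the usage lines is also hypothetical.

```python
def can_access_zone(claims: dict, zone_city: str) -> bool:
    """Return True only if the caller's city_access claim includes the zone's city.

    `claims` is the decoded JWT claim set. Assumption: custom:city_access is a
    comma-separated list such as "nashik" or "nashik,trimbakeshwar".
    """
    raw = claims.get("custom:city_access", "")
    allowed = {city.strip().lower() for city in raw.split(",") if city.strip()}
    return zone_city.strip().lower() in allowed  # deny by default when the claim is missing


officer = {"custom:role": "FIELD_OFFICER", "custom:city_access": "nashik"}
print(can_access_zone(officer, "Nashik"))
print(can_access_zone(officer, "Trimbakeshwar"))
```

Running the check in the handler means a request for another city's zone is rejected even if the UI filter is bypassed.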

Super Admin bootstraps the entire platform: 12 zones with per-zone thresholds, 6 agents with system prompts, 36 notification templates (6 event triggers × 6 languages), 28 SOP templates, and all platform config keys. Config changes propagate to all 40+ Lambda functions within 60 seconds via a DynamoDB config cache layer.
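
One way to get a 60-second propagation bound is a time-to-live cache in front of the config store. The sketch below is an in-process version of that idea under stated assumptions: the ConfigCache class, the loader callback, and the injectable clock are all hypothetical names and do not describe the platform's actual cache layer.

```python
import time


class ConfigCache:
    """Cache config values and reload any entry older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader        # hypothetical callback that reads the config store
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._entries = {}           # key -> (value, loaded_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] >= self._ttl:
            entry = (self._loader(key), now)
            self._entries[key] = entry
        return entry[0]


# Usage with a fake clock: the value is reloaded only after the TTL has passed.
fake_now = [0.0]
loads = []
cache = ConfigCache(lambda k: loads.append(k) or len(loads), clock=lambda: fake_now[0])
print(cache.get("kushavart.cap"))   # first read loads from the store
fake_now[0] = 30.0
print(cache.get("kushavart.cap"))   # still cached
fake_now[0] = 61.0
print(cache.get("kushavart.cap"))   # TTL expired, reloaded
```

With this shape, the worst-case staleness of any config value equals the TTL, which is what bounds propagation time.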


CHALLENGES WE RAN INTO

The offline-first problem. Pilgrims near Kushavart Kund in Trimbakeshwar often have minimal mobile connectivity. Sending them an SMS reroute when the zone hits RED assumes their phone receives the message. Our solution: pre-stage routes and cached zone information at QR kiosk terminals distributed throughout the mela grounds, and use SNS with delivery receipts to track whether messages are actually landing. Undelivered messages escalate to volunteer WhatsApp group broadcasts.

The monsoon data gap. There is no single API that gives real-time Godavari water level at Ramkund. India's Central Water Commission data is available but requires scraping specific station data from their portal. We built a FloodWatch tool that combines CWC river station readings with IMD hourly rainfall forecasts to project water level 3 hours forward. A 0.5m rise in the last hour plus a 50mm/hr rainfall forecast triggers ghat closure proactively — not reactively.
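
The closure rule described above reduces to a two-condition check. The sketch below covers only that trigger, using the two thresholds from the text; the 3-hour projection is not shown because its method is not described here. The function name and parameters are illustrative.

```python
RISE_LIMIT_M_PER_HR = 0.5      # from the text
RAIN_LIMIT_MM_PER_HR = 50.0    # from the text


def should_close_ghat(level_now_m: float, level_hour_ago_m: float,
                      forecast_rain_mm_per_hr: float) -> bool:
    """Return True when both the observed rise and the forecast rainfall hit their limits."""
    rise_last_hour = level_now_m - level_hour_ago_m
    return (rise_last_hour >= RISE_LIMIT_M_PER_HR
            and forecast_rain_mm_per_hr >= RAIN_LIMIT_MM_PER_HR)


print(should_close_ghat(4.0, 3.0, 60.0))   # fast rise and heavy forecast
print(should_close_ghat(3.2, 3.0, 60.0))   # heavy forecast, slow rise
print(should_close_ghat(4.0, 3.0, 10.0))   # fast rise, light forecast
```

Requiring both conditions is what makes the closure proactive without firing on every heavy shower.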

The dual-city coordination problem. On Amrit Snan day, both cities peak simultaneously. Every ambulance and police reinforcement vehicle uses the same 30 km mountain road. We built a CommandBridge decision tree that explicitly models the road as a constrained resource: if both cities declare emergencies simultaneously, it triggers a pre-agreed mutual-aid protocol with NDRF deployed at the midpoint rather than at either city.
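
The core branch of that decision tree can be sketched in a few lines. Only the both-cities case comes from the text; the single-city and no-emergency branches, and the return labels, are assumptions added to make the example complete.

```python
def plan_ndrf_staging(nashik_emergency: bool, trimbakeshwar_emergency: bool) -> str:
    """Pick where NDRF should stage, treating the 30 km road as a shared resource."""
    if nashik_emergency and trimbakeshwar_emergency:
        # From the text: simultaneous emergencies trigger mutual aid at the midpoint.
        return "CORRIDOR_MIDPOINT"
    if nashik_emergency:
        return "NASHIK"            # assumption: stage at the affected city
    if trimbakeshwar_emergency:
        return "TRIMBAKESHWAR"     # assumption: stage at the affected city
    return "STANDBY"               # assumption: no deployment needed


print(plan_ndrf_staging(True, True))
print(plan_ndrf_staging(False, True))
```

Staging at the midpoint keeps responders within reach of both cities when the road cannot serve two one-way surges at once.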

The system prompt engineering for safety. Making agents appropriately decisive without being trigger-happy required significant iteration. CrowdSentinel's system prompt went through eleven versions before we had a version that correctly held entry at BLACK density but didn't flood the alert queue with duplicates during sustained RED conditions. The key insight: agents need explicit negative examples. "Do NOT create duplicate alerts for the same zone within 10 minutes unless severity increases" was as important as the positive rules.
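
The negative rule quoted above can also be written as a deterministic check, for example to test agent output or to guard the alert queue. This is a sketch under assumptions: the alert dictionary fields are hypothetical, and the severity ordering covers only the RED and BLACK levels named in this write-up.

```python
SEVERITY_RANK = {"RED": 1, "BLACK": 2}   # ordering assumed from the density levels above
DEDUP_WINDOW_SECONDS = 10 * 60           # "within 10 minutes", from the prompt rule


def is_duplicate(new_alert: dict, recent_alerts: list[dict]) -> bool:
    """True if the same zone already has an alert of equal or higher severity in the window."""
    for old in recent_alerts:
        same_zone = old["zone_id"] == new_alert["zone_id"]
        within_window = 0 <= new_alert["ts"] - old["ts"] < DEDUP_WINDOW_SECONDS
        not_escalated = SEVERITY_RANK[new_alert["severity"]] <= SEVERITY_RANK[old["severity"]]
        if same_zone and within_window and not_escalated:
            return True
    return False


history = [{"zone_id": "ramkund", "severity": "RED", "ts": 0}]
print(is_duplicate({"zone_id": "ramkund", "severity": "RED", "ts": 300}, history))    # repeat
print(is_duplicate({"zone_id": "ramkund", "severity": "BLACK", "ts": 300}, history))  # escalation
```

An escalation from RED to BLACK always passes, so suppressing duplicates never hides a worsening zone.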

Keeping safety rules out of the agents. We initially planned to have CrowdSentinel enforce the Kushavart 1,900-person hard cap. We realized that's wrong — agent invocation can fail, be delayed, or have a response time spike. The hard cap now lives in the DynamoDB zone-stream Lambda processor as a hard-coded check that fires synchronously on every LATEST item write. The agent enhances safety; the stream processor guarantees it.
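
A minimal sketch of such a stream-side check is shown below. It assumes the stream records have already been deserialized into plain dictionaries with a new_image key, and the field names (zone_id, sk, person_count) and the zone identifier are hypothetical; real DynamoDB Streams records use a typed attribute format that would need unpacking first.

```python
KUSHAVART_ZONE_ID = "kushavart-kund"   # hypothetical zone identifier
KUSHAVART_HARD_CAP = 1_900             # persons, from the text


def should_hold_entry(latest_item: dict) -> bool:
    """Return True when a LATEST write shows Kushavart above its hard cap."""
    if latest_item.get("zone_id") != KUSHAVART_ZONE_ID:
        return False
    return int(latest_item.get("person_count", 0)) > KUSHAVART_HARD_CAP


def zones_to_hold(records: list[dict]) -> list[str]:
    """Scan simplified stream records and list the zones that need an entry hold."""
    holds = []
    for record in records:
        item = record.get("new_image", {})
        # Only the LATEST item is checked; time-series snapshots are ignored.
        if item.get("sk") == "LATEST" and should_hold_entry(item):
            holds.append(item["zone_id"])
    return holds


batch = [{"new_image": {"zone_id": "kushavart-kund", "sk": "LATEST", "person_count": 1901}}]
print(zones_to_hold(batch))
```

The check has no model call, no network dependency beyond the stream itself, and no configuration lookup, which is what lets it act as the guarantee while the agent acts as the enhancement.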


ACCOMPLISHMENTS WE'RE PROUD OF

An architecture that could actually go to production. We didn't build a demo. Every design decision — separate tables vs single-table DynamoDB, Aurora DSQL active-active, EventBridge invocation over Lambda-to-Lambda, Strands tool docstrings, city_access at the auth layer — reflects a production constraint. The Kumbhathon Foundation's 40+ stakeholder interviews are embedded in the architecture: multilingual SMS came from their finding about WhatsApp groups outperforming official apps; the offline-first navigation came from their navigation breakdown section; the dual-city coordination gap came from their comparative analysis of Prayagraj vs Nashik geography.

The Kushavart protocol. The 75×75 ft constraint is the most dangerous single point of any Kumbh in history. We designed around it specifically: custom density thresholds in zone_config, a hard-coded 1,900-person cap in the stream processor, a dedicated camera cluster, and a staggered entry slot system where RouteOracle manages pilgrim timing from Nashik to Trimbakeshwar. No other platform we found has addressed this specific constraint.

Multi-agent orchestration that mirrors human incident command. The CommandBridge → specialist agent hierarchy mirrors the ICS (Incident Command System) used by real emergency management. CommandBridge is the Incident Commander. CrowdSentinel, FloodWatch, RouteOracle, MedEvac, LostConnect are section chiefs. They report up; the commander coordinates across. The architecture wasn't arbitrary — it was modelled on how professional emergency managers actually organize a response.

A platform built for people who can't use it wrong. City access filters prevent operators from seeing data outside their jurisdiction. Immutable alert fields prevent evidence tampering. The audit log in Aurora is append-only. SOP templates surface automatically when an alert is created. The Cloudscape design system's accessibility features mean operators using the dashboard at 3am in monsoon conditions can still read the zone status clearly.


WHAT WE LEARNED

The biggest lesson: the most important safety systems are the ones that work when AI doesn't.

We came in expecting to build an AI-first platform. We built an AI-enhanced platform with non-AI safety guarantees at every critical juncture. The stream processor is code, not a prompt. The NDRF webhook fires from the alert-stream Lambda, not from an agent. The Kushavart hard cap is a conditional check, not a model decision. AI makes the platform dramatically more capable — but it cannot be the only line of defence when 22.5 million people are in two cities on the same day.

The second lesson: good database design is a safety feature. The LATEST SK pattern in DynamoDB — writing two items per density update, one time-series snapshot with TTL, one overwriting the current state — lets us serve real-time zone data in a single GetItem call with strong consistency while also powering 6-hour history charts from the same table. If we had put everything in one item and kept appending to a list, we'd have hit the 400KB DynamoDB item limit during a peak day. That bug would have silently dropped density data during the most dangerous hours.
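
The two-item write can be sketched as a function that builds both items for one density update. The key names (pk, sk, expires_at), the key prefixes, and the 6-hour TTL are assumptions chosen to match the description; the function returns plain dictionaries and does not call DynamoDB.

```python
SNAPSHOT_TTL_SECONDS = 6 * 60 * 60   # assumption: keep snapshots as long as the 6-hour charts need


def build_density_items(zone_id: str, density: float, now_epoch: int) -> list[dict]:
    """Build the time-series snapshot and the overwriting LATEST item for one update."""
    partition_key = f"ZONE#{zone_id}"          # hypothetical key format
    snapshot = {
        "pk": partition_key,
        "sk": f"TS#{now_epoch}",               # unique per update, so history accumulates
        "density": density,
        "expires_at": now_epoch + SNAPSHOT_TTL_SECONDS,   # TTL attribute for automatic expiry
    }
    latest = {
        "pk": partition_key,
        "sk": "LATEST",                        # fixed sort key, so each write overwrites the last
        "density": density,
        "updated_at": now_epoch,
    }
    return [snapshot, latest]


for item in build_density_items("ramkund", 3.2, 1_750_000_000):
    print(item)
```

Each item stays small and fixed in size, so neither the history nor the current state can grow toward the 400KB item limit. When writing through boto3, numeric values such as density would need to be Decimal, not float.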

The third: language is infrastructure. SMS rerouting in English to a Marathi-speaking elderly pilgrim is not a message. Our 36 notification templates (6 triggers × 6 languages) aren't a nice-to-have feature — they're the difference between a reroute that works and a reroute that is ignored.
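
Treating the templates as infrastructure suggests checking that the full grid exists before the event. The sketch below enumerates the 6 × 6 grid and reports gaps; the six language codes map to the languages named earlier, while the trigger names are hypothetical placeholders because the actual triggers are not listed here.

```python
from itertools import product

LANGUAGES = ("mr", "hi", "gu", "ta", "kn", "en")   # Marathi, Hindi, Gujarati, Tamil, Kannada, English
# Hypothetical trigger names, used only to illustrate the grid.
TRIGGERS = ("zone_red", "zone_black", "entry_hold", "ghat_closed", "sos_ack", "lost_person")


def missing_templates(templates: dict) -> list[tuple]:
    """Return every (trigger, language) pair that has no non-empty template."""
    return [pair for pair in product(TRIGGERS, LANGUAGES) if not templates.get(pair)]


store = {pair: "..." for pair in product(TRIGGERS, LANGUAGES)}
del store[("zone_red", "mr")]
print(len(list(product(TRIGGERS, LANGUAGES))))   # size of the full grid
print(missing_templates(store))                  # the one gap introduced above
```

A check like this could run in the Super Admin bootstrap so that a missing Marathi reroute message is caught in setup and not on Amrit Snan day.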


WHAT'S NEXT

Immediate (before event, 2026–2027):

  • Partner with Kumbhathon Innovation Foundation for pilot deployment
  • Integrate with NTKMA's official ICCC system via API
  • Deploy QR wristband registration at Nashik railway station and bus terminals
  • Conduct tabletop exercises with Police, NDRF, and Medical teams using the CommandBridge protocol
  • Load test at 2,000 zone writes/minute sustained for 4 hours, simulating peak Amrit Snan day

Medium term:

  • Extend to Haridwar Kumbh 2027 and Ujjain Kumbh 2028 using the same platform with event-specific zone configs
  • Open the platform as an open-source civic infrastructure framework for large public gatherings globally — from Hajj to Rio Carnival to Indian Republic Day parades
  • Add predictive surge modelling: using historical crowd flow patterns + real-time data + weather to predict density 30 minutes ahead, not just react to it

Long term:

  • Make KumbhSafe the operating system for any large public gathering in India — a reusable, white-label safety platform that any state government can configure for their event in under a day, with Super Admin bootstrap complete in under 10 minutes

BUILT WITH

Frontend

Technology | Role
Vercel v0 | AI-generated UI scaffold using natural language prompts
Next.js 14 App Router | Full-stack React framework
@cloudscape-design/components | AWS Cloudscape Design System: operational dashboard components
@cloudscape-design/global-styles | Dark/light mode, CSS variables
AWS Amplify (AppSync client) | WebSocket subscriptions for real-time zone updates

Backend — Compute

Technology | Role
AWS Lambda (Node.js 20.x) | 40+ REST API handlers
AWS API Gateway | REST API with Cognito authorizer + WAF
AWS AppSync | GraphQL subscriptions: onZoneUpdate, onAlertCreated, onAgentUpdate
@aws-lambda-powertools/logger | Structured JSON logging with X-Ray tracing

Backend — Agents

Technology | Role
AWS Bedrock AgentCore | Managed runtime for all 6 agents
Strands Agent Framework (Python) | Agent definitions with typed tool-use
amazon.nova-pro-v1:0 | Primary model for all agents
AWS Lambda (Python 3.12) | Agent handler functions invoked by Bedrock

Databases

Technology | Role
AWS DynamoDB | 8 tables · on-demand · DynamoDB Streams · real-time operational data
AWS Aurora DSQL | PostgreSQL-compatible · active-active ap-south-1 + ap-southeast-1 · IAM auth · platform config + users + audit

Events & Messaging

Technology | Role
AWS Kinesis Data Streams | IoT sensor + CCTV Rekognition ingestion
AWS EventBridge | Agent schedules + safety event routing
AWS SNS | 5 topics: critical alerts, pilgrim SMS, NDRF webhook, medical, admin
AWS SQS | Agent task queue with retry + DLQ
DynamoDB Streams | Zone/alert/pilgrim change event processing

Security & Identity

Technology | Role
AWS Cognito User Pools | JWT authentication · 7 user roles · MFA for admins
AWS IAM | Lambda execution roles · Bedrock agent permissions
AWS Secrets Manager | External API keys (IMD, CWC, NDRF)
@kumbhsafe/rbac | Custom RBAC package: 22 permissions × 7 roles

Storage & ML

Technology | Role
AWS S3 | Pilgrim photos, lost-found images, reports, exports
AWS Rekognition | Crowd density from CCTV · facial recognition for lost persons

Infrastructure

Technology | Role
AWS CDK v2 (TypeScript) | 11 stacks · full IaC
Turborepo | Monorepo build pipeline
pnpm workspaces | Package management
