INSPIRATION

─────────────────────────────────────────────────────────────────────────────

This problem is not unknown. It is ignored.

Delhi-NCR records over 14,000 cases of crimes against women every year — the highest of any metropolitan region in India. That number has been reported, analysed, condemned, and published annually for over a decade. Every time an incident emerges, the response follows the same pattern: investigation after the fact, statements from officials, calls for better infrastructure. The cameras that recorded the incident become evidence. They were never intelligence.

I kept returning to one question while building SafeTrace: why does the reasoning layer not exist between the camera signal and the response? The cameras are already deployed across Delhi-NCR — industrial zones, isolated stretches, parking complexes, transit hubs, markets, residential corridors. The data is already flowing. Elasticsearch is already capable of detecting the pattern in that data, correlating it geospatially, comparing it against ninety days of zone-specific history, and producing a dispatch recommendation with a written justification — in the time it takes a control room operator to notice an alert on a screen.

The gap is not hardware. It is not data volume. It is not compute. The gap is that nobody built the reasoning layer. SafeTrace is that layer. The inspiration was not an idea — it was a recognition that the components to build this have existed for years, Elasticsearch made them composable for the first time, and the cost of not building it is measured in incidents that should not have happened.

WHAT IT DOES

─────────────────────────────────────────────────────────────────────────────

SafeTrace monitors fifty cameras across six Delhi-NCR zone types — industrial, isolated, parking, transit, market, and residential — each feeding structured events into an Elasticsearch safety_events index in real time. Every event carries a camera ID, zone type, geographic coordinates, risk score from the edge vision model, gender counts, an alone flag, an SOS gesture flag, and a timestamp. These are not logs. They are timestamped intelligence signals waiting to be reasoned about.
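A single event might look like the following sketch. The risk_score, gesture_sos, and geo-point location fields are named above; the remaining field names and the ID format are illustrative assumptions, not the deployed schema:

```python
# Hypothetical shape of one safety_events document. Only risk_score,
# gesture_sos, and the geo_point location are confirmed by this writeup;
# every other field name is an illustrative assumption.
event = {
    "camera_id": "CAM-ISO-017",                     # assumed ID format
    "zone_type": "isolated",                        # one of the six zone types
    "location": {"lat": 28.6139, "lon": 77.2090},   # geo_point
    "risk_score": 72,                               # from the edge vision model
    "male_count": 3,
    "female_count": 1,
    "alone": False,
    "gesture_sos": False,
    "timestamp": "2025-01-15T21:42:00Z",
}
```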

Elastic Watcher polls the safety_events index on a sub-minute schedule, running a query that fires when risk_score > 70 or gesture_sos == true within the last sixty seconds. When the condition is met, Watcher's Painless transform extracts the highest-risk hit from that window and posts it as a structured JSON payload to SafeTrace's backend webhook. This is the detection layer. It does not reason. It watches, detects, and hands off — which is exactly what it is designed to do.
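The detection condition described above can be sketched as an Elasticsearch query body. This is an illustrative reconstruction of the logic, not the deployed watch definition from setup_watcher.py:

```python
# Sketch of the Watcher search condition: fire when risk_score > 70 OR
# gesture_sos is true within the last 60 seconds, sorted so the transform
# can take the single highest-risk hit. Illustrative, not the deployed watch.
watch_condition = {
    "query": {
        "bool": {
            "filter": [{"range": {"timestamp": {"gte": "now-60s"}}}],
            "should": [
                {"range": {"risk_score": {"gt": 70}}},
                {"term": {"gesture_sos": True}},
            ],
            "minimum_should_match": 1,
        }
    },
    "sort": [{"risk_score": "desc"}],  # highest-risk hit first
    "size": 1,
}
```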

The reasoning happens in Elastic Agent Builder. The agent receives the alert context and begins a three-tool investigation. The first tool, fetch-zone-historical-baseline, queries ninety days of historical data in the alert_log index to compute the zone's escalation rate — how often past alerts in this zone became confirmed incidents. The second tool, correlate-adjacent-cameras, runs an ST_DISTANCE() geospatial query against the camera_registry index to find every camera within five hundred metres, placing them on elevated alert as corroborating sensors. The third tool, calculate-composite-risk, runs an ES|QL EVAL chain that multiplies the raw risk score by four factors: the zone's base risk multiplier, the surrounding gender ratio signal, the historical amplifier derived from Tool 1's escalation rate, and a night-hour penalty when the event occurs between 20:00 and 06:00. The final score and threat level come out of Elasticsearch — not out of application logic.

When gesture_sos is true, the agent skips all three tools entirely. A human distress signal is not a datapoint to be weighed against a composite formula. The system prompt encodes this as an architectural constraint: gesture_sos produces threat_level = CRITICAL and final_score = 100 immediately. Every other path through the agent ends with a written agent_reasoning field — one to three sentences that name the specific escalation rate and composite score that drove the dispatch decision. For a law enforcement application, every patrol dispatch must be traceable. SafeTrace makes traceability a property of the data layer, not an afterthought.
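The decision order above, with the SOS bypass and the CASE thresholds from the composite scoring chain, reduces to a short sketch. The function and field names are mine, not the agent's actual implementation:

```python
def decide(event, composite_score_fn):
    """Sketch of the dispatch decision order described above (illustrative).

    A human distress signal bypasses the composite formula entirely;
    every other path goes through the composite score and the CASE
    thresholds (88 / 70 / 50) from the ES|QL chain.
    """
    if event.get("gesture_sos"):
        return {"threat_level": "CRITICAL", "final_score": 100,
                "agent_reasoning": "SOS gesture detected; tools skipped."}
    score = composite_score_fn(event)
    level = ("CRITICAL" if score >= 88 else
             "HIGH" if score >= 70 else
             "MEDIUM" if score >= 50 else "LOW")
    return {"threat_level": level, "final_score": score}
```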

The Svelte 5 dashboard renders all fifty cameras as Google Maps Advanced Markers colour-coded by current threat level. When an alert fires, the agent trace panel replays each investigation step with staggered animation — tool calls, results, reasoning, dispatch — so a control room operator can see not just that a patrol was dispatched, but why. A patrol unit then animates along stored road waypoints from the nearest police station to the camera location. The alert card shows the agent's written justification, the adjacent cameras activated, the composite score breakdown, and outcome controls for the operator to acknowledge and resolve.

HOW I BUILT IT

─────────────────────────────────────────────────────────────────────────────

The system is built on three Elasticsearch indexes with distinct structural roles. The safety_events index is a geo-point time-series store — every document has a location field mapped as geo_point, enabling native geospatial queries without a separate service. The alert_log index carries both live alert records and ninety days of pre-seeded historical data flagged with is_historical: true, giving the historical baseline query statistical density from day one. The camera_registry index is a static geographic registry storing each camera's coordinates, zone metadata, patrol waypoints, and the coordinates of the nearest police station — a single geo lookup replacing what would otherwise be a relational join across services.
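As a sketch, the safety_events mapping might look like the following. Only the geo_point location field is stated in this writeup; the other field types are assumptions:

```python
# Hedged sketch of the safety_events index mapping. Only the geo_point
# location field is confirmed above; the remaining types are assumptions.
safety_events_mapping = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"},   # enables native geo queries
            "risk_score": {"type": "integer"},
            "gesture_sos": {"type": "boolean"},
            "zone_type": {"type": "keyword"},
            "timestamp": {"type": "date"},
        }
    }
}
```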

The Agent Builder tool chain is where Elasticsearch's query primitives do the analytical work directly. The historical baseline tool queries alert_log with a COUNT_IF aggregation to compute escalation_rate = incident_count / total_alerts per zone — a real per-zone statistic computed across 180,000 seeded documents. The adjacent camera tool issues a single query: ST_DISTANCE(location, TO_GEOPOINT(?wkt)) <= 500, sorted by distance, limited to five. No Haversine library. No separate geo-service. One query clause against a geo_point field. The composite risk tool runs an ES|QL EVAL chain entirely inside Elasticsearch:

final_score = risk_score
  × zone_risk_multiplier      ← from camera_registry zone profile
  × surrounding_ratio_penalty ← computed from male/female counts
  × historical_amplifier      ← derived from Tool 1 escalation_rate
  × night_hour_penalty        ← applied between 20:00 and 06:00
| EVAL threat_level = CASE(
    final_score >= 88, "CRITICAL",
    final_score >= 70, "HIGH",
    final_score >= 50, "MEDIUM",
    "LOW"
  )

The intelligence is inside Elasticsearch. The Python backend receives a typed result, not a number it has to reason about.
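The escalation-rate computation that the baseline tool performs with COUNT_IF reduces to simple arithmetic, mirrored here in Python for clarity. The became_incident field name is a hypothetical stand-in for whatever flag the alert_log documents actually carry:

```python
def escalation_rate(zone_alerts):
    """Python mirror of the Tool 1 aggregation: the fraction of past
    alerts in a zone that became confirmed incidents.

    Equivalent to escalation_rate = incident_count / total_alerts;
    `became_incident` is a hypothetical field name.
    """
    if not zone_alerts:
        return 0.0
    incidents = sum(1 for a in zone_alerts if a.get("became_incident"))
    return incidents / len(zone_alerts)
```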

Elastic Watcher is configured with a Painless transform that suppresses alert bursts: when multiple cameras in a window exceed the threshold, the transform extracts the single highest-scoring hit and posts one payload to the webhook. This is not a filter — it is a deliberate architectural choice that prevents the Agent Builder from receiving simultaneous invocations for the same event window. Watcher's job is to detect and hand off exactly once per meaningful window.
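The burst-suppression logic of that transform reduces to a single selection, shown here in Python rather than Painless as a sketch of the behaviour:

```python
def highest_risk_hit(hits):
    """Mirror of the Painless transform's burst suppression: from all
    hits in the window, forward only the single highest-scoring event.
    Hit shape follows the standard Elasticsearch _source envelope."""
    return max(hits, key=lambda h: h["_source"]["risk_score"])
```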

The Python FastAPI backend is typed end-to-end with Pydantic models that match Zod schemas on the Svelte 5 frontend. The response_parser.py module handles Kibana's actual /converse response shape, which differs from the documented one and required a custom normalisation layer. The backend also includes a synthetic event generator with three modes — normal, anomaly, and SOS — producing calibrated safety_events documents for demo purposes, with score ranges tuned to each mode so they consistently trigger the correct agent paths.

The agent setup is fully automated: setup_runner.py registers all three ES|QL tools and the agent against Kibana's API idempotently, and setup_watcher.py registers the Elastic Watcher definition. The entire backend can be reproduced from python -m agent.setup_runner followed by python -m scripts.setup_watcher.

CHALLENGES I RAN INTO

─────────────────────────────────────────────────────────────────────────────

The deepest integration challenge was Kibana's actual /converse response shape. The documented expectation was a flat array of typed step objects. The production response was structurally different: tool call steps embedded their results as a nested results[] array inside the same object, rather than appearing as separate tool_result entries in the step sequence. The frontend's trace panel replay depended on receiving discrete, typed steps — tool_call, tool_result, reasoning, decision, dispatch — in order. A flat pass-through of the Kibana steps array would have produced a broken trace animation.

The fix was a custom normalisation layer in response_parser.py: map_kibana_steps_to_safetrace_trace_steps() iterates the raw Kibana steps, detects tool_call objects with a populated results[] array, and emits them as two sequential steps — a tool_call step followed by a synthesised tool_result step extracted from results[0].data. The decision and dispatch steps, which Kibana does not emit at all, are synthesised from the agent's JSON summary. This was not a documentation gap I worked around — it was a production API behaviour I had to understand, instrument, and solve. It taught me to treat Elastic's APIs as production systems with real response complexity, not demo endpoints.
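A condensed sketch of that normalisation: split any tool_call step with an embedded results[] array into two sequential steps. Field names beyond type, results, and data are assumptions about the payload shape:

```python
def map_kibana_steps(raw_steps):
    """Condensed sketch of the normalisation in response_parser.py:
    a tool_call step with an embedded results[] array is emitted as
    two steps, the tool_call followed by a synthesised tool_result
    built from results[0].data. Other steps pass through unchanged.
    Field names beyond type/results/data are assumptions."""
    out = []
    for step in raw_steps:
        if step.get("type") == "tool_call" and step.get("results"):
            out.append({"type": "tool_call", "tool": step.get("tool_id")})
            out.append({"type": "tool_result",
                        "data": step["results"][0].get("data")})
        else:
            out.append(step)
    return out
```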

The second challenge was the historical_amplifier formula design. The amplifier needed to produce meaningful score differentiation between zones with high and low escalation histories without making low-baseline zones structurally incapable of reaching dispatch threshold. The formula I settled on uses 1.0 + (escalation_rate × 0.8) as the amplifier multiplier, which means a zone with zero historical escalation applies a neutral 1.0× multiplier while a zone with 40% escalation history applies 1.32×. A raw score of 72 in a low-baseline zone produces a composite of approximately 72. The same score in a high-baseline zone produces approximately 95. The formula is not stress-tested across every edge case at this stage — what is confirmed is that it avoids the two failure modes: inflating low-signal zones to dispatch, and suppressing valid signals in historically quiet zones.
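The amplifier arithmetic above checks out numerically. A minimal sketch, holding the other composite factors at a neutral 1.0× for illustration:

```python
def historical_amplifier(escalation_rate):
    """The amplifier formula from the writeup: 1.0 + escalation_rate * 0.8.
    Neutral (1.0x) at zero history, 1.32x at a 40% escalation history."""
    return 1.0 + escalation_rate * 0.8

def amplified_score(risk_score, escalation_rate):
    """Composite score with the other factors held at a neutral 1.0x,
    purely to illustrate the amplifier's effect in isolation."""
    return round(risk_score * historical_amplifier(escalation_rate))
```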

The third challenge was the Watcher-to-Agent-Builder invocation chain. Getting Watcher's webhook action to correctly authenticate against SafeTrace's backend and having the backend correctly pass the Painless-rendered Mustache payload to the Agent Builder /converse endpoint required coordinating three authentication contexts simultaneously: the Elasticsearch API key for Watcher, the backend's own X-API-Key middleware, and the Kibana API key for Agent Builder. Each context is correct in isolation. Making them compose correctly end-to-end — with the Watcher token exempt from the API key middleware while the Kibana key is injected by the agent invoker — required explicit architectural decisions about which authentication belongs at which layer.

ACCOMPLISHMENTS I'M PROUD OF

─────────────────────────────────────────────────────────────────────────────

The accomplishment I am most confident in is the composite risk formula that makes ninety days of historical zone data change real dispatch outcomes. A rule engine fires on risk_score > 70. SafeTrace fires differently on risk_score = 72 depending on the zone's measured escalation history. These are not the same system. One responds to a threshold. The other responds to context. The distinction is provable with specific numbers: the same raw signal produces a composite of 72 in a low-baseline zone and 95 in a high-baseline zone. The dispatch decision is different. The written justification is different. The zone history that produced the difference lives in Elasticsearch and is queryable. That is not a feature — it is the entire architectural argument for building this on Elasticsearch rather than a rules platform.

I am also proud of how ST_DISTANCE() resolved what would otherwise have been a multi-service problem. Correlating adjacent cameras within five hundred metres as corroborating evidence is a production-grade geospatial operation. Without native geo support, this would require a separate service, a Haversine computation layer, and a consistency boundary between them. The query is one clause. The result is a typed list. The architectural surface area of the entire SafeTrace system is smaller because Elasticsearch treats geo as a first-class query primitive — and that simplification is only available because I built on Elastic's stack.

The Watcher-as-orchestrator separation is an accomplishment I could not have articulated before this project. Watcher watches data. Agent Builder reasons about data. Making that separation explicit — encoding it as a system boundary rather than a convention — is what makes SafeTrace maintainable. Every patrol dispatch carries a written justification traceable to specific Elasticsearch queries. That is auditable by design, not by accident. For a law enforcement application, that distinction is the difference between a tool that can be deployed and one that cannot.

WHAT I LEARNED

─────────────────────────────────────────────────────────────────────────────

SafeTrace taught me things about Elasticsearch that I could not have learned from any other project, because the specific combination of real-time time-series data, geospatial correlation, and historical intelligence amplification does not appear in tutorials or documentation examples.

The most significant learning was ES|QL as a real-time intelligence layer. I had used ES|QL for aggregations before. SafeTrace was the first time I used it for multi-dimensional sliding window queries — gender ratio cross-referenced with time-of-day, zone type, and location in a single statement. The EVAL chain in Tool 3 — multiplying four factors inside Elasticsearch and receiving a typed threat_level as output — changed how I think about where computation belongs in a real-time system. Computation that lives inside Elasticsearch travels with the data. It does not cross a network boundary. It does not depend on application state. For a real-time safety system where latency between signal and dispatch matters, keeping the scoring logic inside the data layer is not an optimisation — it is a correctness requirement.

Elastic's native geo capabilities changed my architecture instincts. Before SafeTrace, I would have reached for a separate geo-service for any production-grade geospatial operation. ST_DISTANCE() inside an agent tool replaced that service entirely. The lesson is not that ST_DISTANCE() exists — it is that Elasticsearch's geo support is mature enough to be the sole geospatial layer in a system that needs to make dispatch decisions under time pressure. I now think about Elasticsearch's field types — geo_point, semantic_text, date — as architectural decisions, not just storage choices.

Elastic Watcher taught me a mental model I will carry into every event-driven system I build. Watcher is not a monitoring tool. It is an event-driven trigger layer with temporal awareness built in. The correct mental model is: Watcher watches data, Agent reasons about data, and the boundary between them is a deliberate architectural constraint. Collapsing both into the agent — having the agent poll its own index on a timer — would make both functions harder to tune, debug, and audit independently. This hackathon taught me that separation of concerns in AI systems is not just a software engineering principle — it is what makes those systems explainable to the people who depend on them.

The Kibana /converse API depth was the most technically honest learning moment. Production APIs have response shapes that differ from documented expectations. steps[].results[] embedded inside tool_call objects was not in any example I found before hitting it. Writing response_parser.py to handle the actual response shape — not the expected one — taught me to build against production API behaviour from the start, not against documented examples. That habit will make every Elastic integration I build more robust.

WHAT'S NEXT FOR SAFETRACE

─────────────────────────────────────────────────────────────────────────────

SafeTrace is architecturally complete. The intelligence layer — ingestion, detection, historical amplification, geospatial correlation, composite scoring, Agent Builder reasoning, patrol dispatch, audit trail — is built, typed, tested, and deployable. What is not built is the vision layer: real-time gender classification, lone-person detection, and SOS gesture recognition from a live camera feed. That is a computer vision pipeline. It is not Elasticsearch's responsibility, and it is not SafeTrace's responsibility. SafeTrace is the intelligence layer that consumes structured signals from that pipeline. The boundary is correct. Swapping a synthetic event generator for a real vision inference output requires no changes to Elasticsearch, the agent, or the dispatch logic.

The next step is integration conversations. I intend to reach out to Delhi Police, NDMC, and state government smart city initiatives to understand where SafeTrace fits within their existing CCTV infrastructure and what an integration path looks like practically. This is not a pitch — it is a technical conversation about whether the vision layer can be connected, what data governance constraints apply, and what a pilot deployment would require.

Long-term, I want to expand coverage beyond Delhi-NCR to other Indian cities with existing smart city infrastructure, integrate real-time PCR van GPS tracking into Elasticsearch for precise patrol ETA calculations, and build a multilingual alert interface for control room operators who work in Hindi and regional languages rather than English.

I will maintain this project because the problem does not stop. Incidents emerge. Control rooms respond after the fact. The gap between signal and response is not a technical unsolved problem — the technical solution exists and is running. The remaining work is institutional. SafeTrace exists to close that gap, and I intend to continue closing it for as long as the gap remains.
