Helion -- Geospatial Video Intelligence

Inspiration

I spent 14 years in law enforcement across St. Louis PD, Berkeley PD, and the Army National Guard Military Police. I have sat in the room. I have been the investigator scrubbing through body cam footage at 2x speed for eight hours straight, hunting for four seconds of a suspect walking through the background of a traffic stop three blocks from the scene. I have watched the same witness interview six times trying to catch a detail I knew was there but could not isolate. I have manually cross-referenced five officers' feeds against a 50-page use-of-force policy, clause by clause, while Internal Affairs waited on my timeline.

That process does not take hours. It takes days. Sometimes weeks. And the quality of the output depends entirely on how fresh the investigator is at hour six versus hour one, whether they caught the relevant three seconds buried deep in a peripheral officer's feed, and whether they had the bandwidth to mentally stitch together a coherent spatial picture from cameras that never sync cleanly.

When an officer-involved shooting happens, an entire department is effectively paralyzed. A single incident can produce seven body cams, a dashcam, witness phone footage, dispatch audio, and a department policy manual that someone has to grade clause by clause against the evidence. Internal Affairs spends days just correlating before the actual investigation can begin.

Track 3's charter says it directly: "Intelligence analysts spend hours manually correlating data from different sources. Multimodal systems could surface connections in minutes." I did not read that as a hypothetical. I read it as a description of my career. Helion was built to fuse video, transcripts, geospatial movement, and structured policy text into a single grounded investigative surface so that the time-to-insight collapses from days to seconds.


What it does

Helion is a multi-source intelligence fusion system for video-evidence-heavy investigations. It ingests video alongside structured policy, geospatial overlays, and unstructured text (auto-extracted transcripts), then synthesizes them into investigative answers grounded in every modality.

A single uploaded incident produces:

  • Synchronized multi-angle viewer -- up to seven feeds aligned on a single incident clock. Scrub once, see every camera at the same moment. No more mentally reconstructing who was where from isolated playback windows.
  • Auto-reconstructed timeline -- Pegasus extracts every notable event (arrival, contact, pursuit, force, de-escalation) with timestamps and confidence scores. The kind of chronological reconstruction that used to take me an entire shift now renders in under two minutes.
  • Geospatial activity map -- officer movement, suspect path, and scene cordon plotted on Mapbox with road-snapped pursuit corridors. This is the spatial picture I used to build on a whiteboard with dry-erase markers and hand-drawn arrows. Now it builds itself.
  • Verbatim transcripts -- every spoken word with speaker labels, categorized into commands, weapon mentions, medical calls, Miranda warnings, and dispatch updates. Every investigator has rewound the same five seconds of muffled audio dozens of times trying to determine if something was said. Helion pulls it clean and labels it.
  • Helion Agent -- natural-language Q&A grounded in the footage. Every answer cites the feed name and mm:ss timestamp. Ask "Did any officer issue a verbal warning before the first use of force?" and get a sourced, timestamped answer instead of spending an hour finding it manually.
  • Policy compliance review -- each clause of department use-of-force policy graded against the actual evidence, with the option to re-evaluate any clause live with Pegasus. This is the feature that turns a weeks-long Internal Affairs review into a structured, auditable assessment a supervisor can act on the same day.
  • One-click reports -- accountability report, citizen summary, investigative use-of-force memo, and the master case file, all composed deterministically from the structured evidence. No hallucinated narratives. Every sentence traces back to a source.

The hero demo is a real, public-domain Houston Police Department officer-involved shooting (600 W Mt Houston Rd, 9/10/2022) reconstructed from seven actual body-cam and dashcam feeds: 63 timeline events, 33 utterances, 91 Marengo embeddings, and a 9-clause policy review, all extracted automatically.


How we built it

The product is a Next.js 16 / React 19 console deployed live at helion.metisos.co. The core pipeline runs entirely on AWS:

  • TwelveLabs Pegasus 1.2 on AWS Bedrock for video understanding -- synchronous InvokeModel calls per feed, with responseSchema enforced for structured timeline and transcript extraction
  • TwelveLabs Marengo 3.0 on AWS Bedrock for multimodal embeddings -- asynchronous StartAsyncInvoke calls for asset-level and clip-level vectors used in cross-feed retrieval
  • AWS S3 for video storage with presigned PUT/GET so footage uploads direct from the browser and plays back without ever touching our origin
  • Mapbox GL JS for tiles, Directions API for road-snapping pursuit corridors, and Geocoding API for new-case addresses

Every interactive surface (viewer, map, agent rail, policy grading, report canvas) is a server-component shell with a client-component sibling. The wizard's process orchestrator fans Pegasus calls out per video in parallel, then merges the results back into a single CaseRecord. Reports are composed server-side by pure functions with no LLM in the loop, so the deliverable itself carries no hallucination risk.
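
The fan-out-and-merge step can be sketched as below. TimelineEvent, FeedExtraction, CaseRecord, and the injected extract callback are hypothetical stand-ins for Helion's real types and its Pegasus wrapper. The sketch shows the shape: one extraction call per video, issued concurrently, followed by a deterministic merge. Recording failed feeds instead of aborting the whole case is this sketch's own choice. The writeup does not say how Helion handles a failed call.

```typescript
// Hypothetical shapes; Helion's real CaseRecord is not shown in the writeup.
interface TimelineEvent {
  feed: string;
  tSec: number; // seconds on that feed's own media clock
  label: string;
  confidence: number;
}

interface FeedExtraction {
  feed: string;
  events: TimelineEvent[];
}

interface CaseRecord {
  caseId: string;
  events: TimelineEvent[];
  failedFeeds: string[];
}

// `extract` stands in for a per-video Pegasus call (for example a wrapper
// around Bedrock InvokeModel). It is injected so the sketch runs without AWS.
async function processCase(
  caseId: string,
  feeds: string[],
  extract: (feed: string) => Promise<FeedExtraction>,
): Promise<CaseRecord> {
  // Fan out: every feed is extracted concurrently.
  const results = await Promise.allSettled(feeds.map((f) => extract(f)));

  const events: TimelineEvent[] = [];
  const failedFeeds: string[] = [];
  results.forEach((r, i) => {
    if (r.status === "fulfilled") events.push(...r.value.events);
    else failedFeeds.push(feeds[i]); // keep the rest of the case; note the gap
  });

  // Merge in a stable order (feed name, then time) so reruns of the same
  // inputs always produce the same record.
  events.sort((a, b) =>
    a.feed < b.feed ? -1 : a.feed > b.feed ? 1 : a.tSec - b.tSec,
  );
  return { caseId, events, failedFeeds };
}
```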

Deployment is self-hosted with an HMAC-verified Python webhook that pulls from GitHub, runs npm ci && npm run build, and restarts the systemd-managed Next.js process behind nginx + Let's Encrypt. From git push to live in production: about 30 seconds.
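
GitHub signs each webhook delivery with an X-Hub-Signature-256 header of the form sha256=<hex HMAC-SHA256 of the raw request body>, keyed by the webhook secret. Helion's webhook is written in Python. The sketch below expresses the same check in TypeScript so it matches the other examples here. The function name is hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if `signatureHeader` matches the HMAC of the raw body.
// `rawBody` must be the exact bytes GitHub sent, not re-serialized JSON.
function verifyGitHubSignature(
  rawBody: Buffer,
  signatureHeader: string | null,
  secret: string,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;
  const expected = Buffer.from(
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"),
    "utf8",
  );
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on a length mismatch, so compare lengths first;
  // the constant-time compare avoids leaking how many bytes matched.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Only after this check passes should the handler pull and rebuild. Any unsigned or mis-signed request is dropped before it can trigger a deploy.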

This architecture reflects a principle I learned building MetisOS and the Railroad Memory System at Metis Analytics: reserve the model for what only the model can do, and make everything downstream deterministic. Pegasus extracts. Marengo embeds. Pure functions compose. The investigator trusts the output because every claim is grounded, every report is traceable, and no step in the deliverable pipeline is stochastic.


Challenges we ran into

Pegasus event timestamps drift. On noisy body cam audio (and body cam audio is almost always noisy; I can tell you that from years of wearing one), Pegasus mistimes notable events by 5-15 seconds. We had to hand-calibrate incidentStartSec offsets per feed so the multi-angle sync reads correctly during playback: the dashcam starts first, the Officer Ready feed 19 seconds later, and the others as each unit arrives. The lesson: do not trust raw model timestamps for narrative reconstruction. Calibrate per source.
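
With hand-calibrated offsets, keeping the players in sync becomes arithmetic. For a scrub position on the incident clock, each feed seeks to the incident time minus that feed's start offset. It shows nothing if the camera was not yet (or no longer) recording. The sketch assumes incidentStartSec is the moment a feed's first frame falls on the incident clock. FeedCalibration, durationSec, and the function names are hypothetical.

```typescript
// Hypothetical per-feed calibration: where each feed's first frame falls on
// the shared incident clock, in seconds (hand-tuned per source).
interface FeedCalibration {
  feed: string;
  incidentStartSec: number;
  durationSec: number;
}

// For one scrub position on the incident clock, compute where each player
// should seek; null means that camera was not recording at that moment.
function seekTargets(
  incidentSec: number,
  feeds: FeedCalibration[],
): Record<string, number | null> {
  const targets: Record<string, number | null> = {};
  for (const f of feeds) {
    const mediaSec = incidentSec - f.incidentStartSec;
    targets[f.feed] = mediaSec >= 0 && mediaSec <= f.durationSec ? mediaSec : null;
  }
  return targets;
}

// Map a model-reported event time on one feed onto the incident clock.
function toIncidentClock(mediaSec: number, f: FeedCalibration): number {
  return f.incidentStartSec + mediaSec;
}
```

With the dashcam at offset 0 and a second feed at offset 19, scrubbing to incident time 30 seeks the dashcam to 30 and the second feed to 11.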

Marengo's async-only API. Marengo embeddings are powerful but only available via async invoke on Bedrock, roughly a 30-second round-trip. That is fatal for interactive Q&A, so the agent's runtime path bypasses Marengo and uses transcript-hit plus keyword routing instead. Marengo still does the heavy lifting offline for cross-feed retrieval. This mirrors what I learned running continual learning experiments at Metis Analytics: the right architecture separates offline intelligence from real-time responsiveness.
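
The runtime path described above can be sketched as plain scoring over the already-extracted transcript, with no model call. It counts query words that appear in an utterance. It also uses a small routing table that maps a query word like "warning" to phrases that signal one, even when the exact word is never spoken. The routing table, the weights, and the minimum word length (a crude stopword filter) are illustrative assumptions, not Helion's actual values.

```typescript
interface Utterance {
  feed: string;
  tSec: number;
  speaker: string;
  text: string;
}

// Hypothetical routing table: query words that should pull in utterances
// containing these phrases even when the query word itself is not spoken.
const KEYWORD_ROUTES: Record<string, string[]> = {
  warning: ["drop it", "stop", "show me your hands"],
  miranda: ["right to remain silent"],
};

// Lowercase words longer than 3 characters (a crude stopword filter).
function tokenize(s: string): string[] {
  const words: string[] = s.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return words.filter((w) => w.length > 3);
}

// Rank utterances by direct word hits plus routed phrase hits.
function routeQuery(query: string, utterances: Utterance[], limit = 3): Utterance[] {
  const qTokens = new Set(tokenize(query));
  const phrases = Array.from(qTokens).flatMap((t) => KEYWORD_ROUTES[t] ?? []);
  return utterances
    .map((u) => {
      const text = u.text.toLowerCase();
      let score = tokenize(u.text).filter((t) => qTokens.has(t)).length;
      for (const p of phrases) if (text.includes(p)) score += 2; // routed hits weigh more
      return { u, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score || a.u.tSec - b.u.tSec)
    .slice(0, limit)
    .map((s) => s.u);
}
```

Each returned utterance already carries its feed and tSec, so the agent can cite it directly without a second lookup.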

Pegasus 1.5 is not on Bedrock yet. The hackathon brief references 1.5; we ship on 1.2 and have documented the limitation in the validation report.

Hydration mismatches on time formatting. Node rendered ten minutes past midnight as 24:10 while Chromium rendered 00:10, so the server-rendered HTML and the client render disagreed. We had to pin both timeZone: "America/Chicago" and hourCycle: "h23" everywhere a timestamp appears. Small bug, big consequences when your entire product is built on temporal precision.
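
A minimal sketch of the pinning, using the standard Intl.DateTimeFormat API. The formatter and function names are hypothetical. Sharing one formatter between server and client code keeps the options from drifting apart. Note that hour12, if also passed, overrides hourCycle, so this sketch leaves it out.

```typescript
// One shared formatter so Node (server render) and the browser (hydration)
// produce the same string: pinned time zone, and hourCycle "h23" so the hour
// after midnight is always 00, never 24. Do not also pass `hour12`, which
// overrides `hourCycle` when both are present.
const INCIDENT_CLOCK = new Intl.DateTimeFormat("en-US", {
  timeZone: "America/Chicago",
  hour: "2-digit",
  minute: "2-digit",
  hourCycle: "h23",
});

function formatIncidentTime(d: Date): string {
  return INCIDENT_CLOCK.format(d);
}
```

For example, 2022-09-10T05:10:00Z is ten minutes past midnight in Chicago (CDT, UTC-5) and formats as 00:10 on both sides.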

Self-host deploy race condition. Our first webhook implementation kept the frontend running while npm run build rebuilt .next/ underneath it. Judges hitting the site mid-build got 500s. Fixed by stopping the systemd unit before the build and starting it again afterward.

Reverse-proxy redirect bug. new URL("/overview", req.url) used the upstream container URL behind nginx, sending users to 127.0.0.1:4288 after opening a case. Fixed by reading X-Forwarded-Host and X-Forwarded-Proto in the redirect handler.
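
A minimal sketch of the fix. It assumes nginx sets X-Forwarded-Host and X-Forwarded-Proto (for example via proxy_set_header) and overwrites any client-supplied values. Without that, the headers can be spoofed. publicUrl is a hypothetical helper built on the standard Request and URL globals. In a Next.js route handler, the result could be passed to NextResponse.redirect().

```typescript
// Build a redirect target from the public host the browser actually used,
// falling back to the request URL when no proxy headers are present.
// Only safe if the reverse proxy overwrites these headers itself.
function publicUrl(path: string, req: Request): URL {
  const upstream = new URL(req.url); // e.g. http://127.0.0.1:4288/...
  // Proxies may append comma-separated values; the first is the client-facing one.
  const fwdHost = req.headers.get("x-forwarded-host")?.split(",")[0].trim();
  const fwdProto = req.headers.get("x-forwarded-proto")?.split(",")[0].trim();
  const host = fwdHost || upstream.host;
  const proto = fwdProto || upstream.protocol.replace(/:$/, "");
  return new URL(path, `${proto}://${host}`);
}
```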


Accomplishments that we're proud of

It is actually live. helion.metisos.co is a working, push-to-deploy production deployment. Not a slide deck. Not a recorded demo. Judges can upload their own footage right now and watch Pegasus process it in real time.

38x faster than manual review. A 7-camera incident with 21 minutes of footage reconstructs in 90 seconds. I know exactly what the manual version of that looks like because I have done it hundreds of times. Internal Affairs takes 4-6 hours for the same scope. Helion does it before the coffee is ready.

Every answer is grounded. The Helion Agent does not summarize. It cites. Every claim carries the source feed name and a clickable mm:ss timestamp that jumps the viewer to the moment the camera captured it. This was a non-negotiable design decision informed by how investigations actually work: if an investigator cannot verify a claim against the source material, the claim is worthless. I would never have trusted a tool that told me "the suspect was seen heading east" without showing me exactly where in the footage that came from. Neither will any other investigator.
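
The citation mechanics are simple to sketch. A feed offset is formatted as mm:ss for display. When clicked, it is parsed back to seconds so the viewer can seek, for example by setting currentTime on that feed's video element. The Citation shape and the label format are hypothetical.

```typescript
// Hypothetical citation shape attached to every agent claim.
interface Citation {
  feed: string;
  tSec: number; // seconds into that feed
}

// Render seconds as mm:ss (minutes are not capped at 59, so 75:02 is valid).
function toMmSs(totalSec: number): string {
  const s = Math.max(0, Math.floor(totalSec));
  const mm = Math.floor(s / 60);
  const ss = s % 60;
  return `${String(mm).padStart(2, "0")}:${String(ss).padStart(2, "0")}`;
}

// Parse "mm:ss" back to seconds so a clicked citation can seek the player.
function fromMmSs(label: string): number {
  const m = /^(\d+):([0-5]\d)$/.exec(label.trim());
  if (!m) throw new Error(`not an mm:ss timestamp: ${label}`);
  return Number(m[1]) * 60 + Number(m[2]);
}

function citationLabel(c: Citation): string {
  return `[${c.feed} @ ${toMmSs(c.tSec)}]`;
}
```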

Live policy re-grading. The "Re-evaluate live" button reruns Pegasus against the footage for any single policy clause and updates the rating in place. This is the multimodal fusion story working end-to-end in the UI. A supervisor can challenge any finding, re-run the analysis, and see if the assessment holds. That is the kind of auditability that departments need for community trust and legal defensibility.
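
One way to make a re-grade auditable is to replace the clause's rating immutably and keep the superseded rating in a history list. The sketch below assumes that design; the writeup does not say whether Helion keeps prior ratings. Rating, ClauseFinding, and the injected evaluate callback (standing in for the live Pegasus call) are hypothetical.

```typescript
type Rating = "compliant" | "partially_compliant" | "non_compliant" | "insufficient_evidence";

interface ClauseFinding {
  clauseId: string;
  rating: Rating;
  rationale: string;
  history: { rating: Rating; supersededAt: string }[]; // oldest first
}

// Re-run one clause and replace its rating, keeping the old one for audit.
// Returns a new array; the input findings are never mutated.
async function regradeClause(
  findings: ClauseFinding[],
  clauseId: string,
  evaluate: (clauseId: string) => Promise<{ rating: Rating; rationale: string }>,
  now: () => string = () => new Date().toISOString(),
): Promise<ClauseFinding[]> {
  if (!findings.some((f) => f.clauseId === clauseId)) {
    throw new Error(`unknown clause ${clauseId}`);
  }
  const fresh = await evaluate(clauseId);
  return findings.map((f) =>
    f.clauseId !== clauseId
      ? f
      : {
          ...f,
          ...fresh,
          history: [...f.history, { rating: f.rating, supersededAt: now() }],
        },
  );
}
```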

Real-incident demo data. The Houston OIS dataset is published, public-domain footage. Anyone can verify the system is not fabricating outputs because the source material is independently watchable. We chose this deliberately. If you are building a tool for accountability, the tool itself has to be accountable.

One-click sample-video flow for judges. Step 2 of the new-incident wizard offers a pre-uploaded sample so judges can demo the fresh-incident path in a single click without bringing their own footage.


What we learned

Pegasus is a structured-extraction tool, not a chatbot. When you pass responseSchema to invokePegasus(), Bedrock honors it and JSON parsing becomes deterministic. Free-form Pegasus prompts are unreliable. Schema-constrained ones are production-grade. This tracks with everything I have learned building agent systems at Metis Analytics: structured output contracts are the difference between a demo and a deployable product.
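
The writeup does not show the request body that Helion's invokePegasus() wrapper sends to Bedrock, so this sketch covers only the client-side half of the contract. It pairs a JSON Schema for timeline events (the kind of object one might pass as the response schema) with a strict parser that rejects anything that does not match, instead of trying to repair free-form text. The schema fields and function names are hypothetical.

```typescript
// Hypothetical JSON Schema for Pegasus timeline output.
const TIMELINE_SCHEMA = {
  type: "object",
  properties: {
    events: {
      type: "array",
      items: {
        type: "object",
        properties: {
          tSec: { type: "number" },
          label: { type: "string" },
          confidence: { type: "number", minimum: 0, maximum: 1 },
        },
        required: ["tSec", "label", "confidence"],
      },
    },
  },
  required: ["events"],
} as const;

interface ExtractedEvent {
  tSec: number;
  label: string;
  confidence: number;
}

// Strictly parse the model's text output against the same contract:
// throw on anything off-schema rather than guessing what was meant.
function parseTimeline(raw: string): ExtractedEvent[] {
  const data = JSON.parse(raw) as { events?: unknown } | null;
  if (typeof data !== "object" || data === null || !Array.isArray(data.events)) {
    throw new Error("response is not { events: [...] }");
  }
  return data.events.map((e, i) => {
    const ev = (e ?? {}) as Partial<ExtractedEvent>;
    if (
      typeof ev.tSec !== "number" ||
      typeof ev.label !== "string" ||
      typeof ev.confidence !== "number" ||
      ev.confidence < 0 ||
      ev.confidence > 1
    ) {
      throw new Error(`event ${i} does not match schema`);
    }
    return { tSec: ev.tSec, label: ev.label, confidence: ev.confidence };
  });
}
```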

Determinism wins for deliverables. Reports composed by pure TypeScript functions over the structured CaseRecord are faster, cheaper, and more trustworthy than LLM-generated reports. Reserve the model budget for the things only the model can do: video grounding, transcript extraction, semantic search. Everything downstream should be deterministic. An investigator who catches a single hallucinated detail in a report will never trust the tool again.
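
What "deterministic" means here can be shown with a small pure composer. The same CaseRecord in always yields identical text out, and every line is a template filled from a sourced field. The CaseRecordLite shape and section format are hypothetical. The sketch also assumes event times have already been mapped onto the shared incident clock.

```typescript
// Hypothetical minimal CaseRecord; every fact carries its source feed.
interface SourcedEvent {
  feed: string;
  incidentSec: number; // already on the shared incident clock
  label: string;
}

interface CaseRecordLite {
  caseId: string;
  address: string;
  events: SourcedEvent[];
}

function mmss(sec: number): string {
  const s = Math.floor(sec);
  return `${String(Math.floor(s / 60)).padStart(2, "0")}:${String(s % 60).padStart(2, "0")}`;
}

// Pure function: no model call, no clock, no randomness. Sorting a copy keeps
// the input untouched, and the tie-break on feed name keeps order stable.
function composeTimelineSection(rec: CaseRecordLite): string {
  const lines = [...rec.events]
    .sort(
      (a, b) =>
        a.incidentSec - b.incidentSec ||
        (a.feed < b.feed ? -1 : a.feed > b.feed ? 1 : 0),
    )
    .map((e) => `- ${e.label} (${e.feed} @ ${mmss(e.incidentSec)})`);
  return [`Case ${rec.caseId}, ${rec.address}`, `Events: ${rec.events.length}`, ...lines].join("\n");
}
```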

Investigators do not trust answers. They trust citations. The single highest-impact UX decision was making every agent answer carry a clickable feed plus timestamp. Without that, the tool is a curiosity. With it, the tool is evidence. I know this because I have been that investigator, and I would have dismissed any tool that asked me to trust its summary without showing me the tape.

Multi-rail console layouts hold up better than tabs when the user needs the same context (agent, inspector) available on every page. Once we built the persistent agent rail, every other screen got simpler.

Async embeddings are not interactive embeddings. Marengo is excellent at what it does, but a 30-second round-trip is incompatible with Q&A. Cache aggressively, embed offline, and use cheap routing (transcript hits, keyword matching) for the runtime path.
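
On the offline side, once clip vectors are cached, cross-feed retrieval can be a brute-force cosine-similarity scan. That is cheap at the scale of one case (91 embeddings in the Houston demo). The sketch assumes the vectors have already been collected from Marengo's async job output. The ClipVector shape and function names are hypothetical, and an approximate-nearest-neighbor index could replace the linear scan at larger scale.

```typescript
// Hypothetical cached clip embedding, computed offline.
interface ClipVector {
  feed: string;
  startSec: number;
  endSec: number;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Rank cached clips from *other* feeds by similarity to a query clip.
function crossFeedMatches(
  query: ClipVector,
  cache: ClipVector[],
  k = 5,
): { clip: ClipVector; score: number }[] {
  return cache
    .filter((c) => c.feed !== query.feed)
    .map((clip) => ({ clip, score: cosine(query.vector, clip.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```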


What's next for Helion

Pegasus 1.5 on Bedrock the day it ships. Drop-in model swap. Expect tighter event localization and better long-form transcripts.

PDF policy ingestion. Today the HPD GO 600-17 grading is hand-seeded JSON. Next is reading the source PDF directly so any department can drop in its own policy book and get immediate compliance analysis without manual configuration.

Satellite imagery as a first-class layer. Currently a Mapbox style toggle. The roadmap is a fused overlay with georeferenced incident moments, bridging the gap between street-level body cam intelligence and overhead geospatial context.

Persistent multi-tenant case store. The hackathon MVP runs an in-memory case store on globalThis. Production needs Postgres plus S3-backed evidence retention and per-department RBAC with CJIS-compliant access controls.
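
A common way to keep an in-memory store like this alive across Next.js dev-mode module reloads is to attach it to globalThis. The sketch below shows that pattern, with a hypothetical property name and case shape. It also shows the limitation described above. The Map lives inside one Node process, so it is lost on restart and invisible to any second instance.

```typescript
// Hypothetical case shape; Helion's real CaseRecord is richer.
interface StoredCase {
  caseId: string;
  createdAt: string;
}

// Attach the Map to globalThis so dev-mode hot reloads, which re-evaluate
// modules, reuse one store instead of silently creating a fresh, empty one.
// Still process-local: gone on restart, not shared across instances.
const g = globalThis as typeof globalThis & { __helionCases?: Map<string, StoredCase> };
const caseStore: Map<string, StoredCase> = (g.__helionCases ??= new Map());
```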

More modalities. Drone overhead, fixed traffic cameras, and witness phone video are all already supported by the ingest pipeline. We want to package starter datasets for each so analysts beyond law enforcement (search-and-rescue teams, intelligence community field analysts, investigative journalists) can adopt the same fusion model.

PatrolWatch integration. Helion handles the field. PatrolWatch, our law enforcement academy training platform already deployed at Lincoln University and Jefferson College, handles the classroom. Together they create a training-to-field intelligence continuity pipeline: cadets learn proper procedure in the academy, and Helion evaluates whether those procedures are followed in the field.

Productize the grounding pattern. Every claim cites a source. That pattern (structured extraction, deterministic composition, citation-linked delivery) is reusable for any analyst tool that fuses video with structured data, and we intend to make it a core capability of the Metis Analytics platform.
