SIQUR

Technical Memo

Inspiration

Modern surveillance systems share one goal: detect and respond to threats before they escalate. Yet, today’s solutions split into two fundamentally flawed approaches.

First, AI-driven surveillance offers speed and scale. Major cities have invested $25 billion into automated systems, but these models are inherently brittle. Relying on static, biased datasets and limited camera perspectives, they frequently misclassify behavior, fail to generalize, and erode public trust.

Second, traditional security teams provide the context and intuition that AI lacks. However, they are fundamentally constrained by scale. Human investigations are manual, slow, and fragmented across parallel camera feeds, resulting in missed or delayed critical signals.

The industry is stuck in a tradeoff: speed without understanding, or understanding without speed.

We built Siqur to eliminate this tradeoff.

Siqur introduces a Digital Twin layer that reconstructs environments in 3D. By generating synthetic scenarios, we train models across diverse conditions, perspectives, and populations. This enables the system to learn from the exact viewpoint of every camera, entirely bypassing biased historical data.

By grounding AI in spatial-temporal context, Siqur turns fragmented feeds into a unified engine that interprets, explains, and acts in real time.

The result: Fast, context-aware AI without the blind spots of existing approaches.

What it does

Our person path tracking tool: Watchman Watchman, transforms fragmented security infrastructure into a unified, proactive intelligence engine, replacing manual monitoring and biased automation with a streamlined pipeline: Reconstruct, Simulate, and Deploy.
Instant Digital Twin Generation We digitize your environment instantly. Our world model engine pulls Google Maps data to reconstruct building exteriors into 500,000 gaussian splats. For interiors, users generate high-fidelity 3D environments in minutes using LiDAR scans.
Algorithmic Coverage & Synthetic Training Our Camera Placement Solver calculates optimal configurations to maximize coverage and minimize hardware costs. From these virtual vantage points, Siqur generates thousands of synthetic scenarios to train on nuanced behavioral triggers (like body language) rather than biased demographic profiles. Users can even create custom scenarios to stress-test the model's predictive accuracy before deployment.
Natural Language Search: Security teams can query footage effortlessly (e.g., "Show me anyone entering the north exit with a package"), eliminating hours of manual scrubbing.
Proactive Intervention: Upon threat detection, Siqur maps the perpetrator's real-time path and triggers instant alerts, equipping responders with actionable context to intercept situations before they escalate.

The Sique Advantage: We move security from reaction to prediction, turning every camera into an expert observer operating with human intuition at AI scale.

How we built it

Two-Stage Camera Placement Engine (Python)

We discretize floor plans and cast 360 rays per candidate to compute visibility polygons with line-of-sight constraints. A 90° FOV sweep determines optimal yaw.

Phase 1: Fast greedy algorithm yields a log-optimal approximation. Phase 2: For <50 cameras, a Google OR-Tools CP-SAT ILP solver runs 4 parallel workers to find the provably minimal layout.

GPU-Accelerated Video Generation

An isolated worker loads the Wan I2V 14B diffusion model into GPU memory. WebSockets stream per-step previews. On Nvidia DGX Hopper, we use: • VAE tiling to avoid OOM • TF32 matmul (~3× faster) • PyTorch Flash SDPA attention

Frames pipe to FFmpeg and encode to H.264 MP4 (faststart) for instant playback.

World Model → Gaussian Splat Generation

We implemented a 3D world model from multi-view imagery. Outputting a Gaussian splat field—millions of anisotropic 3D Gaussians optimized for photorealistic reconstruction.

The digital twin enables real-time rendering and simulation.

Immersive Next.js Frontend

The UI transitions via framer-motion through: • Mapbox Globe: Select buildings • 2D Optimizer: Input budgets, watch real-time placement • 3D Gaussian Splatting: Powered by mkkellogg/GaussianSplats3D

High-Velocity FastAPI Backend

Built for hackathon speed, we use in-memory sessions. Decoupling FastAPI from the GPU worker via WebSockets keeps the API responsive during heavy diffusion workloads.

Watchman: Agentic Cross-Camera Motion Reconstruction

Watchman bridges blind spots across camera feeds. When a subject disappears and reappears, it correlates identity using temporal and spatial priors, reconstructing continuous trajectories through unseen space.

The system ingests synchronized feeds with learned offsets, runs per-frame detection via SAM2, and projects detections into 3D coordinates. A spline-based agentic solver interpolates smooth, physically plausible motion (walking, pausing, sprinting).

Multiple subjects are tracked simultaneously with color-coded trajectories rendered as Line2 progressive animations over the 3D interior.

What we learned

The Power of Hybrid Optimization Combining classical algorithms with modern generative AI is incredibly effective. We learned to orchestrate a two-phase camera placement engine: pairing a rapid, log-optimal greedy set-cover algorithm with a Google OR-Tools CP-SAT ILP solver for exactness.
2D Proxies for 3D Spatial Reasoning We discovered that 2D ray-casting is a highly accurate, performant proxy for 3D camera coverage. By casting 360 rays per candidate position to compute visibility polygons, then applying a sliding FOV window to optimize yaw, we achieved precise spatial reasoning without the overhead of a full 3D simulation.
Productionizing Large Diffusion Models Running a massive model like Wan I2V in production requires aggressive memory management. We learned the practical necessity of implementing VAE tiling, TF32 matmul, and Flash SDPA. Furthermore, architecturally isolating the model in a dedicated GPU worker—communicating via WebSockets—was critical to keeping our FastAPI web server responsive and restartable during intensive diffusion cycles.

What's next for SIQUR

Automated Photogrammetry for Global Scale Our highest-value next step is expanding beyond our initial Purdue environment by integrating photogrammetry workflows. By automatically sourcing images to generate interior and exterior 3D splats, we can deploy Siqur for any facility worldwide—shrinking enterprise onboarding time from days to mere hours per building.

Built With

cesiumjs
fastapi
framer
gradio
next.js
onnx
opencv
or-tools
pytorch
qwen
react
sam
tailwindcss
three.js
transformers
uvicorn
vlm
yolo

Submitted to

Catapult 2026
- Winner Most Promising Startup

Created by

- Built the full frontend: interactive 3D globe, building selection flow, camera viewer with Gaussian splatting, and camera switcher UI
- Designed the visual identity: dark military HUD theme, animated boot sequence, glitch effects, and smooth view transitions, as well as spinning ASCII hero page

Rohan Muppa
My main contributions were with the backend. I developed the camera placement algorithm and the watchman feature. I was also an actor in a real-life simulation.

David Wang
Mocked the idea for Ray Tracing and timeline reconstruction by developing a mini app to showcase the idea of simulating an attacker path which was eventually used for the Ray Cast simulation mapping out real interaction between people in the 3D enviornment.

Ritvik Indupuri
- Built world model pipeline for exterior and interior reconstruction of buildings
- Developed a cross-camera motion reconstruction engine that correlates fragmented security feeds into continuous, time-synchronized 3D trajectories

david chen
Mihir Chauhan
But it works on my computer..