PathGuard

Inspiration

Construction sites are dynamic, cluttered, and unpredictable. A single misplaced hose or tool can cause a serious fall in seconds. We kept coming back to one simple question:

Workers don’t need to know everything in the scene; they need one answer: *Is something in my path right now?*

At the same time, site managers struggle with documentation, compliance, and understanding what’s actually happening on-site. We saw an opportunity to combine real-time spatial safety intelligence with automated video labeling into one unified system.

That idea became PathGuard.


What it does

PathGuard is a two-path system:

1. Real-Time Hazard Detection (PathGuard HUD)

It defines a walking corridor in front of the worker and detects obstacles and trip risks directly in that path. Instead of detecting everything in the frame, it focuses only on what matters spatially.

We compute a corridor occupancy score:

$$ \text{Occupancy Score} = \frac{\text{Obstacle Pixels in Corridor}}{\text{Total Corridor Area}} $$

Alerts are triggered only when occupancy exceeds a threshold for consecutive frames, reducing noise and false positives.
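The occupancy score can be sketched as a simple mask ratio. This is an illustrative NumPy version, assuming boolean masks for the corridor and detected obstacles; it is not PathGuard's exact implementation.

```python
import numpy as np

def occupancy_score(obstacle_mask: np.ndarray, corridor_mask: np.ndarray) -> float:
    """Fraction of the corridor area covered by obstacle pixels.

    Both masks are boolean arrays of the same shape: `corridor_mask`
    marks the walking corridor, `obstacle_mask` marks detected obstacles.
    """
    corridor_area = corridor_mask.sum()
    if corridor_area == 0:
        return 0.0
    obstacle_in_corridor = np.logical_and(obstacle_mask, corridor_mask).sum()
    return float(obstacle_in_corridor / corridor_area)
```

Thresholding this score per frame, then requiring persistence across frames, is what keeps alerts quiet on noisy footage.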


2. Intelligent Video Labeling System

PathGuard also analyzes footage to generate structured scene descriptions and safety telemetry. The system extracts relevant objects, activities, and hazards from video and converts them into structured JSON outputs.

These labels:

  • Adapt detection prompts dynamically to each site
  • Support safety reporting and compliance documentation
  • Provide structured data for workflow and productivity insights
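To make the structured outputs concrete, here is a hypothetical example of the telemetry one labeled clip might yield; the field names (`clip_id`, `objects`, `hazards`, and so on) are illustrative placeholders, not PathGuard's actual schema.

```python
import json

# Hypothetical telemetry record for one labeled clip; the schema shown
# here is an assumption for illustration only.
telemetry = {
    "clip_id": "site-a-0042",
    "objects": ["hose", "ladder", "pallet"],
    "activities": ["material transport"],
    "hazards": [
        {"type": "trip_risk", "zone": "NEAR", "confidence": 0.82},
    ],
}

# Dynamic detection prompts can be derived directly from the labels.
detection_prompt = ", ".join(
    telemetry["objects"] + [h["type"] for h in telemetry["hazards"]]
)
print(detection_prompt)
```

Deriving the detection prompt from the labels is what lets the labeling path adapt the real-time path to each site.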

Together, the two paths allow PathGuard to both react instantly and understand context over time.


How we built it

We designed PathGuard as a layered system that degrades gracefully.

Corridor-First Spatial Reasoning

Instead of global object detection, we define a trapezoidal walking corridor and constrain all hazard reasoning to that region.
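A trapezoidal corridor can be rasterized directly as a boolean mask. This NumPy sketch interpolates the corridor half-width from a narrow top (near the horizon) to a wide bottom; the fractions used are placeholder defaults, not PathGuard's tuned geometry.

```python
import numpy as np

def corridor_mask(h: int, w: int, top_half: float = 0.10,
                  bottom_half: float = 0.35, horizon: float = 0.5) -> np.ndarray:
    """Boolean mask of a trapezoidal walking corridor, centered horizontally.

    The corridor spans from `horizon * h` down to the bottom of the frame,
    widening linearly from `top_half * w` to `bottom_half * w` half-width.
    All three fractions are illustrative defaults.
    """
    mask = np.zeros((h, w), dtype=bool)
    y0 = int(h * horizon)
    cx = w / 2
    for y in range(y0, h):
        t = (y - y0) / max(h - 1 - y0, 1)  # 0 at horizon, 1 at bottom
        half = (top_half + t * (bottom_half - top_half)) * w
        mask[y, int(cx - half):int(np.ceil(cx + half))] = True
    return mask
```

All downstream hazard reasoning is then intersected with this mask, so off-path clutter never triggers an alert.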

Multi-Layer Detection Pipeline

  • Classical CV (Canny + morphology) runs every frame.
  • Zero-shot detection adds semantic labeling.
  • Monocular depth estimation provides relative urgency (NEAR / MID / FAR).
  • A persistence-based state machine prevents flickering alerts.

The hazard state follows a persistence rule:

$$ \text{Alert if } \sum_{i=1}^{k} \mathbb{1}(\text{occupancy}_i > \tau) \geq p $$

Where:

  • τ = occupancy threshold
  • p = required number of hazardous frames within the window
  • k = number of recent frames considered
  • 1(·) = indicator function
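The persistence rule reduces to counting threshold crossings over a sliding window. The default values below (τ = 0.25, k = 10, p = 6) are placeholders, not the shipped tuning.

```python
def should_alert(occupancy_history, tau: float = 0.25, k: int = 10, p: int = 6) -> bool:
    """Alert iff at least p of the last k occupancy scores exceed tau.

    tau, k, and p are illustrative defaults; the real thresholds are
    tuned per site.
    """
    recent = list(occupancy_history)[-k:]
    return sum(1 for occ in recent if occ > tau) >= p
```

A single noisy frame can never trigger an alert on its own, which is the point of the rule.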

Parallel to this, the video labeling pipeline uses a vision-language model to:

  • Generate scene descriptions
  • Extract structured safety telemetry
  • Produce dynamic object prompts

The two paths integrate through shared prompts and enriched telemetry.


Challenges we ran into

1. Token and Context Limits

Generating structured safety telemetry from long transcripts quickly hit model token limits. We had to:

  • Chunk transcripts into smaller segments
  • Reduce prompt verbosity
  • Constrain output schema size
  • Implement JSON repair for malformed outputs

Balancing structured output quality with token limits was one of our biggest challenges.
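A JSON repair step can be sketched as a small defensive parser. This is a simplified illustration of the idea, not PathGuard's actual repair code: strip code fences, isolate the outermost braces, drop trailing commas, then parse.

```python
import json
import re

def repair_json(raw: str):
    """Best-effort recovery of a JSON object from a model response.

    Returns the parsed dict, or None if the payload is unrecoverable.
    The repairs shown are illustrative of common failure modes.
    """
    text = re.sub(r"```(?:json)?", "", raw).strip()  # strip markdown fences
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        return None
    text = text[start:end + 1]
    text = re.sub(r",\s*([}\]])", r"\1", text)  # drop trailing commas
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Returning None instead of raising lets the pipeline skip a bad chunk and keep processing the rest of the transcript.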


2. False Positives from Motion Blur

Bodycam footage is unstable. Motion blur generated phantom edges that triggered false trip-risk alarms.

We introduced a blur quality gate using Laplacian variance:

$$ \text{Blur Score} = \text{Var}(\nabla^2 I) $$

Frames below a threshold were suppressed from triggering alerts.


3. Stability vs. Responsiveness

Simple threshold-based alerts caused flickering behavior. We implemented a persistence-based state machine that is slow to alarm but fast to clear, dramatically improving usability.
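The asymmetry can be expressed as a two-state machine with different streak lengths for entering and leaving the alert state. The frame counts below are illustrative, not the shipped tuning.

```python
class HazardStateMachine:
    """Slow to alarm, fast to clear.

    Entering ALERT requires `enter_frames` consecutive hazardous frames;
    returning to CLEAR requires only `clear_frames` clear ones. Both
    defaults are placeholder values.
    """

    def __init__(self, enter_frames: int = 6, clear_frames: int = 2):
        self.enter_frames = enter_frames
        self.clear_frames = clear_frames
        self.state = "CLEAR"
        self._streak = 0

    def update(self, hazardous: bool) -> str:
        if self.state == "CLEAR":
            self._streak = self._streak + 1 if hazardous else 0
            if self._streak >= self.enter_frames:
                self.state, self._streak = "ALERT", 0
        else:  # ALERT: count clear frames instead
            self._streak = self._streak + 1 if not hazardous else 0
            if self._streak >= self.clear_frames:
                self.state, self._streak = "CLEAR", 0
        return self.state
```

Making the clear path shorter than the alarm path is what removes the flicker: brief noise cannot raise an alert, and a genuinely cleared path silences one quickly.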


Accomplishments that we're proud of

We’re especially proud that we successfully integrated a two-path system: real-time hazard detection and intelligent video labeling working hand in hand, all within a short hackathon timeframe.

Both paths:

  • Run independently
  • Improve each other
  • Remain functional even if advanced AI models fail

Building a resilient, layered safety system under time constraints was a major achievement.


What we learned

  • Spatial reasoning is more important than global detection for safety systems.
  • Classical computer vision remains critical for reliability.
  • Hybrid on-device + cloud architectures are practical when carefully designed.
  • Structured LLM outputs require defensive engineering.
  • Stability and usability matter more than raw model complexity.

Most importantly, we learned that resilience beats pure accuracy in real-world safety applications.


What's next for PathGuard

Our next goal is to measure productivity and safety impact using the generated labels.

We plan to benchmark model-generated labels against human-labeled ground truth.

We will evaluate:

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall} = \frac{TP}{TP + FN} $$
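Against human-labeled ground truth, both metrics reduce to set comparisons over labels. This sketch treats labels as plain set membership; a real evaluation would also match spatial and temporal extents.

```python
def precision_recall(predicted: set, ground_truth: set):
    """Precision and recall of model-generated labels vs. human labels.

    Labels are compared by set membership only, which is a simplifying
    assumption for illustration.
    """
    tp = len(predicted & ground_truth)   # labels both agree on
    fp = len(predicted - ground_truth)   # model-only labels
    fn = len(ground_truth - predicted)   # missed human labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```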

We also aim to measure:

  • Time-to-hazard detection
  • Activity distribution over time
  • Workflow bottlenecks derived from labeled events

Ultimately, PathGuard will not only prevent accidents but also quantify how work gets done safely and efficiently.

Built With

  • android
  • apple-neural-engine
  • cactus-engine
  • depth-anything-v2
  • edge-computing
  • google-gemini-2.5-flash/pro
  • graceful-degradation
  • groundeddino
  • ios
  • lfm2.5-vl-1.6b
  • multi-model-integration
  • opencv
  • pytorch
  • raspberry-pi-5
  • sam2
  • spatial-reasoning
  • state-machine-design
  • streamlit
  • streamlit-webrtc
  • zero-shot-detection