PathGuard

Inspiration

Construction sites are dynamic, cluttered, and unpredictable. A single misplaced hose or tool can cause a serious fall in seconds. We kept coming back to one simple question:

Workers don’t need to know everything in the scene; they need one answer: *Is something in my path right now?*

At the same time, site managers struggle with documentation, compliance, and understanding what’s actually happening on-site. We saw an opportunity to combine real-time spatial safety intelligence with automated video labeling into one unified system.

That idea became PathGuard.


What it does

PathGuard is a two-path system:

1. Real-Time Hazard Detection (PathGuard HUD)

It defines a walking corridor in front of the worker and detects obstacles and trip risks directly in that path. Instead of detecting everything in the frame, it focuses only on what matters spatially.

We compute a corridor occupancy score:

$$ \text{Occupancy Score} = \frac{\text{Obstacle Pixels in Corridor}}{\text{Total Corridor Area}} $$

Alerts are triggered only when occupancy exceeds a threshold for consecutive frames, reducing noise and false positives.
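The occupancy score can be sketched as a simple mask ratio. This is an illustrative NumPy version, assuming boolean masks for the corridor and detected obstacles; it is not PathGuard's exact implementation.

```python
import numpy as np

def occupancy_score(obstacle_mask: np.ndarray, corridor_mask: np.ndarray) -> float:
    """Fraction of the corridor area covered by obstacle pixels.

    Both masks are boolean arrays of the same shape: `corridor_mask`
    marks the walking corridor, `obstacle_mask` marks detected obstacles.
    """
    corridor_area = corridor_mask.sum()
    if corridor_area == 0:
        return 0.0
    obstacle_in_corridor = np.logical_and(obstacle_mask, corridor_mask).sum()
    return float(obstacle_in_corridor / corridor_area)
```

Thresholding this score per frame, then requiring persistence across frames, is what keeps alerts quiet on noisy footage.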


2. Intelligent Video Labeling System

PathGuard also analyzes footage to generate structured scene descriptions and safety telemetry. The system extracts relevant objects, activities, and hazards from video and converts them into structured JSON outputs.

These labels:

  • Adapt detection prompts dynamically to each site
  • Support safety reporting and compliance documentation
  • Provide structured data for workflow and productivity insights
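To make the structured outputs concrete, here is a hypothetical example of the telemetry one labeled clip might yield; the field names (`clip_id`, `objects`, `hazards`, and so on) are illustrative placeholders, not PathGuard's actual schema.

```python
import json

# Hypothetical telemetry record for one labeled clip; the schema shown
# here is an assumption for illustration only.
telemetry = {
    "clip_id": "site-a-0042",
    "objects": ["hose", "ladder", "pallet"],
    "activities": ["material transport"],
    "hazards": [
        {"type": "trip_risk", "zone": "NEAR", "confidence": 0.82},
    ],
}

# Dynamic detection prompts can be derived directly from the labels.
detection_prompt = ", ".join(
    telemetry["objects"] + [h["type"] for h in telemetry["hazards"]]
)
print(detection_prompt)
```

Deriving the detection prompt from the labels is what lets the labeling path adapt the real-time path to each site.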

Together, the two paths allow PathGuard to both react instantly and understand context over time.


How we built it

We designed PathGuard as a layered system that degrades gracefully.

Corridor-First Spatial Reasoning

Instead of global object detection, we define a trapezoidal walking corridor and constrain all hazard reasoning to that region.
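A trapezoidal corridor can be rasterized directly as a boolean mask. This NumPy sketch interpolates the corridor half-width from a narrow top (near the horizon) to a wide bottom; the fractions used are placeholder defaults, not PathGuard's tuned geometry.

```python
import numpy as np

def corridor_mask(h: int, w: int, top_half: float = 0.10,
                  bottom_half: float = 0.35, horizon: float = 0.5) -> np.ndarray:
    """Boolean mask of a trapezoidal walking corridor, centered horizontally.

    The corridor spans from `horizon * h` down to the bottom of the frame,
    widening linearly from `top_half * w` to `bottom_half * w` half-width.
    All three fractions are illustrative defaults.
    """
    mask = np.zeros((h, w), dtype=bool)
    y0 = int(h * horizon)
    cx = w / 2
    for y in range(y0, h):
        t = (y - y0) / max(h - 1 - y0, 1)  # 0 at horizon, 1 at bottom
        half = (top_half + t * (bottom_half - top_half)) * w
        mask[y, int(cx - half):int(np.ceil(cx + half))] = True
    return mask
```

All downstream hazard reasoning is then intersected with this mask, so off-path clutter never triggers an alert.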

Multi-Layer Detection Pipeline

  • Classical CV (Canny + morphology) runs every frame.
  • Zero-shot detection adds semantic labeling.
  • Monocular depth estimation provides relative urgency (NEAR / MID / FAR).
  • A persistence-based state machine prevents flickering alerts.

The hazard state follows a persistence rule:

$$ \text{Alert if } \sum_{i=1}^{k} \mathbb{1}(\text{occupancy}_i > \tau) \geq p $$

Where:

  • τ = occupancy threshold
  • p = required number of hazardous frames within the window
  • k = number of recent frames considered
  • 1(·) = indicator function
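The persistence rule reduces to counting threshold crossings over a sliding window. The default values below (τ = 0.25, k = 10, p = 6) are placeholders, not the shipped tuning.

```python
def should_alert(occupancy_history, tau: float = 0.25, k: int = 10, p: int = 6) -> bool:
    """Alert iff at least p of the last k occupancy scores exceed tau.

    tau, k, and p are illustrative defaults; the real thresholds are
    tuned per site.
    """
    recent = list(occupancy_history)[-k:]
    return sum(1 for occ in recent if occ > tau) >= p
```

A single noisy frame can never trigger an alert on its own, which is the point of the rule.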

Parallel to this, the video labeling pipeline uses a vision-language model to:

  • Generate scene descriptions
  • Extract structured safety telemetry
  • Produce dynamic object prompts

The two paths integrate through shared prompts and enriched telemetry.


Challenges we ran into

1. Token and Context Limits

Generating structured safety telemetry from long transcripts quickly hit model token limits. We had to:

  • Chunk transcripts into smaller segments
  • Reduce prompt verbosity
  • Constrain output schema size
  • Implement JSON repair for malformed outputs

Balancing structured output quality with token limits was one of our biggest challenges.
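A JSON repair step can be sketched as a small defensive parser. This is a simplified illustration of the idea, not PathGuard's actual repair code: strip code fences, isolate the outermost braces, drop trailing commas, then parse.

```python
import json
import re

def repair_json(raw: str):
    """Best-effort recovery of a JSON object from a model response.

    Returns the parsed dict, or None if the payload is unrecoverable.
    The repairs shown are illustrative of common failure modes.
    """
    text = re.sub(r"```(?:json)?", "", raw).strip()  # strip markdown fences
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        return None
    text = text[start:end + 1]
    text = re.sub(r",\s*([}\]])", r"\1", text)  # drop trailing commas
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Returning None instead of raising lets the pipeline skip a bad chunk and keep processing the rest of the transcript.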


2. False Positives from Motion Blur

Bodycam footage is unstable. Motion blur generated phantom edges that triggered false trip-risk alarms.

We introduced a blur quality gate using Laplacian variance:

$$ \text{Blur Score} = \text{Var}(\nabla^2 I) $$

Frames below a threshold were suppressed from triggering alerts.


3. Stability vs. Responsiveness

Simple threshold-based alerts caused flickering behavior. We implemented a persistence-based state machine that is slow to alarm but fast to clear, dramatically improving usability.
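The asymmetry can be expressed as a two-state machine with different streak lengths for entering and leaving the alert state. The frame counts below are illustrative, not the shipped tuning.

```python
class HazardStateMachine:
    """Slow to alarm, fast to clear.

    Entering ALERT requires `enter_frames` consecutive hazardous frames;
    returning to CLEAR requires only `clear_frames` clear ones. Both
    defaults are placeholder values.
    """

    def __init__(self, enter_frames: int = 6, clear_frames: int = 2):
        self.enter_frames = enter_frames
        self.clear_frames = clear_frames
        self.state = "CLEAR"
        self._streak = 0

    def update(self, hazardous: bool) -> str:
        if self.state == "CLEAR":
            self._streak = self._streak + 1 if hazardous else 0
            if self._streak >= self.enter_frames:
                self.state, self._streak = "ALERT", 0
        else:  # ALERT: count clear frames instead
            self._streak = self._streak + 1 if not hazardous else 0
            if self._streak >= self.clear_frames:
                self.state, self._streak = "CLEAR", 0
        return self.state
```

Making the clear path shorter than the alarm path is what removes the flicker: brief noise cannot raise an alert, and a genuinely cleared path silences one quickly.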


Accomplishments that we're proud of

We’re especially proud that we successfully integrated a two-path system: real-time hazard detection and intelligent video labeling working hand in hand, all within a short hackathon timeframe.

Both paths:

  • Run independently
  • Improve each other
  • Remain functional even if advanced AI models fail

Building a resilient, layered safety system under time constraints was a major achievement.


What we learned

  • Spatial reasoning is more important than global detection for safety systems.
  • Classical computer vision remains critical for reliability.
  • Hybrid on-device + cloud architectures are practical when carefully designed.
  • Structured LLM outputs require defensive engineering.
  • Stability and usability matter more than raw model complexity.

Most importantly, we learned that resilience beats pure accuracy in real-world safety applications.


What's next for PathGuard

Our next goal is to measure productivity and safety impact using the generated labels.

We plan to benchmark model-generated labels against human-labeled ground truth.

We will evaluate:

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall} = \frac{TP}{TP + FN} $$
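Against human-labeled ground truth, both metrics reduce to set comparisons over labels. This sketch treats labels as plain set membership; a real evaluation would also match spatial and temporal extents.

```python
def precision_recall(predicted: set, ground_truth: set):
    """Precision and recall of model-generated labels vs. human labels.

    Labels are compared by set membership only, which is a simplifying
    assumption for illustration.
    """
    tp = len(predicted & ground_truth)   # labels both agree on
    fp = len(predicted - ground_truth)   # model-only labels
    fn = len(ground_truth - predicted)   # missed human labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```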

We also aim to measure:

  • Time-to-hazard detection
  • Activity distribution over time
  • Workflow bottlenecks derived from labeled events

Ultimately, PathGuard will not only prevent accidents but also quantify how work gets done safely and efficiently.

Built With

  • android
  • apple-neural-engine
  • cactus-engine
  • depth-anything-v2
  • edge-computing
  • google-gemini-2.5-flash/pro
  • graceful-degradation
  • groundeddino
  • ios
  • lfm2.5-vl-1.6b
  • multi-model-integration
  • opencv
  • pytorch
  • raspberry-pi-5
  • sam2
  • spatial-reasoning
  • state-machine-design
  • streamlit
  • streamlit-webrtc
  • zero-shot-detection