GuardianView — Your AI Safety Copilot That Never Blinks


Inspiration

Every 104 minutes, a worker dies from a workplace injury in the United States. In 2024 alone, employers reported 2.5 million workplace injuries and illnesses (U.S. Bureau of Labor Statistics), and experts estimate the true toll is between 5.2 and 7.8 million annually when accounting for widespread underreporting (AFL-CIO, Death on the Job, 2025). The economic cost reaches up to $348 billion per year.

The problem isn't ignorance. Workers know the rules. But in the moment, when your hands are full, your focus is locked on the task, and you're working under pressure, hazards slip past unnoticed. Safety training happens once; the dangerous moment happens every day.

As a Mechatronics and Biomedical engineer, I've spent years in workshops, labs, and industrial environments. I've seen firsthand how easily safety lapses occur, even among experienced professionals. The idea behind GuardianView was simple: what if you had an expert safety officer who watched your workspace in real time, never got tired, never looked away, and could talk to you naturally while you worked?


What It Does

GuardianView is a real-time AI safety copilot powered by Google's Gemini Live API. It watches your workspace through a standard camera, understands the context of what you're doing, and speaks up the instant something is wrong.

Core Capabilities

Real-time hazard detection via vision: Identifies missing PPE (safety glasses, gloves, ear protection), unsafe tool handling, improper posture, spills, and environmental hazards — all through continuous video analysis powered by Gemini's multimodal understanding.

Proactive voice alerts: Unlike a typical chatbot that waits for the user to speak, GuardianView actively monitors the video feed and initiates spoken warnings when it detects danger, even if the user hasn't said a word. The system analyzes every frame (once per second) and speaks up immediately when critical hazards are detected.

Natural, interruptible voice interaction: Powered by Gemini Live API, the agent speaks alerts conversationally and can be interrupted at any time. Ask it if a chemical combination is safe, request a pre-task safety checklist, or tell it to stand by — all hands-free.

Pre-task safety consultations: Before starting any task, ask GuardianView for guidance. It will provide a comprehensive safety checklist, recommend proper PPE, identify potential hazards, and suggest safe procedures tailored to your specific work. Whether you're about to weld, use a table saw, or handle chemicals, GuardianView acts as your safety advisor before the work even begins.

Session-level contextual awareness: Within a session, GuardianView leverages Gemini's built-in session memory to track what's happening over time. It remembers that you already put on your safety glasses, knows you're currently soldering, and can escalate warnings if you ignore initial alerts.

Dramatic visual alerts: Critical hazards trigger an instant red flash overlay across the video feed (0.3 seconds, semi-transparent) paired with a pulsing border animation, creating an unmissable visual warning that complements the voice alert. The coordinated multimodal approach — visual flash, border pulse, and voice warning — ensures you can't miss a critical safety issue even in noisy environments.

Multilingual support: GuardianView speaks safety in 50+ languages. Configure your preferred alert language in the settings (Spanish, French, German, Japanese, Mandarin, Arabic, Hindi, Portuguese, and dozens more), and the agent delivers all spoken warnings, pre-task guidance, and recommendations in that language. Technical terms and OSHA citations may remain in English so that safety-critical wording stays precise. The goal is that language barriers no longer compromise workplace safety.

Real-time incident logging to Firebase: Every detected incident is instantly written to a Cloud Firestore database with full metadata — timestamp, severity classification, hazard description, workspace profile, session ID, and the agent's spoken recommendation. This creates a durable, queryable safety record that persists across sessions and is immediately available for compliance audits, trend analysis, and safety reporting without any manual data entry.

Automated safety manager notifications: The moment a hazard is detected, GuardianView automatically sends a real-time email alert to the configured recipient — whether that's a safety manager, EHS officer, site supervisor, or any responsible party set up in the app's settings. Each email includes the full incident details, severity level, timestamp, and the agent's recommendation. Critical-severity incidents are flagged with a CRITICAL prefix in the subject line to ensure they surface immediately in any inbox or notification system.

Comprehensive safety reports: Generate detailed PDF safety reports for any session with a single click. Reports include session metadata, safety score trends, complete incident timeline with timestamps, all detected hazards with severity classifications, spoken recommendations, and regulatory citations — ready for compliance documentation and safety audits.

Severity-driven prompt behavior: The agent's system instruction defines clear severity tiers (critical, high, medium, low) with specific response rules for each. Critical hazards trigger immediate spoken interruptions and visual flashes; minor observations are mentioned conversationally at natural pauses.

Configurable safety profiles: The agent isn't hardcoded for one environment. Load a kitchen profile and it watches for cross-contamination and burns; switch to an industrial workshop profile and it monitors PPE compliance and tool safety; configure a clinical profile and it tracks sterile field integrity. Adding a new environment means writing a new profile dictionary; the agent logic itself does not change.

Regulation-aware through Google Search: The agent has access to Google Search as a tool, allowing it to look up specific OSHA standards, chemical compatibility information, or Material Safety Data Sheets during a live conversation.

Mobile-responsive design: Fully responsive interface optimized for tablets and mobile devices with touch-friendly controls, adaptive layouts, and proper viewport handling for monitoring on the go.


How I Built It

The foundation of GuardianView is Google's Agent Development Kit (ADK) with the Gemini 2.0 Flash model, connected through the Live API for real-time bidirectional audio and video streaming.

Backend Architecture

The backend is a Python FastAPI server that manages WebSocket connections between the browser and the ADK runner. When a user connects, the server creates a LiveRequestQueue and spins up two concurrent async tasks:

  • Upstream task: Receives camera frames (base64 JPEG at 1 frame per second) and microphone audio (PCM16) from the client and forwards them to the Gemini Live session.
  • Downstream task: Listens for agent events (audio responses, text, transcriptions) and streams them back to the browser in real-time.

The WebSocket connection maintains a persistent bidirectional stream, enabling the low-latency interaction required for safety monitoring.
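The two-task pattern can be sketched with plain asyncio queues standing in for the browser WebSocket and the Gemini Live session. This is a minimal illustration, not the project's code: `upstream`, `downstream`, `run_session`, and the `STOP` sentinel are hypothetical names, and the real server would use FastAPI's WebSocket and the ADK LiveRequestQueue in place of `asyncio.Queue`.

```python
import asyncio

STOP = object()  # hypothetical sentinel marking the end of a stream


async def upstream(client_in: asyncio.Queue, live_queue: asyncio.Queue) -> None:
    """Forward client messages (frames, audio) to the live session."""
    while True:
        message = await client_in.get()
        await live_queue.put(message)
        if message is STOP:
            break


async def downstream(agent_events: asyncio.Queue, client_out: list) -> None:
    """Relay agent events (audio, text, transcriptions) back to the client."""
    while True:
        event = await agent_events.get()
        if event is STOP:
            break
        client_out.append(event)


async def run_session(client_messages: list, agent_messages: list) -> tuple:
    client_in, live_queue, agent_events = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    for message in client_messages + [STOP]:
        client_in.put_nowait(message)
    for event in agent_messages + [STOP]:
        agent_events.put_nowait(event)
    client_out: list = []
    # Both directions run concurrently for the lifetime of the connection.
    await asyncio.gather(
        upstream(client_in, live_queue),
        downstream(agent_events, client_out),
    )
    forwarded = []
    while not live_queue.empty():
        item = live_queue.get_nowait()
        if item is not STOP:
            forwarded.append(item)
    return forwarded, client_out
```

Running both coroutines under one `asyncio.gather` means a slow agent response never blocks incoming frames, and vice versa.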

Frontend Implementation

The frontend is a single-page web application that captures the user's camera feed at 1 frame per second, encodes each frame as base64 JPEG, and sends it over the WebSocket alongside PCM16 audio from the microphone.

On the receiving end, it decodes the agent's audio responses from base64 PCM back into Float32 samples and plays them through the Web Audio API for real-time spoken alerts with minimal latency.
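The frontend does this conversion in JavaScript; the arithmetic is shown here in Python purely as an illustration. The function name is hypothetical, and little-endian sample order is an assumption of this sketch.

```python
import base64
import struct


def pcm16_base64_to_floats(payload: str) -> list:
    """Decode base64 little-endian PCM16 into floats in [-1.0, 1.0)."""
    raw = base64.b64decode(payload)
    count = len(raw) // 2  # two bytes per 16-bit sample
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    # Dividing by 32768 maps the int16 range [-32768, 32767] onto [-1.0, 1.0).
    return [sample / 32768.0 for sample in samples]
```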

Configurable Safety Intelligence

The safety intelligence lives in a configurable profile system. Each profile (workshop, kitchen, clinical) is a Python dictionary that defines:

  • Hazards to monitor
  • Applicable regulations and standards
  • Severity-based response rules
  • Alert language and terminology

These profiles are injected directly into the agent's system instruction at startup, meaning the same agent codebase adapts to completely different environments simply by loading a different profile.
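As a rough sketch of the profile-as-data idea, a profile dictionary can be rendered into the system instruction with a small helper. The dictionary keys, the example contents, and `build_system_instruction` are assumptions made for illustration; the actual profile schema may differ.

```python
# Hypothetical profile; the key names are assumptions of this sketch.
WORKSHOP_PROFILE = {
    "name": "workshop",
    "hazards": ["missing eye protection", "unsafe tool handling"],
    "regulations": ["OSHA 1910.133"],
    "response_rules": {
        "critical": "Interrupt immediately.",
        "low": "Mention at a natural pause.",
    },
}


def build_system_instruction(profile: dict) -> str:
    """Render a profile dictionary into plain-text system instruction lines."""
    lines = [
        f"You are a safety copilot for a {profile['name']} environment.",
        "Hazards to monitor:",
    ]
    lines += [f"- {hazard}" for hazard in profile["hazards"]]
    lines.append("Applicable regulations:")
    lines += [f"- {regulation}" for regulation in profile["regulations"]]
    lines.append("Response rules by severity:")
    lines += [f"- {tier}: {rule}" for tier, rule in profile["response_rules"].items()]
    return "\n".join(lines)
```

Switching environments then amounts to passing a different dictionary to the same function.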

Multilingual Support Implementation

Language support is implemented through dynamic system prompt injection. When a user selects a language from the settings dropdown, the agent's system instruction is updated to include an explicit language rule:

**LANGUAGE RULE:** You MUST speak all safety alerts, warnings, and
recommendations in {selected_language}. Technical terms and OSHA
citations may remain in English, but all conversational speech must
be in {selected_language}.

This approach leverages Gemini's native multilingual capabilities while ensuring safety-critical information remains accurate and grounded in regulations.
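A minimal sketch of that injection step, assuming the rule is appended to an existing base instruction. `with_language_rule` is a hypothetical helper name; the template text is the rule quoted above.

```python
LANGUAGE_RULE_TEMPLATE = (
    "**LANGUAGE RULE:** You MUST speak all safety alerts, warnings, and "
    "recommendations in {selected_language}. Technical terms and OSHA "
    "citations may remain in English, but all conversational speech must "
    "be in {selected_language}."
)


def with_language_rule(base_instruction: str, selected_language: str) -> str:
    """Append the language rule for the chosen language to the base instruction."""
    rule = LANGUAGE_RULE_TEMPLATE.format(selected_language=selected_language)
    return f"{base_instruction}\n\n{rule}"
```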

Visual Alert System

The dramatic red flash overlay is implemented using CSS animations and JavaScript DOM manipulation:

  • CSS: A ::after pseudo-element on the video container with @keyframes animation (0.3s duration, 0 to 70% opacity for critical alerts, 0 to 50% for high severity).
  • JavaScript: When the backend sends a safety incident event, the frontend adds a flash-critical or flash-high class to the video container to trigger the animation.
  • Coordination: The flash, border pulse, and voice alert fire simultaneously for maximum impact.

Firebase Incident Logging

Every incident detected by the vision analysis pipeline is written to Cloud Firestore as soon as it is detected. Each document in the incidents collection captures a comprehensive record:

{
  timestamp:       "2025-06-10T14:32:07Z",   // ISO 8601
  severity:        "critical",                // critical | high | medium | low
  hazard:          "No eye protection during grinding operation",
  profile:         "workshop",
  session_id:      "abc123",
  recommendation:  "Stop immediately. Put on ANSI Z87.1 safety glasses before continuing.",
  regulation:      "OSHA 1910.133"
}

This persistent, structured data store enables longitudinal safety trend analysis, cross-session hazard pattern identification, team-wide safety dashboards for managers, and OSHA-compliant audit trails that survive browser refreshes and session timeouts. The Firestore integration uses the Firebase Admin SDK on the FastAPI backend, keeping credentials server-side and out of the client.
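One way to build a document with the fields shown above is a small validating helper. This is a sketch under stated assumptions: `build_incident` and its parameters are hypothetical, and the Firestore write itself is left as a comment so the example runs without credentials.

```python
from datetime import datetime, timezone
from typing import Optional

SEVERITIES = ("critical", "high", "medium", "low")


def build_incident(
    severity: str,
    hazard: str,
    profile: str,
    session_id: str,
    recommendation: str,
    regulation: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Return an incident document matching the record shape shown above."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    # Normalize to UTC so every timestamp uses the same ISO 8601 "Z" form.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "severity": severity,
        "hazard": hazard,
        "profile": profile,
        "session_id": session_id,
        "recommendation": recommendation,
        "regulation": regulation,
    }


# With the Firebase Admin SDK initialized, the returned dict could then be
# added to the "incidents" collection; that call is omitted here.
```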

Automated Email Notifications

GuardianView sends an email alert when a hazard is detected: critical and high-severity incidents trigger an immediate individual email, while medium and low observations can be batched into a periodic digest. The recipient address is configured directly in the app's settings panel and can be a safety manager, EHS officer, site supervisor, or any responsible party appropriate for the deployment context.

Each notification email includes:

  • Full hazard description and severity classification
  • Detection timestamp and active workspace profile
  • The agent's spoken recommendation
  • Relevant regulatory reference (e.g., OSHA 1910.133)

For critical-severity incidents, the email subject line is prefixed with [CRITICAL] to ensure it surfaces immediately. Email delivery is handled server-side via the FastAPI backend using an SMTP integration (or a transactional provider such as SendGrid), which keeps delivery decoupled from browser state: once an incident has been detected, its notification is still sent even if the user closes the browser tab mid-session.
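A minimal sketch of composing such a message with Python's standard library. `build_alert_email`, the subject wording, and the body layout are assumptions for illustration; only the [CRITICAL] prefix and the listed fields come from the description above.

```python
from email.message import EmailMessage


def build_alert_email(incident: dict, sender: str, recipient: str) -> EmailMessage:
    """Compose an alert email; critical incidents get a [CRITICAL] subject prefix."""
    subject = f"GuardianView alert: {incident['hazard']}"
    if incident["severity"] == "critical":
        subject = f"[CRITICAL] {subject}"
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(
        "\n".join(
            [
                f"Severity: {incident['severity']}",
                f"Hazard: {incident['hazard']}",
                f"Detected: {incident['timestamp']}",
                f"Profile: {incident['profile']}",
                f"Recommendation: {incident['recommendation']}",
                f"Regulation: {incident.get('regulation') or 'n/a'}",
            ]
        )
    )
    return message


# The returned message can be handed to smtplib.SMTP(...).send_message(message)
# on the server; connection details are deployment-specific and omitted.
```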

Report Generation

Safety reports are generated client-side using jsPDF, eliminating the need for a separate backend reporting service. When the user clicks "Generate Report," JavaScript collects all incident data stored during the session and jsPDF constructs a formatted PDF. Each incident includes timestamp, severity badge (color-coded: red for critical, orange for high, yellow for medium), description, and the agent's spoken recommendation. The PDF is automatically downloaded with a filename containing the session date.

Proactive Monitoring — The Technical Challenge

Making the agent truly proactive was the hardest challenge. The Gemini Live API is designed for conversational turn-taking, where the user speaks and the model responds. But a safety copilot needs to do the opposite: it needs to speak first when it sees danger, even when the user hasn't said anything.

The ADK's ProactivityConfig was not available in the version I was working with, so I engineered a workaround: a lightweight client-side heartbeat that sends a [SAFETY_CHECK] prompt every second alongside continuous video frames. Combined with the proactive_audio flag in the RunConfig and an aggressive system prompt ("DO NOT WAIT", "INTERRUPT IMMEDIATELY"), this transformed GuardianView from a passive assistant into an active watchdog that interrupts you when danger is detected. Running the check on every frame instead of every second frame cut the interval between checks from 2 seconds to 1 second, which matters for time-sensitive hazards.
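The heartbeat itself runs in the browser; the timing loop is sketched here in Python only to show the pattern. `heartbeat`, `send`, and the `stop` event are hypothetical names, and the interval is a parameter rather than a fixed one second.

```python
import asyncio

SAFETY_CHECK = "[SAFETY_CHECK]"


async def heartbeat(send, interval_s: float, stop: asyncio.Event) -> None:
    """Call send(SAFETY_CHECK) every interval_s seconds until stop is set."""
    while not stop.is_set():
        await send(SAFETY_CHECK)
        try:
            # Sleep for one interval, but wake early if the session is stopped.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

Waiting on the stop event instead of a bare sleep lets the loop exit promptly when the session ends.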

Grounding and Knowledge

For grounding, the agent has access to Google Search as a built-in ADK tool, allowing it to verify specific OSHA regulations or look up safety information in real-time. Custom function tools handle:

  • Incident logging to Cloud Firestore
  • Real-time email notification dispatch
  • Profile management
  • Session state tracking

Deployment

The entire application is containerized with Docker and deployed to Google Cloud Run using an automated deployment script, making the backend fully managed, scalable, and globally accessible. Each WebSocket connection gets its own LiveRequestQueue and pair of tasks, so multiple users can be monitored concurrently without sharing session state.

Tech Stack Summary

| Layer | Technology |
| --- | --- |
| AI Model | Gemini 2.0 Flash (Gemini Live API) |
| Agent Framework | Google Agent Development Kit (ADK) |
| Backend | Python, FastAPI, WebSockets |
| Database | Cloud Firestore (Firebase Admin SDK) |
| Notifications | SMTP / SendGrid (real-time email alerts) |
| Deployment | Google Cloud Run (Docker) |
| Grounding | Google Search (ADK built-in tool) |
| Frontend | HTML, CSS, JavaScript, Web Audio API |
| Report Generation | jsPDF (client-side) |

Challenges I Faced

1. Making the Agent Truly Proactive

This was the biggest technical challenge, described in detail under Proactive Monitoring above. The Gemini Live API is designed for conversational turn-taking, but a safety copilot must speak first when it sees danger. With ProactivityConfig unavailable in my ADK version, the heartbeat workaround ([SAFETY_CHECK] every second, plus the proactive_audio flag and an aggressive system prompt) is what turned GuardianView into an active watchdog.

2. Balancing Responsiveness with False Positives

Real-time vision analysis of a busy workspace produces noise. I spent significant time tuning the system prompt and severity classification to reduce unnecessary interruptions while maintaining zero tolerance for genuinely critical hazards. The solution was explicit severity tiers with different behavioral rules: critical hazards interrupt immediately with multimodal alerts; medium and low observations are mentioned conversationally or logged silently.
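The tiers can be pictured as a small lookup table. In the project these rules live in the system prompt, so the table here is only an illustration: the field names, the choice to interrupt on "high", and the fallback for unknown severities are all assumptions. The flash class names are the ones used by the visual alert system.

```python
# Hypothetical rule table; in GuardianView the tiers are expressed in the prompt.
RESPONSE_RULES = {
    "critical": {"interrupt": True, "flash_class": "flash-critical"},
    "high": {"interrupt": True, "flash_class": "flash-high"},  # interrupting is assumed
    "medium": {"interrupt": False, "flash_class": None},
    "low": {"interrupt": False, "flash_class": None},
}


def response_for(severity: str) -> dict:
    """Look up the response rule for a severity tier."""
    # Unknown tiers fall back to the critical rule so that nothing is
    # silently dropped (a fail-safe assumption of this sketch).
    return RESPONSE_RULES.get(severity, RESPONSE_RULES["critical"])
```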

3. Making Voice Interaction Feel Natural, Not Annoying

A safety agent that constantly talks is one that gets turned off. The biggest design challenge was alert cadence — ensuring GuardianView speaks at the right moment with the right urgency. Too chatty and users ignore it; too quiet and hazards slip through. Severity-driven prompts combined with context awareness struck the right balance, making the agent feel like a helpful colleague rather than an annoying alarm system.

4. Grounding in Real Regulations Without Overwhelming the User

OSHA regulations are dense and technical. The challenge was surfacing the relevant standard when it matters — "That's an OSHA 1910.133 violation, eye protection is required for grinding operations" — without turning every alert into a legal lecture. Prompt engineering ensures regulatory citations enhance rather than dominate the conversation.

5. Multilingual Accuracy for Safety-Critical Information

Implementing multilingual support wasn't just translation — it required ensuring safety-critical terminology remained accurate across 50+ languages. The solution was dynamic prompt injection that explicitly separates technical terms (which may remain in English for precision) from conversational speech (which must be in the user's language), preventing dangerous mistranslations.

6. Designing a Reliable Firebase Write Pipeline

Writing incidents to Firestore in real-time introduced a new failure mode: what if the database write fails mid-session? The solution was a write-ahead approach — the incident is queued locally first, the voice alert and email notification are dispatched immediately, and the Firestore write is retried with exponential backoff if the first attempt fails. This ensures the user-facing safety response is never blocked by database latency, while still guaranteeing durable persistence of every incident.
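The retry step might look roughly like the following. `write_with_backoff` and its defaults are hypothetical, and `write` stands in for whatever coroutine performs the Firestore write.

```python
import asyncio


async def write_with_backoff(
    write, incident: dict, attempts: int = 5, base_delay_s: float = 0.5
) -> bool:
    """Try write(incident) up to `attempts` times, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            await write(incident)
            return True
        except Exception:
            if attempt == attempts - 1:
                break
            # Delays grow as base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay_s * (2 ** attempt))
    return False
```

A `False` return tells the caller that the incident should stay in the local queue for a later attempt rather than be discarded.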

7. Email Notifications — Reliability vs. Alert Fatigue

Sending an email for every incident creates a double-edged challenge: the pipeline must be reliable enough to never drop a critical alert, but frequent pings risk training safety managers to ignore their inbox. The solution was severity-gated dispatch — [CRITICAL] and [HIGH] incidents trigger immediate individual emails, while medium and low observations can be batched into a periodic digest. The recipient address is fully configurable, so organizations can route alerts to a shared safety inbox, a PagerDuty integration, or a dedicated EHS officer.
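Severity-gated dispatch can be sketched as a small router that sends some incidents at once and holds the rest for a digest. `AlertRouter` and `send_email` are hypothetical names, and when the digest is flushed (for example on a timer) is left to the caller.

```python
IMMEDIATE_SEVERITIES = {"critical", "high"}


class AlertRouter:
    """Send critical/high incidents right away; batch the rest into a digest."""

    def __init__(self, send_email):
        self._send_email = send_email  # callable taking a list of incidents
        self._digest = []

    def handle(self, incident: dict) -> None:
        if incident["severity"] in IMMEDIATE_SEVERITIES:
            self._send_email([incident])
        else:
            self._digest.append(incident)

    def flush_digest(self) -> None:
        """Send all batched incidents as one email, if there are any."""
        if self._digest:
            self._send_email(self._digest)
            self._digest = []
```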

8. Coordinating Multimodal Alerts

Synchronizing the visual flash, border pulse, voice alert, Firestore write, and email dispatch required careful event-driven architecture. When the backend detects a critical incident, a single safety_incident message fires over the WebSocket. The frontend immediately triggers both CSS animations and audio playback, while the backend concurrently writes to Firestore and dispatches the notification email — all within the same incident handler, ensuring nothing falls out of sync.
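A single incident handler that fans out to all three side effects could be sketched like this. `handle_incident` and the three callables are hypothetical, and the exact shape of the safety_incident message is an assumption.

```python
import asyncio
import json


async def handle_incident(incident: dict, send_ws, write_db, send_email) -> list:
    """Notify the frontend, persist the incident, and email it concurrently."""
    message = json.dumps({"type": "safety_incident", **incident})
    # return_exceptions=True keeps one failing side effect (for example a
    # database outage) from cancelling the others.
    return await asyncio.gather(
        send_ws(message),
        write_db(incident),
        send_email(incident),
        return_exceptions=True,
    )
```

The returned list holds one entry per side effect, so the caller can inspect it and retry whichever step failed.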

9. Optimizing Detection Speed

Initially, the system sent a safety check on every second frame (every 2 seconds at 1 FPS). Users reported the detection felt "a bit slow." The solution was straightforward but impactful: run the check on every frame. This halved the interval between checks, from 2 seconds to 1 second, at the cost of doubling API calls. For safety applications, the reduced latency is worth the increased cost.


What I Learned

Gemini's Live API handles real-time multimodal streams remarkably well. The latency between seeing a hazard and producing a spoken alert is fast enough to be genuinely useful in safety applications.

Building proactive AI agents within a conversational framework requires creative engineering. The heartbeat pattern I developed could be a useful technique for anyone building monitoring agents on top of conversational APIs.

The configurable profile approach proved more powerful than expected. Different environments don't just need different hazard lists; they need fundamentally different alert behaviors and severity thresholds. The profile-as-data pattern makes the system highly extensible without changes to the agent logic.

Durable persistence changes what a safety tool can do. Session-only memory is useful for real-time awareness; Cloud Firestore persistence is what enables longitudinal trend analysis, compliance auditing, and organization-wide safety intelligence. The two layers are complementary, not redundant.

Real-time notifications require thoughtful severity gating. Sending every incident to a safety manager inbox creates noise that defeats the purpose. Gating by severity and batching low-priority observations preserves the signal-to-noise ratio that makes critical alerts actionable.

Voice UX for safety applications is a distinct design problem. Unlike chatbots where the user initiates, a safety copilot must initiate intelligently. Context-aware severity escalation is critical — the agent must distinguish between "I should mention this" and "I must interrupt immediately."

Multimodal alerts are more effective than any single modality. The combination of voice alert, visual flash, border pulse, Firestore log, and email notification ensures critical hazards are captured and communicated regardless of whether the safety manager is on-site, the worker has sound muted, or the session ends unexpectedly. Redundancy in safety systems is a feature, not a bug.

Optimizing for the right metrics matters. For safety applications, detection latency is the critical metric, not cost per session. Halving the detection time meaningfully improves the system's effectiveness, and the increased API cost is justified.


What's Next

  • Advanced notification routing: Extend the email system to support webhooks, Slack channels, SMS via Twilio, and PagerDuty integrations — configurable per severity tier so each organization can route alerts to its existing incident response workflow.
  • Externalized profiles: Move safety profiles from in-code dictionaries to Cloud Storage JSON files, enabling dynamic profile loading and user-created custom profiles without code deployments.
  • OSHA knowledge base: Index OSHA regulations in Vertex AI Search for more precise, grounded safety citations without relying on general web search.
  • Team dashboards: Leverage the Firebase incident data to build real-time safety dashboards for managers — fleet monitoring, trend analysis, and team-wide safety score tracking powered by the persistent Firestore log.
  • Wearable integration: Support for smart glasses (Google Glass Enterprise, RealWear, Vuzix) and body-worn cameras for true hands-free operation in industrial settings.
  • Multi-camera support: Monitor an entire workshop or factory floor with multiple camera feeds, aggregating hazard detection across all cameras with spatial awareness.
  • Compliance reporting: Auto-generate OSHA-compliant incident reports from Firestore-logged events, including Form 300 export.
  • Incident replay and training: Save video clips of detected incidents for post-session review and safety training materials.
  • IoT sensor integration: Combine camera vision with environmental sensors (gas detectors, temperature monitors, noise level meters) for comprehensive workspace monitoring.
  • Custom hazard training: Allow organizations to fine-tune hazard detection on their specific workplace environments and equipment.

Sources

  • U.S. Bureau of Labor Statistics, Census of Fatal Occupational Injuries 2024 — bls.gov/iif
  • U.S. Bureau of Labor Statistics, Survey of Occupational Injuries and Illnesses 2024 — bls.gov/news.release/osh.htm
  • AFL-CIO, Death on the Job: The Toll of Neglect 2025 — aflcio.org/reports/dotj-2025
  • OSHA — 29 CFR 1910 (General Industry), 29 CFR 1926 (Construction)
  • Google Agent Development Kit (ADK) Documentation
  • Gemini Live API Documentation
  • Firebase / Cloud Firestore Documentation — firebase.google.com/docs/firestore
