Inspiration

I often spend hours monitoring machine learning experiments. It's truly painful sometimes. I created LabGuard to help me spend my time more productively while still monitoring my experiments.

What it does

LabGuard is an AI-assisted desktop monitor for long-running training and lab workflows.

It helps you replace manual terminal watching by combining:

  • live run telemetry
  • AI watchdog analysis
  • safe callable remediation actions
  • alerting (SMTP email)
  • optional multi-source monitoring (terminal + video)

Purpose

Training and experiment runs often fail silently, drift, or degrade while unattended. LabGuard is built to:

  • detect issues early from logs and metrics
  • suggest or execute safe interventions
  • notify you when critical changes happen
  • keep a clear timeline of run behavior and decisions

Core Features

  • Run Management

    • Launch commands directly from the app
    • Attach to already-running output streams via labguard_tail.py
    • Restart finished runs from the sidebar
    • Delete runs, including runs that are still in progress
  • Live Monitoring

    • Monitoring from live video sources such as a webcam or screen capture
    • Real-time terminal stream capture (stdout + stderr)
    • Automatic metric parsing from common training log formats
    • Per-run views for metrics, terminal output, watchdog analysis, actions, and chat
  • AI Watchdog

    • Periodic analysis loop over recent logs and parsed metrics
    • Action recommendations in structured JSON
    • Approval flow before running suggested actions
  • Action Tools

    • Register safe runtime actions from Python using labguard_sdk.py
    • Run actions from the UI or from watchdog recommendations
    • Registered actions persist in the UI with online/offline status
  • Model Providers

    • Local models via Ollama
    • Cloud models via NVIDIA NIM
    • Cloud models via Anthropic Claude
    • Unified model selection across chat and the watchdog
  • Alerts

    • SMTP email notifications for:
      • watchdog action suggestions
      • watchdog failures
      • sudden/non-clean training stops
    • Send a test email from settings
    • Basic styled HTML alert templates
  • Multi-Source / Video Monitoring

    • Supports video-based run monitoring flows
    • Built to analyze more than one stream context (for example, titration video + terminal telemetry)
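LabGuard's own parser and the exact log formats it recognizes are not documented here. As a rough sketch of what automatic metric parsing can look like, the Python below pulls `key=value` and `key: value` pairs (for example `loss=0.43` or `lr: 3e-4`) out of a single log line. The regex and the function name `parse_metrics` are hypothetical, not LabGuard's code.

```python
import re

# Hypothetical pattern for "key=value" / "key: value" pairs; an assumption about
# common training-log formats, not LabGuard's actual parser.
_PAIR = re.compile(
    r"(?P<key>[A-Za-z_][\w/.-]*)\s*[:=]\s*"
    r"(?P<value>[-+]?(?:(?:\d+\.?\d*|\.\d+)(?:e[-+]?\d+)?|nan\b|inf\b))",
    re.IGNORECASE,
)


def parse_metrics(line: str) -> dict[str, float]:
    """Extract numeric metrics from one log line (keys are lower-cased)."""
    metrics: dict[str, float] = {}
    for match in _PAIR.finditer(line):
        # float() accepts "nan", "NaN", "inf", and scientific notation like "3e-4".
        metrics[match.group("key").lower()] = float(match.group("value"))
    return metrics
```

For example, `parse_metrics("epoch 3 | step 1200 | loss=0.4312 lr: 3e-4 val_acc=91.2%")` yields `loss`, `lr`, and `val_acc`; bare "epoch 3" style counters are skipped because they have no `=` or `:` separator.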
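One way to picture the periodic analysis loop is a bounded window of recent output plus the latest parsed metrics, turned into a model prompt at a fixed interval. The sketch below assumes that design; `WatchdogWindow`, its defaults (200 lines, 60 seconds), and the prompt wording are hypothetical, not taken from LabGuard.

```python
import json
import time
from collections import deque
from typing import Optional


class WatchdogWindow:
    """Hypothetical buffer of recent log lines and metrics for a periodic watchdog."""

    def __init__(self, max_lines: int = 200, interval_sec: float = 60.0):
        self.lines: deque = deque(maxlen=max_lines)  # oldest lines drop off automatically
        self.metrics: dict = {}
        self.interval_sec = interval_sec
        self._last_run = float("-inf")  # so the first check is always due

    def add(self, line: str, metrics: Optional[dict] = None) -> None:
        self.lines.append(line)
        if metrics:
            self.metrics.update(metrics)  # keep only the latest value per metric

    def due(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self._last_run >= self.interval_sec

    def build_prompt(self, now: Optional[float] = None) -> str:
        # Building a prompt marks the start of a new analysis interval.
        self._last_run = time.monotonic() if now is None else now
        return (
            "Latest metrics:\n" + json.dumps(self.metrics, sort_keys=True)
            + "\n\nRecent log lines:\n" + "\n".join(self.lines)
            + '\n\nReply with JSON: {"severity": ..., "summary": ..., "action": ...}'
        )
```

Bounding the window with `deque(maxlen=...)` keeps prompt size roughly constant on long runs, at the cost of forgetting older context.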
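To illustrate how structured JSON recommendations, registered actions, and the approval flow could fit together, here is a hypothetical sketch. The reply schema (`severity`, `summary`, `action.name`, `action.args`), the `SAFE_ACTIONS` dict, and both functions are assumptions; the real registration API lives in labguard_sdk.py and is not reproduced here.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-in for actions registered through labguard_sdk.py.
SAFE_ACTIONS: Dict[str, Callable[..., str]] = {
    "reduce_lr": lambda factor=0.5: f"learning rate scaled by {factor}",
    "save_checkpoint": lambda: "checkpoint saved",
}


@dataclass
class Recommendation:
    severity: str
    summary: str
    action: Optional[str]
    args: Dict[str, Any]


def parse_recommendation(reply: str) -> Recommendation:
    """Parse a model's JSON reply, rejecting actions that were never registered."""
    data = json.loads(reply)
    action = data.get("action")
    name = action.get("name") if action else None
    if name is not None and name not in SAFE_ACTIONS:
        raise ValueError(f"model suggested unregistered action {name!r}")
    return Recommendation(
        severity=str(data.get("severity", "info")),
        summary=str(data.get("summary", "")),
        action=name,
        args=dict(action.get("args", {})) if action else {},
    )


def maybe_run(rec: Recommendation, approve: Callable[[Recommendation], bool]) -> Optional[str]:
    """Execute the suggested action only after explicit approval."""
    if rec.action is None or not approve(rec):
        return None
    return SAFE_ACTIONS[rec.action](**rec.args)
```

Rejecting unknown names at parse time means the model can only ever trigger something that was explicitly registered, and the `approve` callback is where a UI confirmation dialog would plug in.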

Included Demo Scripts

  • demo_train_simple.py

    • Lightweight training-style log stream
  • demo_train_monitor.py

    • Longer realistic run with controllable instability and actions
  • demo_monitor_trigger.py

    • Deterministic watchdog trigger scenarios (mixed, nan, oom, crash, plateau)
  • demo_titration_test.py

    • Synthetic titration telemetry for dual-stream monitoring demos
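The trigger scenarios in demo_monitor_trigger.py (nan, oom, crash, plateau) correspond to fairly simple signals in the output. The sketch below is a hypothetical rule-based pre-check over a window of log lines and loss values, handy for sanity-checking what a demo run ought to trigger. It is not LabGuard's AI watchdog, and the matched strings and the `patience`/`min_delta` thresholds are assumptions.

```python
import math
from typing import Optional, Sequence


def classify_window(lines: Sequence[str], losses: Sequence[float],
                    patience: int = 20, min_delta: float = 1e-3) -> Optional[str]:
    """Return a scenario label for recent output, or None if nothing stands out."""
    text = "\n".join(lines).lower()
    if "out of memory" in text or "cuda oom" in text:
        return "oom"
    if "traceback (most recent call last)" in text:
        return "crash"
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        return "nan"
    if len(losses) > patience:
        # Plateau: the best loss in the last `patience` steps is not meaningfully
        # better than the best loss seen before them.
        best_before = min(losses[:-patience])
        best_recent = min(losses[-patience:])
        if best_before - best_recent < min_delta:
            return "plateau"
    return None
```

Checking hard failures (OOM, tracebacks, NaN) before the plateau test means a crashed run is never reported as merely stalled.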

Quick Start

  1. Install dependencies:

    npm install
    
  2. Run desktop app in dev mode:

    npm run dev
    
  3. Build production bundles:

    npm run build
    

Running an Attached (Piped) Run

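Windows-safe unbuffered pattern. python -u turns off Python's output buffering and 2>&1 merges stderr into stdout, so each line reaches labguard_tail.py as soon as it is printed: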

cmd /c "python -u demo_monitor_trigger.py --duration-sec 180 --step-sec 0.5 --scenario mixed 2>&1 | python -u .\labguard_tail.py monitor-trigger-demo"
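The internals of labguard_tail.py and its transport to the desktop app are not shown in this document. A minimal sketch of the general pipe-attach idea reads lines from the pipe, echoes them so the terminal stays readable, and hands each line to a forwarder. The `pump` function and its `forward(run_name, line)` callback are hypothetical.

```python
import sys
from typing import Callable, Iterable, TextIO


def pump(lines: Iterable[str], run_name: str,
         forward: Callable[[str, str], None],
         echo: TextIO = sys.stdout) -> int:
    """Copy piped lines to `echo` and to `forward`; return how many were seen."""
    count = 0
    for line in lines:
        echo.write(line)
        echo.flush()  # keep the local terminal view live too
        forward(run_name, line.rstrip("\n"))  # hypothetical hand-off to the app
        count += 1
    return count
```

In a script, something like `pump(sys.stdin, sys.argv[1], forward)` would attach to the piped stream; the returned count makes it easy to log how much output arrived before the pipe closed.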

SMTP Email Setup

In Model Settings -> Email Alerts (SMTP), configure:

  • SMTP host/port/user/password
  • sender (from) name and email
  • recipient email
  • enable notifications

Then use Send Test Email to verify delivery.
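The app's own mail code is not shown here. For reference, sending a comparable alert with Python's standard library looks roughly like the sketch below. `build_alert`, `send_alert`, the subject format, and the use of STARTTLS are assumptions, so adapt them to your provider (servers on port 465 expect `smtplib.SMTP_SSL` instead).

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr
from html import escape


def build_alert(sender_name: str, sender: str, recipient: str,
                run: str, event: str, detail: str) -> EmailMessage:
    """Build a plain-text + HTML alert email (hypothetical format)."""
    msg = EmailMessage()
    msg["Subject"] = f"[LabGuard] {run}: {event}"
    msg["From"] = formataddr((sender_name, sender))
    msg["To"] = recipient
    msg.set_content(f"Run: {run}\nEvent: {event}\n\n{detail}")
    # HTML alternative; escape() keeps log text like "<1200>" from breaking markup.
    msg.add_alternative(
        f"<h2>{escape(event)}</h2><p><b>Run:</b> {escape(run)}</p>"
        f"<pre>{escape(detail)}</pre>",
        subtype="html",
    )
    return msg


def send_alert(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send over SMTP with STARTTLS (commonly port 587)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Passing the password in from an environment variable, rather than hard-coding it, keeps SMTP credentials out of source files.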

Project Structure (high level)

  • electron/ - Electron main/preload and IPC handlers
  • src/ - React renderer UI, state, hooks, components
  • labguard_sdk.py - Python action registration bridge
  • labguard_tail.py - Python pipe attach bridge
  • demo_*.py - Demo/test workloads

Notes

  • This is an alpha/demo-oriented app with rapid iteration.
  • For production deployment, move secrets (SMTP/API keys) to secure environment configuration and rotate any exposed credentials.
