💡 Inspiration

Video streams are everywhere. IP cameras have democratized 24×7 monitoring, but action still depends on a human staring at a screen.

We wanted a system that doesn’t just detect events but acts on them.

Gemini 3’s spatial video reasoning, combined with its ability to preserve thought state over long-running sessions, made that ambition realistic.

Those capabilities inspired us to build Percepta: an OpenClaw for video streams.


What It Does

Percepta lets you attach AI agent workflows to any live video stream using plain language.

Example instructions:

  • “Alert the staff if a customer arrives at the pharmacy counter in the mall while the staff are in the backroom.”
  • “If the shelf with paper stacks in the storeroom is empty, place an order for an A4 stack via the shopping skill.”
  • “Detect a child inside a heavy-equipment zone and call the Child-in-Danger webhook URL to trigger the IoT alarm.”

How It Works

  1. Paste a live stream URL (RTSP)
  2. Describe the event in natural language
  3. Attach one or more actions:
     • AI agent skills
     • Webhooks for external systems
     • Telegram alerts via GemwatcherBot
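Together, the three steps above define a small workflow record: a stream, an event description, and a list of actions. Below is a minimal sketch of such a record in TypeScript, with a basic sanity check. The type and function names (`Workflow`, `ActionSpec`, `validateWorkflow`) are hypothetical and are not Percepta's actual schema.

```typescript
// Hypothetical workflow shape; names are illustrative, not Percepta's real schema.
type ActionSpec =
  | { kind: "skill"; skill: string; input: string }
  | { kind: "webhook"; url: string }
  | { kind: "telegram"; chatId: string };

interface Workflow {
  streamUrl: string; // step 1: live stream URL (RTSP)
  eventDescription: string; // step 2: the event in natural language
  actions: ActionSpec[]; // step 3: one or more actions
}

// Accepts rtsp:// and rtsps:// URLs with a non-empty host part.
const RTSP_URL = /^rtsps?:\/\/[^\s/]+/i;

// Returns human-readable problems; an empty array means the workflow looks usable.
function validateWorkflow(wf: Workflow): string[] {
  const problems: string[] = [];
  if (!RTSP_URL.test(wf.streamUrl)) {
    problems.push("streamUrl must be an rtsp:// or rtsps:// URL");
  }
  if (wf.eventDescription.trim() === "") {
    problems.push("eventDescription must not be empty");
  }
  if (wf.actions.length === 0) {
    problems.push("at least one action is required");
  }
  for (const action of wf.actions) {
    // Narrowed to the webhook variant, so action.url is available here.
    if (action.kind === "webhook" && !/^https?:\/\//i.test(action.url)) {
      problems.push(`webhook URL must be http(s): ${action.url}`);
    }
  }
  return problems;
}

// Example: the pharmacy-counter instruction, expressed as a workflow.
const pharmacyAlert: Workflow = {
  streamUrl: "rtsp://camera.example/pharmacy",
  eventDescription:
    "A customer arrives at the pharmacy counter while the staff are in the backroom.",
  actions: [{ kind: "telegram", chatId: "STAFF_CHAT_ID" }],
};

console.log(validateWorkflow(pharmacyAlert)); // []
```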

When the event occurs, Percepta executes the action chain automatically.
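An action chain can be executed by awaiting each action in order and recording each outcome. The sketch below assumes every action is an async function and that, by default, a failed step should not block the steps after it. The names `runActionChain`, `Action`, and `DetectedEvent` are hypothetical.

```typescript
// Hypothetical event passed to every action in the chain.
interface DetectedEvent {
  workflowId: string;
  description: string;
  detectedAt: Date;
}

// Each action is an async step, e.g. a skill call, a webhook, or a Telegram alert.
interface Action {
  name: string;
  run: (event: DetectedEvent) => Promise<void>;
}

interface ActionResult {
  name: string;
  ok: boolean;
  error?: string;
}

// Runs actions strictly in order. A failure is recorded and, unless stopOnError
// is set, later actions still run.
async function runActionChain(
  event: DetectedEvent,
  actions: Action[],
  stopOnError = false,
): Promise<ActionResult[]> {
  const results: ActionResult[] = [];
  for (const action of actions) {
    try {
      await action.run(event);
      results.push({ name: action.name, ok: true });
    } catch (err) {
      results.push({ name: action.name, ok: false, error: String(err) });
      if (stopOnError) break;
    }
  }
  return results;
}
```

Whether one failure should halt the rest of the chain is a design choice. For an alarm-style workflow, continuing on to the Telegram alert after a webhook error is arguably safer. An ordering workflow might prefer `stopOnError = true`.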

It’s built for long-running, unattended operation and retains context across stream interruptions, so a dropped connection doesn’t reset what the agent has already observed.
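One common way to survive stream drops is to reconnect with capped exponential backoff and pass the same session context into every attempt. The sketch below assumes that approach. `SessionContext`, `backoffDelayMs`, and `connectWithRetry` are hypothetical names, and the sketch does not show how Gemini's thought state itself is persisted.

```typescript
// Hypothetical state carried across reconnects (fields are illustrative).
interface SessionContext {
  lastFrameTimestampMs: number | null;
  notes: string[]; // e.g. summaries of what has already been observed
}

// Capped exponential backoff: base, 2*base, 4*base, ... never above capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `connect` until it succeeds or maxAttempts is exhausted. The same
// context object is handed to every attempt, so state survives the interruption.
// Resolves with the number of attempts used.
async function connectWithRetry(
  connect: (ctx: SessionContext) => Promise<void>,
  ctx: SessionContext,
  maxAttempts = 8,
  baseMs = 500,
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect(ctx);
      return attempt + 1;
    } catch {
      if (attempt === maxAttempts - 1) {
        throw new Error(`gave up after ${maxAttempts} attempts`);
      }
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw new Error("maxAttempts must be at least 1");
}
```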


🛠️ How We Built It

  • Gemini 3 (core engine): spatial video understanding + long-running thought-state preservation for fault tolerance.

  • Backend: Hono + Node.js

    • Agent runtime provisioning and orchestration
    • Video processing and intelligent chunking
    • Durable, real-time event handling
    • Webhooks and Telegram alert delivery

  • Frontend: Vite + React + React Router + TailwindCSS, for a clean, professional UI that makes complex automation feel simple.
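For the alert-delivery part of the backend, one minimal approach is to build each outgoing call as a plain request object and send it with `fetch`, which is available globally in Node.js 18 and later. The Telegram call follows the Bot API's `sendMessage` method. The bot token, chat id, and webhook payload fields below are placeholder assumptions, and the helper names are hypothetical.

```typescript
// A plain description of an outgoing HTTP call: easy to log, test, or retry.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical event payload; field names are illustrative.
interface EventPayload {
  workflowId: string;
  description: string;
  detectedAt: string; // ISO 8601 timestamp
}

// Telegram Bot API sendMessage; botToken and chatId are placeholders.
function telegramAlertRequest(botToken: string, chatId: string, text: string): HttpRequest {
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text }),
  };
}

// Generic webhook POST carrying the event as JSON (payload shape is an assumption).
function webhookRequest(url: string, event: EventPayload): HttpRequest {
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  };
}

// Sends a request with the global fetch and returns the HTTP status code.
async function send(req: HttpRequest): Promise<number> {
  const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
  return res.status;
}
```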


Challenges We Ran Into

  • Keeping event definitions expressive and natural without forcing users to write YAML
  • Maintaining continuity when streams drop or reconnect
  • Making multi-step action chains both powerful and reliable
  • Balancing low latency with smart video chunking to manage costs
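Fixed-length, overlapping chunks make the latency-versus-cost trade-off concrete. This is a simplified stand-in for illustration, not Percepta's actual chunking strategy. `planChunks` and `estimateCostAndDelay` are hypothetical helpers, and the estimate assumes one model call per chunk.

```typescript
interface Chunk {
  startMs: number;
  endMs: number;
}

// Splits [0, durationMs) into windows of chunkMs that overlap by overlapMs.
// Any event no longer than overlapMs falls entirely inside at least one window.
function planChunks(durationMs: number, chunkMs: number, overlapMs: number): Chunk[] {
  if (overlapMs < 0 || overlapMs >= chunkMs) {
    throw new Error("overlap must be >= 0 and shorter than the chunk");
  }
  const step = chunkMs - overlapMs;
  const chunks: Chunk[] = [];
  for (let start = 0; start < durationMs; start += step) {
    chunks.push({ startMs: start, endMs: Math.min(start + chunkMs, durationMs) });
    if (start + chunkMs >= durationMs) break; // this window already reaches the end
  }
  return chunks;
}

// Rough per-stream estimate: one model call per chunk. The step between chunk
// starts bounds how long any frame waits until a chunk containing it has closed
// (model inference time not included).
function estimateCostAndDelay(
  chunkMs: number,
  overlapMs: number,
): { callsPerHour: number; worstCaseBufferMs: number } {
  const step = chunkMs - overlapMs;
  return {
    callsPerHour: Math.ceil(3_600_000 / step),
    worstCaseBufferMs: step,
  };
}
```

For example, 10-second chunks with a 2-second overlap give an 8-second step. That works out to 450 model calls per hour per stream and at most 8 seconds of buffering, plus inference time, before any frame can be analyzed. Longer chunks lower the call count but raise that bound.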

🏆 Accomplishments We’re Proud Of

  • A true “describe → act” workflow for live video
  • Long-running reliability with preserved internal state
  • A flexible action-chain system that goes far beyond alerts
  • A polished UX that turns complex automation into a natural experience

What We Learned

  • The real value is in actions, not notifications
  • Spatial reasoning fundamentally changes what’s possible with video
  • Fault tolerance is non-negotiable for live streams
  • Good UX is the difference between a demo and a product

What’s Next for Percepta

  • Skill marketplace and reusable action templates
  • Live preview and workflow simulation
  • Team controls, audit trails, and compliance features
  • Edge deployment for ultra-low-latency environments

Built With

  • agent-skills
  • antigravity
  • gemini-3
  • gemini-cli
  • hono
  • react
  • react-router
  • tailwind-css
  • typescript
  • vite