Percepta

Video streams are everywhere. IP cameras have democratized 24×7 monitoring, but action still depends on a human staring at a screen.

We wanted a system that doesn’t just detect, but acts.

Gemini 3’s spatial video reasoning, along with its ability to preserve thought state over long-running sessions, made that ambition realistic.

Gemini's breakthrough inspired us to build Percepta, an OpenClaw for video streams.

Percepta lets you attach AI agent workflows to any live video stream using plain language.

Example instructions:

- “Alert the staff if a customer arrives at the pharmacy counter in the mall and staff is in the backroom.”
- “If the shelf with paper stacks in the storeroom is empty, place an order for an A4 stack via shopping skill.”
- “Detect a child’s presence inside a heavy-equipment zone and call the Child-in-Danger webhook URL to trigger the IoT alarm.”

When the event occurs, Percepta executes the action chain automatically.
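
One way to picture an action chain is as an ordered list of steps attached to a plain-language instruction. The sketch below is illustrative only: the `Workflow`, `Action`, and `runActionChain` names, the placeholder URL, and the extra Telegram step are assumptions made for the example, not Percepta's actual schema. It runs the steps in order and stops at the first failure.

```typescript
// Hypothetical shapes; Percepta's real schema is not described in this write-up.
type Action =
  | { kind: "webhook"; url: string }
  | { kind: "telegram"; chatId: string; text: string }
  | { kind: "skill"; name: string; input: Record<string, string> };

interface Workflow {
  id: string;
  instruction: string; // the plain-language rule the user typed
  actions: Action[];   // steps to run, in order, once the event occurs
}

interface StepResult {
  action: Action;
  ok: boolean;
}

// A performer carries out one step and reports success or failure.
type Performer = (action: Action) => boolean;

// Run each step in order; stop at the first failure so that later steps
// never fire after a broken one.
function runActionChain(workflow: Workflow, perform: Performer): StepResult[] {
  const results: StepResult[] = [];
  for (const action of workflow.actions) {
    const ok = perform(action);
    results.push({ action, ok });
    if (!ok) break;
  }
  return results;
}

const childInZone: Workflow = {
  id: "child-in-danger",
  instruction:
    "Detect a child's presence inside a heavy-equipment zone and call the Child-in-Danger webhook URL to trigger the IoT alarm.",
  actions: [
    // Placeholder URL, not a real endpoint.
    { kind: "webhook", url: "https://example.invalid/child-in-danger" },
    // Second step added purely to illustrate chaining.
    { kind: "telegram", chatId: "site-safety", text: "Child detected in equipment zone" },
  ],
};
```

A real runtime would most likely make `perform` asynchronous and add retries; the synchronous version keeps the ordering and stop-on-failure behavior easy to see.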

It’s built for long-running operation, retains context across stream interruptions, and is designed to be dependable around the clock.

Gemini 3 (Core Engine)
- Spatial video understanding
- Long-running thought-state preservation for fault tolerance
Backend (Hono + Node.js)
- Agent runtime provisioning and orchestration
- Video processing and intelligent chunking
- Durable, real-time event handling
- Webhooks and Telegram alert delivery
Frontend: Vite + React + React Router + TailwindCSS

A clean, professional UI that makes complex automation feel simple.
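
The backend list above mentions video processing and chunking without saying how the chunks are chosen. As a rough illustration, here is a minimal sketch of fixed-length windows with a small overlap, so an event that straddles a boundary still appears whole in at least one chunk. The `planChunks` function and its parameters are hypothetical, and Percepta's actual chunking strategy may be quite different.

```typescript
// Hypothetical helper; not taken from Percepta's codebase.
interface Chunk {
  startSec: number;
  endSec: number;
}

// Split [0, durationSec) into windows of chunkSec seconds, where each window
// starts overlapSec seconds before the previous one ends.
function planChunks(durationSec: number, chunkSec: number, overlapSec: number): Chunk[] {
  if (chunkSec <= 0 || overlapSec < 0 || overlapSec >= chunkSec) {
    throw new RangeError("need chunkSec > 0 and 0 <= overlapSec < chunkSec");
  }
  const chunks: Chunk[] = [];
  const step = chunkSec - overlapSec;
  for (let start = 0; start < durationSec; start += step) {
    const end = Math.min(start + chunkSec, durationSec);
    chunks.push({ startSec: start, endSec: end });
    // Once a chunk reaches the end of the footage, stop: any later window
    // would only repeat material that is already covered.
    if (end === durationSec) break;
  }
  return chunks;
}

// 60 seconds of footage, 20-second chunks, 5 seconds of overlap.
const plan = planChunks(60, 20, 5);
```

Longer chunks mean fewer model calls but a longer wait before an event can be acted on, which is the latency and cost trade-off a scheme like this has to tune.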

- Keeping event definitions expressive and natural without forcing users to write YAML
- Maintaining continuity when streams drop or reconnect
- Making multi-step action chains both powerful and reliable
- Balancing low latency with smart video chunking to manage costs
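
To make the continuity challenge concrete, here is a minimal sketch of an application-level checkpoint: the session remembers how far it has processed and keeps a short rolling list of context notes, so a reconnect can resume from the last checkpoint instead of starting cold. The `StreamSession` class and its method names are hypothetical. This is separate from Gemini's own thought-state preservation and is not necessarily how Percepta handles reconnects.

```typescript
// Hypothetical checkpointing helper; not taken from Percepta's codebase.
interface Checkpoint {
  lastProcessedSec: number; // stream position already analysed
  contextNotes: string[];   // short summaries to replay after a reconnect
}

class StreamSession {
  private lastProcessedSec = 0;
  private contextNotes: string[] = [];
  private readonly maxNotes: number;

  constructor(maxNotes: number = 20) {
    this.maxNotes = maxNotes;
  }

  // Record that a chunk ending at endSec was analysed, with a short summary.
  recordChunk(endSec: number, note: string): void {
    this.lastProcessedSec = Math.max(this.lastProcessedSec, endSec);
    this.contextNotes.push(note);
    // Keep only the most recent notes so the replayed context stays small.
    if (this.contextNotes.length > this.maxNotes) {
      this.contextNotes.shift();
    }
  }

  // On reconnect, report where to resume and which notes to replay.
  resume(): Checkpoint {
    return {
      lastProcessedSec: this.lastProcessedSec,
      contextNotes: [...this.contextNotes], // copy so callers cannot mutate state
    };
  }
}

const session = new StreamSession(2);
session.recordChunk(10, "counter empty");
session.recordChunk(20, "customer approaches counter");
session.recordChunk(30, "customer waiting, no staff visible");
const checkpoint = session.resume();
```

Persisting the checkpoint to durable storage would be needed for it to survive a process restart; the in-memory version only shows the shape of the state.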


Private user started this project — Feb 09, 2026 04:49 PM EST
