💡 Inspiration
Video streams are everywhere. IP cameras have democratized 24×7 monitoring, but action still depends on a human staring at a screen.
We wanted a system that doesn’t just detect, but acts.
Gemini 3’s spatial video reasoning, together with its ability to preserve thought state over long-running sessions, made that ambition realistic.
That breakthrough inspired us to build an OpenClaw for video streams.
What It Does
Percepta lets you attach AI agent workflows to any live video stream using plain language.
Example instructions:
- “Alert the staff if a customer arrives at the pharmacy counter in the mall and staff is in the backroom.”
- “If the shelf with paper stacks in the storeroom is empty, place an order for an A4 stack via shopping skill.”
- “Detect a child’s presence inside a heavy-equipment zone and call the Child-in-Danger webhook URL to trigger the IoT alarm.”
How it works
- Paste a live stream URL (RTSP)
- Describe the event in natural language
- Attach one or more actions:
  - AI agent skills
  - Webhooks for external systems
  - Telegram alerts via GemwatcherBot
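The three setup steps map naturally onto a small data structure: a stream, an event description, and an ordered list of actions. The sketch below is a hypothetical shape for such a workflow plus a basic validity check; the type names, field names, and example URLs are illustrative assumptions, not Percepta's actual schema.

```typescript
// Hypothetical workflow shape for illustration; not Percepta's actual schema.
type Action =
  | { kind: "skill"; skill: string; instruction: string } // an AI agent skill, e.g. a shopping skill
  | { kind: "webhook"; url: string }                      // an external system, e.g. an IoT alarm
  | { kind: "telegram"; chatId: string };                 // an alert delivered through a Telegram bot

interface Workflow {
  streamUrl: string; // live RTSP source
  event: string;     // natural-language event description
  actions: Action[]; // run in order when the event is detected
}

// Returns a list of problems; an empty list means the workflow is well-formed.
function validateWorkflow(w: Workflow): string[] {
  const errors: string[] = [];
  if (!/^rtsps?:\/\//i.test(w.streamUrl)) errors.push("streamUrl must be an rtsp:// or rtsps:// URL");
  if (w.event.trim() === "") errors.push("event description is empty");
  if (w.actions.length === 0) errors.push("at least one action is required");
  for (const action of w.actions) {
    if (action.kind === "webhook" && !/^https?:\/\//i.test(action.url)) {
      errors.push(`webhook URL must be http(s): ${action.url}`);
    }
  }
  return errors;
}

// Example: the child-safety instruction, expressed as a workflow (hypothetical URLs).
const childSafety: Workflow = {
  streamUrl: "rtsp://192.0.2.10/site-cam-3",
  event: "A child is inside the heavy-equipment zone",
  actions: [
    { kind: "webhook", url: "https://example.com/hooks/child-in-danger" },
    { kind: "telegram", chatId: "123456789" },
  ],
};

console.log(validateWorkflow(childSafety)); // []
```

Keeping the event as free text and only validating the structural parts (URLs, non-empty lists) leaves the interpretation of the event itself to the model.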
When the event occurs, Percepta executes the action chain automatically.
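One way to picture "executes the action chain" is a runner that calls each action in order and records where it stopped. This is a minimal sketch under stated assumptions: `Step`, `runChain`, and the stop-at-first-failure policy are hypothetical, and the steps are synchronous only to keep the example self-contained (real webhook, Telegram, and skill calls would be async and awaited).

```typescript
// Hypothetical action-chain runner. Steps are synchronous here for brevity;
// real steps (webhooks, Telegram, agent skills) would be async and awaited.
type StepResult = { ok: true } | { ok: false; error: string };

interface Step {
  name: string;
  run: (event: string) => StepResult;
}

interface ChainReport {
  completed: string[];                        // steps that succeeded, in order
  failedAt?: { step: string; error: string }; // first failing step, if any
}

// Wrap a step so a thrown exception is reported like an explicit failure.
function safeRun(step: Step, event: string): StepResult {
  try {
    return step.run(event);
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

// Run steps in order and stop at the first failure, so later steps never
// act on the assumption that an earlier one succeeded.
function runChain(event: string, steps: Step[]): ChainReport {
  const completed: string[] = [];
  for (const step of steps) {
    const result = safeRun(step, event);
    if (result.ok === false) {
      return { completed, failedAt: { step: step.name, error: result.error } };
    }
    completed.push(step.name);
  }
  return { completed };
}
```

Stopping at the first failure suits dependent steps (for example, "place an order, then notify"); for independent actions such as an alarm webhook plus a Telegram alert, running every step and collecting all failures may be the better policy.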
It’s built for long-running operation, retains context across stream interruptions, and is designed for dependable, unattended use.
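Retaining context across interruptions implies that session state lives outside the stream connection, so a dropped connection can be retried without discarding what the agent has already seen. The sketch below shows only that application-side bookkeeping, assuming a hypothetical `SessionContext` and capped exponential backoff for reconnect attempts; it does not model Gemini 3's own thought-state preservation.

```typescript
// Hypothetical session context kept outside the stream connection, so a dropped
// RTSP connection can be retried without losing what the agent already knows.
interface SessionContext {
  streamUrl: string;
  lastProcessedMs: number; // stream position of the last analyzed chunk
  notes: string[];         // running summary carried into the next model call
}

// Capped exponential backoff: baseMs * 2^attempt, never more than maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// After a reconnect, keep the existing context and record the gap explicitly,
// so the agent does not assume it observed what happened while the stream was down.
function resumeAfterDrop(ctx: SessionContext, droppedAtMs: number, reconnectedAtMs: number): SessionContext {
  const gapMs = reconnectedAtMs - droppedAtMs;
  return {
    ...ctx,
    notes: [...ctx.notes, `stream unavailable for ${gapMs} ms; events in this gap were not observed`],
  };
}

const delays = [0, 1, 2, 3, 10].map((attempt) => backoffDelayMs(attempt));
console.log(delays); // [ 500, 1000, 2000, 4000, 30000 ]
```

Recording the gap, rather than silently resuming, lets later reasoning distinguish "nothing happened" from "nothing was seen".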
🛠️ How We Built It
Gemini 3 (core engine): spatial video understanding, plus long-running thought-state preservation for fault tolerance.
Backend: Hono + Node.js
- Agent runtime provisioning and orchestration
- Video processing and intelligent chunking
- Durable, real-time event handling
- Webhooks and Telegram alert delivery
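Webhook and Telegram delivery both reduce to an HTTP POST. The sketch below builds those requests without sending them. The Telegram URL and the `chat_id`/`text` fields follow the Telegram Bot API `sendMessage` method; the webhook payload fields (`event`, `detectedAt`) and the `OutboundRequest` type are hypothetical.

```typescript
// A plain description of an HTTP request; hypothetical helper type.
interface OutboundRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Telegram Bot API: POST https://api.telegram.org/bot<token>/sendMessage
function telegramAlert(botToken: string, chatId: string, text: string): OutboundRequest {
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text }),
  };
}

// Generic webhook call; the payload shape here is an illustrative assumption.
function webhookCall(url: string, event: string, detectedAt: Date): OutboundRequest {
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ event, detectedAt: detectedAt.toISOString() }),
  };
}
```

Building requests as plain objects keeps them easy to log, test, and retry; sending one is then `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.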
Frontend: Vite + React + React Router + TailwindCSS
A clean, professional UI that makes complex automation feel simple.
Challenges We Ran Into
- Keeping event definitions expressive and natural without forcing users to write YAML
- Maintaining continuity when streams drop or reconnect
- Making multi-step action chains both powerful and reliable
- Balancing low latency with smart video chunking to manage costs
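The latency/cost trade-off in chunking can be made concrete with a planner that splits a stream window into overlapping segments. This is a hypothetical sketch; the chunk length and overlap values are illustrative, not Percepta's settings. Shorter chunks lower detection latency but mean more model calls; overlap guards against events split across a boundary at the price of extra calls.

```typescript
interface Chunk {
  startMs: number;
  endMs: number;
}

// Split [0, durationMs) into windows of chunkMs that overlap by overlapMs.
// Any event lasting at most overlapMs falls entirely inside at least one chunk.
function planChunks(durationMs: number, chunkMs: number, overlapMs: number): Chunk[] {
  if (chunkMs <= 0 || overlapMs < 0 || overlapMs >= chunkMs) {
    throw new RangeError("require chunkMs > 0 and 0 <= overlapMs < chunkMs");
  }
  const stride = chunkMs - overlapMs; // how far each new chunk advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < durationMs; start += stride) {
    chunks.push({ startMs: start, endMs: Math.min(start + chunkMs, durationMs) });
    if (start + chunkMs >= durationMs) break; // last chunk reaches the end
  }
  return chunks;
}
```

For example, a 60 s window with 20 s chunks needs 3 model calls with no overlap and 4 calls with a 5 s overlap; the extra call is the cost of not missing short events that straddle a boundary.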
🏆 Accomplishments We’re Proud Of
- A true “describe → act” workflow for live video
- Long-running reliability with preserved internal state
- A flexible action-chain system that goes far beyond alerts
- A polished UX that turns complex automation into a natural experience
What We Learned
- The real value is in actions, not notifications
- Spatial reasoning fundamentally changes what’s possible with video
- Fault tolerance is non-negotiable for live streams
- Good UX is the difference between a demo and a product
What’s Next for Percepta
- Skill marketplace and reusable action templates
- Live preview and workflow simulation
- Team controls, audit trails, and compliance features
- Edge deployment for ultra-low-latency environments
Built With
- agent-skills
- antigravity
- gemini-3
- gemini-cli
- hono
- react
- react-router
- tailwind-css
- typescript
- vite