Inspiration
VisionGuard grew out of real-world experience working as a security guard. Monitoring CCTV through long shifts, especially at night, was monotonous and mentally draining, and it was easy to lose focus. Traditional CCTV systems simply record footage or detect basic motion; they have no understanding of behaviour, yet behaviour is where real incidents show up. Someone approaching a door, leaving a package, or two people entering late at night can easily be missed when the person watching is tired or overloaded. We realised that CCTV doesn’t need more cameras; it needs intelligence. VisionGuard addresses that problem with a teachable AI agent that understands behaviours, not just motion.
What It Does
VisionGuard turns passive CCTV into an intelligent, behaviour-aware agent. Users describe behaviours in plain language, such as: “Alert me if someone approaches the door with something in their hand.” The agent watches the live feed in real time, tracks movement, direction, proximity, and context, and triggers an alert the moment the behaviour occurs. Every alert includes a transparent Glass-Box reasoning timeline that shows exactly what the agent saw and how it interpreted it. VisionGuard runs on live YouTube CCTV streams with under one second of latency.
How We Built It
We developed an end-to-end behaviour detection pipeline made up of live video ingestion, frame extraction, and a vision model that detects objects and their direction of movement. A natural-language behaviour engine converts user instructions into formal rules, and an event detector checks whether live observations match those rules. When a behaviour is detected, a Glass-Box explanation system produces a step-by-step reasoning timeline. The frontend displays the live stream, alerts, and explanations in real time. Together these parts form a complete agentic loop: observe → interpret → decide → explain.
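The rule-matching and explanation steps described above can be illustrated with a minimal sketch. This is not VisionGuard's actual code: the `Observation` and `Rule` schemas, the `detect` function, and the pixel-distance test are all hypothetical, chosen only to show how a formal rule might be checked against tracked detections while a reasoning timeline is recorded.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Observation:
    """One tracked detection in one frame (hypothetical schema)."""
    t: float                 # seconds since the stream started
    label: str               # detected class, e.g. "person"
    x: float                 # centroid position in pixels
    y: float
    carrying: bool = False   # True if an object was detected in hand


@dataclass
class Rule:
    """A formal rule that a plain-language instruction might compile to (assumed)."""
    label: str               # class the rule applies to
    zone_x: float            # centre of the watched zone, e.g. a door
    zone_y: float
    radius: float            # alert distance in pixels
    require_carrying: bool = False


def detect(rule, track):
    """Check a track against a rule; return (fired, timeline of reasoning steps)."""
    timeline = []
    prev_dist = None
    for obs in track:
        if obs.label != rule.label:
            continue
        dist = hypot(obs.x - rule.zone_x, obs.y - rule.zone_y)
        approaching = prev_dist is not None and dist < prev_dist
        step = f"t={obs.t:.1f}s: saw {obs.label} {dist:.0f}px from zone"
        timeline.append(step + (", moving closer" if approaching else ""))
        prev_dist = dist
        carrying_ok = obs.carrying or not rule.require_carrying
        if approaching and dist <= rule.radius and carrying_ok:
            timeline.append(f"t={obs.t:.1f}s: rule matched, raising alert")
            return True, timeline
    return False, timeline


# Usage: a person walks toward a door at (0, 0) and picks up an object on the way.
door_rule = Rule("person", zone_x=0.0, zone_y=0.0, radius=100.0, require_carrying=True)
walk = [
    Observation(0.0, "person", 300.0, 0.0),
    Observation(0.5, "person", 200.0, 0.0),
    Observation(1.0, "person", 90.0, 0.0, carrying=True),
]
fired, steps = detect(door_rule, walk)
for line in steps:
    print(line)
```

Keeping the timeline as a plain list of human-readable steps means the explanation is produced by the same code path that makes the decision, which is one way to keep an explanation layer truthful.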
Challenges We Ran Into
Behaviour detection is far more complex than object detection: we had to reason about direction, proximity, and context. Converting natural-language behaviour descriptions into consistent logic was difficult because the same behaviour can be phrased in many ways. Public CCTV feeds were low-resolution, noisy, and unstable, which hurt detection accuracy. We also had to keep latency under one second so the system felt truly real-time. Building a readable, truthful Glass-Box explanation layer required careful design. Coordinating all the components (vision, agent logic, behaviour engine, streaming, and UI) within hackathon constraints was another major challenge.
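To make the noise problem concrete, here is one possible way to decide that a subject is approaching a zone when per-frame distances jitter. It is a sketch under assumptions, not the method used in VisionGuard: the function names, the exponential moving average, and the `alpha` and `min_frames` defaults are all illustrative choices.

```python
def smooth(values, alpha=0.5):
    """Exponential moving average; a lower alpha smooths more but reacts more slowly."""
    smoothed = []
    ema = None
    for value in values:
        ema = value if ema is None else alpha * value + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed


def is_approaching(distances, alpha=0.5, min_frames=3):
    """True once the smoothed distance has fallen for min_frames consecutive steps."""
    smoothed = smooth(distances, alpha)
    streak = 0
    for prev, cur in zip(smoothed, smoothed[1:]):
        streak = streak + 1 if cur < prev else 0
        if streak >= min_frames:
            return True
    return False


# Usage: distances (in pixels) from a subject to a door, one value per frame.
steady_walk = [300, 280, 290, 260, 240, 220]   # noisy but trending closer
standing_jitter = [300, 310, 295, 305, 298, 308]  # detector jitter, no real approach
print(is_approaching(steady_walk), is_approaching(standing_jitter))
```

Requiring a streak of decreasing smoothed values trades a few frames of delay for fewer false alerts, so `min_frames` would need tuning against the sub-second latency target.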
Accomplishments That We're Proud Of
We built a fully teachable natural-language behaviour engine that performs real-time detection on live YouTube CCTV streams. We created a transparent Glass-Box reasoning timeline for every alert, turning CCTV from passive recording into active behavioural intelligence. We delivered a stable end-to-end agentic system in just two days, and the result directly addresses problems we experienced during real security work. VisionGuard demonstrates how AI can make CCTV systems intelligent, adaptive, and behaviour-aware.
What We Learned
We learned that understanding behaviour requires multi-step reasoning, not just object detection, and that transparency is essential for trust. Natural-language interfaces are far easier to use than traditional rule-based configuration. Real-world CCTV conditions demand robust, fault-tolerant detection logic. We also learned how much strict team coordination matters when building interconnected agent systems, and we gained a deeper understanding of both the challenges and the power of real-time AI on live video streams.
What’s Next for VisionGuard
Our roadmap includes teach-by-example (users show a behaviour once and the agent learns it), multi-camera coordination, and anomaly detection for unusual behaviour. We plan to add mobile push alerts with embedded reasoning snapshots, build a full security dashboard for multi-feed monitoring, and integrate with enterprise access control and building management systems. The goal is to scale VisionGuard into a fully deployable intelligent security platform for businesses, offices, and smart buildings.
Built With
- api
- behaviour
- code
- cursor
- engine
- ffmpeg
- github
- glass-box
- html/css
- javascript
- language
- live
- models
- natural
- pip
- processing
- real-time
- reasoning
- videocapture
- vision
- vs
- youtube

