Inspiration

Retail security hasn’t evolved with AI. Cameras still act as passive recorders instead of active systems that understand what’s happening. We wanted to build something that doesn’t just record theft: it detects it, responds in real time, and makes the data actually useful.

What it does

SPOTTER turns any camera into an intelligent security system. It detects suspicious behavior in real time, confirms theft using a vision-language model, and triggers a live voice deterrent. Every incident is logged and can be queried through a natural-language chatbot to answer questions about store activity.

How we built it

We built a real-time detection pipeline using YOLOv8 for initial object tracking and behavior scoring. When the behavior score crosses a threshold, a locally running Gemma model verifies the event and generates a contextual response. ElevenLabs converts that response into a live voice announcement. Events are stored in MongoDB and synced to Snowflake, where Cortex Search powers a retrieval-based chatbot for analytics. The frontend is a lightweight multi-page interface showing live feeds, alerts, and insights.
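To make the two-stage flow concrete, here is a minimal sketch of the gating logic, not our actual code. It assumes a cheap per-frame score for each tracked person and a slower verifier that is only called once the accumulated score crosses a threshold. The names `TwoStagePipeline`, `Event`, and `fake_verify`, along with the threshold and decay values, are hypothetical; the verifier stands in for the Gemma call and the `events` list stands in for the database write.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    track_id: int
    score: float
    announcement: str


@dataclass
class TwoStagePipeline:
    # Hypothetical second stage: returns an announcement string if the
    # event is confirmed, or None if it is rejected.
    verify: Callable[[int, float], Optional[str]]
    threshold: float = 1.0   # assumed value, for illustration only
    decay: float = 0.9       # assumed per-frame decay of old evidence
    scores: Dict[int, float] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)

    def update(self, track_id: int, frame_score: float) -> Optional[Event]:
        # Stage 1: accumulate a cheap per-track score on every frame.
        score = self.scores.get(track_id, 0.0) * self.decay + frame_score
        self.scores[track_id] = score
        if score < self.threshold:
            return None
        # Stage 2: only now pay for the slower verification call.
        self.scores[track_id] = 0.0  # reset so one incident fires once
        announcement = self.verify(track_id, score)
        if announcement is None:
            return None
        event = Event(track_id, score, announcement)
        self.events.append(event)  # stand-in for the database write
        return event


# Usage with a fake verifier that only confirms track 7.
calls: List[int] = []


def fake_verify(track_id: int, score: float) -> Optional[str]:
    calls.append(track_id)
    return "Please return the item to the shelf." if track_id == 7 else None


pipeline = TwoStagePipeline(verify=fake_verify)
results = [pipeline.update(7, 0.4) for _ in range(3)]
```

With these assumed values, the first two frames stay below the threshold and the verifier is called only on the third, so the expensive model runs once per incident rather than once per frame.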

Challenges we ran into

The biggest challenge was balancing speed and accuracy in real time. Running vision models fast enough for live deterrence while avoiding false positives required tuning thresholds and structuring a two-stage detection system. Another challenge was integrating multiple systems (local AI, cloud databases, streaming audio) into a seamless pipeline with low latency.
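The threshold trade-off can be illustrated with a small sweep. This is a sketch under assumptions rather than our tuning procedure: it supposes a set of clips with a first-stage score and a ground-truth label, and counts how many clips each candidate threshold would escalate to the second stage, how many of those would be false alarms, and how many real incidents would be missed. The function name `sweep_thresholds` and the sample values are made up for illustration.

```python
from typing import Dict, List, Tuple


def sweep_thresholds(
    scored: List[Tuple[float, bool]], thresholds: List[float]
) -> List[Dict[str, float]]:
    """Summarize escalations, false alarms, and misses per threshold."""
    rows = []
    for t in thresholds:
        # Clips at or above the threshold are sent to the second stage.
        escalated = [(s, is_theft) for s, is_theft in scored if s >= t]
        false_alarms = sum(1 for _, is_theft in escalated if not is_theft)
        # Real incidents below the threshold never reach verification.
        missed = sum(1 for s, is_theft in scored if is_theft and s < t)
        rows.append(
            {
                "threshold": t,
                "escalated": len(escalated),
                "false_alarms": false_alarms,
                "missed": missed,
            }
        )
    return rows


# Made-up (score, was_really_theft) pairs, for illustration only.
samples = [(0.2, False), (0.5, False), (0.7, True), (0.8, False), (0.9, True)]
rows = sweep_thresholds(samples, [0.4, 0.6, 0.85])
for row in rows:
    print(row)
```

A lower threshold sends more clips to the slow verifier, which costs latency; a higher one starts missing real incidents before verification ever runs.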

Accomplishments that we're proud of

We built a full end-to-end system that goes from detection to action in seconds. The ability to trigger a real-time voice response based on live camera input was a major milestone. We’re also proud of the RAG-based analytics layer, which turns raw events into something users can actually query and understand.

What we learned

We learned how to design around latency constraints in AI systems and how much system architecture matters relative to any individual model. We also gained experience integrating local and cloud AI workflows, and saw how much good data structuring affects downstream analytics.

What's next for Spotted

Next steps include improving detection accuracy with better behavioral modeling, adding multi-camera tracking, and refining the 3D store visualization. We also plan to support multi-turn conversational analytics and expand integrations for real-world deployment.
