Inspiration
Schools already have cameras, but not eyes on every feed. About 93% of U.S. public schools use security cameras, yet the feeds are rarely monitored in real time. Meanwhile, 2024 saw 39 K–12 school shootings with injuries or deaths, a sobering reminder that every second matters when escalating to staff and first responders. Vigilant exists to turn raw camera footage into verified, actionable alerts within moments.
What it does
Vigilant watches live campus cameras, uses computer vision to detect visible weapons, and computes risk factors. A human in the loop confirms or dismisses each detection with a single tap. On confirmation (or automatically, under a high-confidence policy), Vigilant pushes targeted lockdown messages, updates a live security dashboard, and immediately contacts first responders and school administrators.
How we built it
- Frontend: Vite + React + Tailwind for a fast, responsive UI. The dashboard renders four to six camera tiles, an event rail (Live / Past), and a high-severity modal. Canvas overlays draw red bounding boxes using normalized coordinates streamed from the backend.
- Backend: Python with OpenCV for frame ingest and sampling (configurable FPS to balance latency and API usage). Each frame is JPEG-encoded and sent to a Roboflow Hosted Inference model; detections are mapped back to pixel space and normalized for the UI.
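A minimal sketch of the sampling and normalization steps described above, not the project's actual code. It assumes each prediction is a dict with center-based pixel coordinates (`x`, `y`, `width`, `height`) plus `class` and `confidence`; `NormalizedBox`, `sample_stride`, and `normalize_prediction` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Bounding box in 0..1 coordinates, top-left origin (hypothetical UI payload)."""
    label: str
    confidence: float
    x: float  # left edge
    y: float  # top edge
    w: float
    h: float


def sample_stride(source_fps: float, target_fps: float) -> int:
    """Return N so that keeping every Nth frame yields roughly target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    return max(1, round(source_fps / target_fps))


def _clamp01(value: float) -> float:
    return min(1.0, max(0.0, value))


def normalize_prediction(pred: dict, frame_w: int, frame_h: int) -> NormalizedBox:
    """Convert a center-based pixel box into a clamped, normalized top-left box."""
    half_w = pred["width"] / 2
    half_h = pred["height"] / 2
    left = _clamp01((pred["x"] - half_w) / frame_w)
    top = _clamp01((pred["y"] - half_h) / frame_h)
    right = _clamp01((pred["x"] + half_w) / frame_w)
    bottom = _clamp01((pred["y"] + half_h) / frame_h)
    return NormalizedBox(
        label=pred["class"],
        confidence=pred["confidence"],
        x=left,
        y=top,
        w=right - left,
        h=bottom - top,
    )
```

Because the box is normalized, the frontend can scale it to any tile size by multiplying by the canvas width and height. Clamping keeps boxes that spill past the frame edge from drawing outside the tile.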
Challenges we ran into
- WebSockets at scale (dev mode): Vite proxy + FastAPI handshake/CORS quirks; fixed with a stable /ws path, heartbeat pings, and reconnect logic.
- Frame rate vs. cost/latency: Tuning OpenCV sampling (e.g., 3–5 FPS) so Roboflow calls stay within free-tier credits while keeping detections snappy.
- Timestamp alignment: Ensuring that event cards, and the screenshot shown in the high-severity modal, reflect the exact frame time across multiple streams.
- API wiring & errors: Handling transient 429/5xx responses with exponential backoff and idempotent retries to avoid duplicate events.
- Dependency wrangling: OpenCV + FFmpeg on macOS, Node/React versions, and consistent .env handling across front/back.
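The retry handling mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `send` stands in for any callable that performs the request and returns an HTTP status code, and `EventDeduper` with its `(camera_id, frame_ts_ms, label)` key is a hypothetical way to keep retries from creating duplicate events.

```python
import random
import time
from typing import Callable, Iterator

# Status codes treated as transient (an assumption for this sketch).
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0,
                   jitter: bool = True) -> Iterator[float]:
    """Yield one delay per retry: base * 2**attempt, capped, optionally jittered."""
    for attempt in range(retries):
        delay = min(cap, base * 2 ** attempt)
        yield random.uniform(0, delay) if jitter else delay


def call_with_retry(send: Callable[[], int], retries: int = 4,
                    sleep: Callable[[float], None] = time.sleep,
                    jitter: bool = True) -> int:
    """Call send(); retry with exponential backoff while the status is transient."""
    status = send()
    for delay in backoff_delays(retries, jitter=jitter):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status = send()
    return status


class EventDeduper:
    """Remember event keys so a retried request cannot emit the same event twice."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, int, str]] = set()

    def first_time(self, camera_id: str, frame_ts_ms: int, label: str) -> bool:
        key = (camera_id, frame_ts_ms, label)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Jitter spreads out retries from several camera workers so they do not hit the API in lockstep. In a long-running service the seen-key set would need an expiry policy, which this sketch omits.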
Accomplishments that we're proud of
- Clean, production-feeling UI with real-time overlays, severity badges, and a tight approval workflow.
- Robust event schema used everywhere (video, future audio, fusion), which made the system easy to extend.
- Incident summaries ready for first responders (Gemini), delivered through Twilio in the demo environment.
What we learned
- How to stream detections reliably: backpressure, throttling, and keeping WebSocket updates atomic.
- Practical API orchestration across multiple vendors (Roboflow, Twilio, Supabase, LLMs) with clear fallbacks.
- Designing a unified event model saves time: front-end rendering, moderation, and escalation all consume the same payload.
- Prompting LLMs for structured outputs (JSON) is critical when composing downstream alerts.
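To illustrate why structured output matters for downstream alerts, here is a hedged sketch of validating an LLM's JSON reply before it is used. The field names (`severity`, `location`, `summary`), the severity levels, and the names `IncidentSummary` and `parse_incident_summary` are all assumptions made for this example, not the project's actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical severity levels for this sketch.
SEVERITIES = ("low", "medium", "high", "critical")
REQUIRED_FIELDS = ("severity", "location", "summary")


@dataclass(frozen=True)
class IncidentSummary:
    severity: str
    location: str
    summary: str


def parse_incident_summary(raw: str) -> IncidentSummary:
    """Extract and validate the JSON object in a model reply, or raise ValueError."""
    # Models sometimes wrap JSON in extra prose, so parse only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON in model output: {exc}") from exc

    bad = [k for k in REQUIRED_FIELDS
           if not isinstance(data.get(k), str) or not data[k].strip()]
    if bad:
        raise ValueError(f"missing or empty fields: {bad}")

    severity = data["severity"].strip().lower()
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return IncidentSummary(severity, data["location"].strip(), data["summary"].strip())
```

Rejecting a malformed reply with an explicit error lets the caller fall back to a fixed template instead of sending a garbled alert.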
What's next for Vigilant
- Audio detection: plug in a lightweight gunshot classifier (e.g., YAMNet) and fuse with vision within a 3–5s window to raise Critical automatically.
- Direct camera integration: RTSP/ONVIF ingest for real school cameras; per-site configuration and health checks.
- Model tuning: collect venue-specific negatives (doors slamming, lockers) and fine-tune to reduce false positives; add active-learning from operator feedback.
- Privacy & safety: optional face blurring in recordings, role-based access, and clearer audit logs for every action.
- Multimodal guides: contextual playbooks (LLM) that adapt steps based on camera location, time of day, and who confirmed the alert.
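The planned audio and vision fusion could be sketched as below. This is a rough illustration of the windowing idea only: the `Detection` fields, the `zone_id` grouping, the 4-second default, and the `"needs_review"` fallback label are assumptions, and no real gunshot classifier is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    zone_id: str       # hypothetical: a camera and microphone covering the same area
    modality: str      # "video" or "audio"
    timestamp_s: float
    confidence: float


def fuse(detections: list[Detection], window_s: float = 4.0) -> list[tuple[Detection, Detection]]:
    """Return (video, audio) pairs from the same zone within window_s seconds."""
    videos = [d for d in detections if d.modality == "video"]
    audios = [d for d in detections if d.modality == "audio"]
    return [
        (v, a)
        for v in videos
        for a in audios
        if v.zone_id == a.zone_id and abs(v.timestamp_s - a.timestamp_s) <= window_s
    ]


def severity_for(detections: list[Detection], window_s: float = 4.0) -> str:
    """Escalate to critical only when both modalities agree inside the window."""
    return "critical" if fuse(detections, window_s) else "needs_review"
```

The pairwise comparison is quadratic, which is fine for a handful of recent events per zone; a sorted buffer would be the natural next step for larger volumes.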


