💡 Inspiration
We live in a world covered in cameras, yet we are less safe than ever. Security teams are overwhelmed by thousands of video feeds and sensor logs they can't possibly watch in real time. Current systems are reactive: they only record disasters after they happen. We wanted to build something proactive.
We asked ourselves: "What if an AI could watch every camera with the reasoning capability of an expert security analyst?"
That question led to SENTINEL.AI. We were inspired by the potential of Google Gemini 3's multimodal capabilities to fuse vision, audio, and text into a single "Risk Intelligence" stream that predicts danger before it escalates.
🛡️ What it does
SENTINEL.AI is a Multimodal Risk Intelligence Platform that turns passive surveillance into active foresight.
- Ingests Multimodal Data: It takes in live CCTV feeds (Vision), environmental sound descriptions (Audio), and sensor logs (Text).
- Fuses Context: Unlike standard object detection, it understands context. It doesn't just see "people running"; it correlates that with "screaming audio" and "smoke sensor logs" to identify a "Panic Situation."
- Predicts Risk: It outputs a real-time Risk Trajectory (Low to Critical) and forecasts what is likely to happen in the next 5-10 minutes.
- Delivers Actionable Intelligence: It generates a structured JSON report with specific recommended actions (e.g., "Trigger Fire Suppression", "Lockdown Sector 4"), ready for IoT integration.
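To make the structured report concrete, here is a minimal sketch of what such a JSON report could look like and how a consumer might validate it. The field names (`situation`, `risk_level`, `forecast`, `recommended_actions`) and the four-step risk scale are assumptions for illustration; the write-up only specifies a "Low to Critical" trajectory and a list of recommended actions.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Hypothetical risk scale: the write-up only says "Low to Critical".
RISK_LEVELS = ("Low", "Medium", "High", "Critical")


@dataclass
class RiskReport:
    """Illustrative report shape; field names are assumptions."""
    situation: str
    risk_level: str
    forecast: str
    recommended_actions: List[str] = field(default_factory=list)


def parse_report(raw: str) -> RiskReport:
    """Parse a JSON reply and reject anything outside the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    report = RiskReport(
        situation=str(data["situation"]),    # KeyError if a field is missing
        risk_level=str(data["risk_level"]),
        forecast=str(data["forecast"]),
        recommended_actions=[str(a) for a in data.get("recommended_actions", [])],
    )
    if report.risk_level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {report.risk_level!r}")
    return report


# A made-up reply, used only to exercise the parser.
raw_reply = """
{
  "situation": "Panic Situation",
  "risk_level": "Critical",
  "forecast": "Crowd crush likely near exits within 5-10 minutes",
  "recommended_actions": ["Trigger Fire Suppression", "Lockdown Sector 4"]
}
"""
report = parse_report(raw_reply)
print(report.risk_level, report.recommended_actions)
```

Validating at the boundary like this means a downstream IoT handler only ever sees a known risk level and a plain list of action strings.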
⚙️ How we built it
We built SENTINEL.AI using a modern, scalable stack:
- AI Core: We leveraged Google Gemini 2.0 Flash (via Google AI Studio) for its fast multimodal reasoning. We used prompt engineering to make the model act as a "Tactical Safety Officer" and output strict JSON.
- Backend: We built a FastAPI (Python) server to handle data ingestion and manage the interaction with the Gemini API. Keeping this logic out of the dashboard separates concerns and lets the backend scale independently.
- Frontend: We used Streamlit to create a high-performance "Command Center" dashboard. We implemented a custom Glassmorphism UI using raw CSS injection to give it a futuristic, mission-critical aesthetic.
- Hardware Simulation: We built a "Simulation Mode" that injects complex scenarios (like riots or factory failures) to demonstrate the AI's reasoning capabilities without needing a physical disaster.
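The Simulation Mode and prompt setup described above can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: the `Observation` type, the `SCENARIOS` table, the scenario text, and the wording of `SYSTEM_INSTRUCTION` are all hypothetical, and the camera frame is stood in for by a text description rather than a real image. The call to the Gemini API itself is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical system instruction; the real prompt is not shown in the write-up.
SYSTEM_INSTRUCTION = (
    "You are a Tactical Safety Officer. "
    "Assess the observation and reply with a single JSON object only."
)


@dataclass
class Observation:
    """One bundle of signals for a single moment in time (names are assumptions)."""
    frame_description: str  # stand-in for the CCTV frame (Vision)
    audio_description: str  # environmental sound description (Audio)
    sensor_log: str         # raw sensor log line (Text)


# Canned scenarios that Simulation Mode could inject instead of live feeds.
SCENARIOS = {
    "factory_failure": Observation(
        frame_description="Workers backing away from a conveyor; visible haze",
        audio_description="Grinding metal, alarm buzzer",
        sensor_log="TEMP_SENSOR_7 reading above threshold",
    ),
    "riot": Observation(
        frame_description="Dense crowd pushing against barriers",
        audio_description="Shouting, glass breaking",
        sensor_log="DOOR_3 forced-open event",
    ),
}


def build_prompt(obs: Observation) -> str:
    """Package all signals into one prompt so the model sees them together."""
    payload = json.dumps(asdict(obs), indent=2)
    return f"{SYSTEM_INSTRUCTION}\n\nOBSERVATION:\n{payload}"


prompt = build_prompt(SCENARIOS["factory_failure"])
print(prompt)
```

Because a simulated scenario has the same shape as a live observation, the rest of the pipeline does not need to know which one it is processing.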
🧩 Challenges we ran into
- Multimodal Synchronization: Aligning visual data with audio context was tricky. We had to design a data structure that packaged these distinct signals into a single, cohesive prompt for Gemini.
- Hallucination Control: Early versions of the model would sometimes invent safety threats. We solved this by implementing a Risk Engine that "clamps" the model's confidence scores and enforces a rigid schema, ensuring that only high-confidence threats trigger alerts.
- Latency: Real-time safety needs speed. We optimized our API calls and used Gemini 2.0 Flash to get response times down to a level acceptable for live monitoring.
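As an illustration of the hallucination-control idea above, here is a minimal sketch of a risk-engine gate that clamps the reported confidence and only alerts on high-confidence, high-severity reports. The threshold value, the `confidence` and `risk_level` field names, and the rule that only "High" and "Critical" can alert are assumptions, not details taken from the project.

```python
ALERT_THRESHOLD = 0.8  # hypothetical cut-off for "high confidence"
RISK_LEVELS = ("Low", "Medium", "High", "Critical")
ALERTING_LEVELS = ("High", "Critical")


def clamp_confidence(value) -> float:
    """Force a model-reported confidence into [0.0, 1.0]; junk becomes 0.0."""
    try:
        score = float(value)
    except (TypeError, ValueError):
        return 0.0
    if score != score:  # NaN is the only float not equal to itself
        return 0.0
    return max(0.0, min(1.0, score))


def should_alert(report: dict) -> bool:
    """Alert only when the schema holds and the clamped confidence is high."""
    level = report.get("risk_level")
    if level not in RISK_LEVELS:
        return False  # reject anything outside the rigid schema
    confidence = clamp_confidence(report.get("confidence"))
    return level in ALERTING_LEVELS and confidence >= ALERT_THRESHOLD


print(should_alert({"risk_level": "Critical", "confidence": 0.93}))
print(should_alert({"risk_level": "Critical", "confidence": 0.40}))
```

Treating malformed or missing confidence as 0.0 makes the gate fail closed: a report the engine cannot interpret never triggers an alert on its own.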
🏆 Accomplishments that we're proud of
- The "Context" Breakthrough: Watching the AI correctly identify a "Student Protest" versus a "Violent Riot" based on subtle cues in the image and description was a huge win.
- Strict JSON Output: We got a generative LLM to reliably control (simulated) IoT devices by outputting valid, schema-conforming JSON on every run.
- The UI: We're incredibly proud of the "Command Center" aesthetic. It looks and feels like a production-grade security tool, not just a hackathon prototype.
🧠 What we learned
- Multimodality is King: Text-only or Vision-only models are insufficient for safety. The real power comes from the intersection of sight, sound, and data.
- Prompt Engineering is Coding: Crafting the system instructions was just as complex as writing the Python code. We learned how to "program" the model's behavior using natural language constraints.
- Safety First: Building safety tools carries an ethical responsibility. We implemented safety guardrails to ensure the AI doesn't profile individuals but focuses on behavioral risk.
🚀 What's next for SENTINEL.AI
- Edge Deployment: Running a distilled version of the model directly on security cameras (Edge AI).
- Real IoT Integration: Connecting the JSON output to actual smart locks and alarm systems.
- Fine-Tuning: Training a custom Gemini adapter on specific security datasets (e.g., industrial safety compliance for factories).
- Autonomous Drones: Integrating drone video feeds for search and review missions.