Sentinel AI
Fusing Audio, Vision, and Maps with Gemini
The Inspiration: "Every Second Counts"
In an emergency, a 911 dispatcher is the most critical link. They are forced to listen to a high-stress audio call, manually type a text summary, and guess which of the 1,000+ city cameras to look at.
This process is slow and entirely manual. By the time police are dispatched, the suspect is gone and the data is already cold.
We built Sentinel AI to fix this. We give dispatchers and first responders what they need most: instant, AI-verified situational awareness.
What It Does
Sentinel AI is an event-driven data pipeline that transforms a 911 audio call into a verified, queryable incident on a map—all in seconds.
Here is the 4-step process:
AI Transcription & Entity Extraction: We simulate a 911 call. The audio file is fed to the Gemini API, which transcribes it. A second Gemini prompt reads the transcript and extracts the critical entities as clean JSON: the suspect description (e.g., "person in a black hoodie"), the vehicle (e.g., "white Toyota Camry"), and the location (e.g., "Wall Street, Manhattan").
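A minimal sketch of how this two-step call can look with the `google-generativeai` Python SDK (the prompt wording, file name, and JSON keys are illustrative, not our exact code):

```python
# Sketch of the two-step transcription + entity-extraction call. Prompt
# wording, file names, and JSON keys are illustrative, not our exact code.
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

# Step 1: transcribe the simulated 911 call, uploaded via the File API.
audio = genai.upload_file(path="call_recording.mp3")
transcript = model.generate_content(
    ["Transcribe this 911 call verbatim.", audio]
).text

# Step 2: a second prompt turns the free-form transcript into entities.
prompt = f"""From this 911 transcript, return ONLY a JSON object with the keys
"suspect_description", "vehicle", and "location".

Transcript: {transcript}"""
response = model.generate_content(
    prompt,
    generation_config={"response_mime_type": "application/json"},
)
entities = json.loads(response.text)
# e.g. {"suspect_description": "person in a black hoodie",
#       "vehicle": "white Toyota Camry",
#       "location": "Wall Street, Manhattan"}
```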
AI Multi-Camera Vision Analysis: Instead of just guessing, Sentinel AI takes the suspect description and feeds it to Gemini 2.5 Pro. We provide it three (simulated) nearby camera feeds and ask a simple, powerful question: "Find the 'person in a black hoodie'." The model analyzes all three images in a single call and returns a JSON object identifying the `winning_camera_name` and a justification for its choice.
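Conceptually, the single multi-image call looks like this (camera names, file paths, and prompt wording below are assumptions for illustration):

```python
# Sketch of the multi-camera vision call. Camera names, file paths, and the
# prompt wording are assumptions for illustration.
import json
import os

import PIL.Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
vision_model = genai.GenerativeModel("gemini-2.5-pro")

frames = [PIL.Image.open(p) for p in
          ("cam_north.jpg", "cam_south.jpg", "cam_east.jpg")]

prompt = """You are shown three camera feeds named cam_north, cam_south, and
cam_east, in that order. Find the 'person in a black hoodie'. Return ONLY a
JSON object with the keys "winning_camera_name" and "justification"."""

result = vision_model.generate_content(
    [prompt, *frames],
    generation_config={"response_mime_type": "application/json"},
)
match = json.loads(result.text)
# e.g. {"winning_camera_name": "cam_east",
#       "justification": "A figure in a black hoodie is visible near the curb."}
```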
Event-Driven Data Pipeline (S3 → Lambda → MongoDB): This is the core of our backend. The final, consolidated JSON report—containing the transcript, incident coordinates, and the public URL for the "evidence" image—is uploaded to an AWS S3 bucket.
- This S3 upload instantly triggers an AWS Lambda function.
- The Lambda function reads the incident data and inserts it directly into our MongoDB Atlas database, creating a permanent, queryable record.
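A condensed sketch of that Lambda handler (the "sentinel"/"incidents" database names and the MONGODB_URI environment variable are assumptions about our setup):

```python
# Condensed sketch of the S3-triggered Lambda. The "sentinel"/"incidents"
# names and the MONGODB_URI environment variable are assumptions; pymongo
# ships inside the deployment package.
import json
import os
from urllib.parse import unquote_plus

import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
collection = MongoClient(os.environ["MONGODB_URI"])["sentinel"]["incidents"]

def lambda_handler(event, context):
    # Each record points at an incident report just uploaded to the bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        collection.insert_one(json.loads(body))  # permanent, queryable record
    return {"statusCode": 200}
```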
The Dashboard & Chatbot (Streamlit): The user-facing app is a Streamlit dashboard with two tabs:
- Incident Map: A user can search for incidents in plain English (e.g., "show me thefts in Manhattan"). Gemini converts this query into a MongoDB filter, which runs on our database (see the sketch after this list). We then use Folium to plot every matching incident on an interactive map.
- Chatbot: A conversational interface that uses the same Gemini-to-MongoDB pipeline to answer natural language questions about the historical incident data.
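A simplified sketch of the Gemini-to-MongoDB filter generation both tabs share (the schema fields named in the prompt are illustrative assumptions about our document shape):

```python
# Sketch of the plain-English-to-MongoDB search shared by the map and the
# chatbot. The schema fields in the prompt are illustrative assumptions.
import json
import os

import google.generativeai as genai
from pymongo import MongoClient

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")
incidents = MongoClient(os.environ["MONGODB_URI"])["sentinel"]["incidents"]

def search_incidents(user_query: str) -> list[dict]:
    prompt = f"""Convert this request into a MongoDB find() filter. Return ONLY JSON.
Available fields: incident_type, borough, suspect_description, vehicle.
Request: {user_query}"""
    response = model.generate_content(
        prompt,
        generation_config={"response_mime_type": "application/json"},
    )
    # e.g. "show me thefts in Manhattan" becomes
    #      {"incident_type": "theft", "borough": "Manhattan"}
    return list(incidents.find(json.loads(response.text)))
```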
How We Built It
This project is built on a 100% serverless, event-driven architecture.
- Frontend & UI: Streamlit (using `st.tabs`, `st.expander`, and `st.chat_input`).
- Mapping: Folium (integrated with `streamlit-folium`).
- AI Model (The "Brain"):
- Gemini 2.0 Flash: Used for high-speed, low-cost tasks: Audio Transcription and Text-to-JSON (Filter Generation).
- Gemini 2.5 Pro: Used for its powerful multi-image reasoning to scan all camera feeds at once and find the suspect.
- Data Pipeline:
- AWS S3: Acts as the data "landing zone" for our JSON reports and as a public file host for the evidence images.
- AWS Lambda: The serverless "glue" that provides the S3-to-database trigger.
- MongoDB Atlas: Our high-performance, scalable database for storing and querying all structured incident data.
- Core Logic: The entire backend pipeline and all AI orchestration are written in Python.
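For a feel of how these pieces fit together, here is a trimmed-down skeleton of the Streamlit UI (tab labels, coordinates, and the stubbed search function are placeholders for the real app):

```python
# Trimmed-down skeleton of the Streamlit dashboard. Tab labels, coordinates,
# and the stubbed search function are placeholders for the real app.
import folium
import streamlit as st
from streamlit_folium import st_folium

def search_incidents(query: str) -> list[dict]:
    # Stand-in for the Gemini-to-MongoDB search sketched earlier.
    return [{"coordinates": [40.7061, -74.0092],
             "suspect_description": "person in a black hoodie"}]

map_tab, chat_tab = st.tabs(["Incident Map", "Chatbot"])

with map_tab:
    query = st.text_input("Search incidents", "show me thefts in Manhattan")
    fmap = folium.Map(location=[40.7128, -74.0060], zoom_start=12)  # Manhattan
    for incident in search_incidents(query):
        folium.Marker(
            location=incident["coordinates"],
            popup=incident["suspect_description"],
        ).add_to(fmap)
    st_folium(fmap, width=700)

with chat_tab:
    if question := st.chat_input("Ask about past incidents"):
        st.chat_message("user").write(question)
        # The same Gemini-to-MongoDB pipeline answers here in the real app.
```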