VisionOps: AI Safety Monitor

Gallery: code showing the Gemini API call; the generated PDF output; the dashboard detecting a hazard (red alert); the login screen.

💡 Inspiration

Industrial accidents and manufacturing errors cost the global economy billions of dollars every year. In high-stakes environments like pharmaceutical manufacturing, a single broken capsule or a worker missing a glove can ruin an entire batch or, worse, endanger patient safety.

Currently, factories rely on standard CCTV cameras, but these are passive systems. They require a human guard to stare at monitors 24/7. This approach is expensive and fundamentally flawed: humans cannot maintain 100% concentration for an entire shift. Fatigue leads to missed errors, and missed errors lead to accidents.

We wanted to solve this by building a system that doesn't just record video, but actually understands it.

🤖 What it does

VisionOps is an intelligent safety sentinel that turns standard security cameras into autonomous guardians. Powered by Google's Gemini 3 Pro, it analyzes live video feeds in real time to detect hazards, safety violations, and quality control issues.

Key capabilities include:

Real-Time Hazard Detection: Instantly identifies workers without PPE (helmets, masks), chemical spills, fires, or machinery malfunctions.
Instant Multi-Channel Alerts: When a hazard is detected, the dashboard flashes a "Red Alert" and immediately sends an SMS via Twilio to the safety manager.
Automated Compliance: The system logs every incident and generates downloadable PDF Incident Reports for legal and safety audits.
Interactive 3D Dashboard: A futuristic, responsive command center built with Streamlit and Vanta.js for real-time monitoring.
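
An incident log of this kind could be modeled roughly as below. This is a sketch under assumptions, not the project's implementation: the `Incident` fields and the `report_lines` helper are hypothetical, timestamps are assumed to be UTC, and the PDF rendering step (the project lists fpdf) is left out so the example runs with the standard library alone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Incident:
    """One logged hazard (hypothetical record layout)."""
    timestamp: datetime   # assumed to be timezone-aware UTC
    violation: str        # short description of what was detected
    confidence: float     # model confidence between 0.0 and 1.0


def report_lines(incidents: List[Incident]) -> List[str]:
    """Render incidents as plain text lines, oldest first."""
    lines = ["VisionOps Incident Report", f"Incidents logged: {len(incidents)}"]
    ordered = sorted(incidents, key=lambda item: item.timestamp)
    for number, incident in enumerate(ordered, start=1):
        stamp = incident.timestamp.strftime("%Y-%m-%d %H:%M:%S UTC")
        lines.append(
            f"{number}. {stamp} | {incident.violation} | "
            f"confidence {incident.confidence:.0%}"
        )
    return lines


if __name__ == "__main__":
    log = [
        Incident(datetime(2026, 2, 9, 12, 5, tzinfo=timezone.utc), "Missing glove", 0.93),
        Incident(datetime(2026, 2, 9, 12, 0, tzinfo=timezone.utc), "Chemical spill", 0.88),
    ]
    print("\n".join(report_lines(log)))
```

Each returned line could then be written into a PDF page by whatever report library is in use.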

⚙️ How we built it

The project is built on a robust Python backend with a modern Streamlit frontend.

The Brain (Gemini 3 Pro): We send video frames from the camera to Gemini 3 Pro through its multimodal API. We engineered a specialized system prompt that instructs the model to act as a "Factory Safety Officer" and to return strict JSON containing the safety status, a confidence score, and specific violation details.
The Eyes (OpenCV): We use OpenCV to capture and preprocess high-fidelity video feeds before sending them for inference.
The Voice (Twilio): We integrated the Twilio API to bridge the digital and physical worlds. The moment Gemini detects a high-confidence threat, a server-side trigger dispatches an SMS alert.
The UI (Streamlit + Vanta.js): We pushed the limits of Streamlit by injecting custom JavaScript to create a reactive "Neural Network" 3D background that visually represents the AI thinking in real time.
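
The hand-off from the model's JSON output to the SMS trigger can be sketched roughly as follows. This is an illustration, not the project's actual code: the field names (`status`, `confidence`, `violation`), the `SAFE`/`HAZARD` labels, and the 0.8 threshold are all assumptions chosen for the example, and the real Gemini and Twilio calls are left out.

```python
import json
from dataclasses import dataclass

ALERT_THRESHOLD = 0.8  # hypothetical cutoff for a "high-confidence" threat


@dataclass(frozen=True)
class SafetyVerdict:
    status: str        # "SAFE" or "HAZARD" (assumed labels)
    confidence: float  # 0.0 to 1.0
    violation: str     # detail text, empty when nothing was found


def parse_verdict(raw: str) -> SafetyVerdict:
    """Validate the model's reply; raise ValueError on anything unexpected."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    try:
        status = str(data["status"]).upper()
        confidence = float(data["confidence"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed verdict: {exc}") from exc
    if status not in {"SAFE", "HAZARD"}:
        raise ValueError(f"unknown status: {status!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return SafetyVerdict(status, confidence, str(data.get("violation", "")))


def should_alert(verdict: SafetyVerdict, threshold: float = ALERT_THRESHOLD) -> bool:
    """Alert only on hazards the model is sufficiently confident about."""
    return verdict.status == "HAZARD" and verdict.confidence >= threshold


if __name__ == "__main__":
    reply = '{"status": "HAZARD", "confidence": 0.92, "violation": "No helmet"}'
    verdict = parse_verdict(reply)
    if should_alert(verdict):
        print(f"ALERT: {verdict.violation}")  # the SMS dispatch would go here
```

Rejecting malformed replies up front means a bad model response is dropped instead of triggering a false alert.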

🧠 Challenges we ran into

Prompt Engineering for Consistency: Getting a large language model to output valid, structured JSON every time was difficult. We spent hours refining the system prompt to reduce hallucinated detections so that alerts fire only on genuine hazards.
Real-Time Latency: Sending video frames to the cloud takes time. We tuned our frame-skipping logic and image compression to balance detection speed against API bandwidth.
Streamlit Customization: Streamlit is great for data apps but limited for custom UI. We had to learn how to inject raw HTML and CSS to get the "Cyberpunk" aesthetic and the 3D background working correctly.
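
Frame skipping is simple to illustrate. The sketch below is an assumption about how such logic could look, not the project's code: `sample_every_nth` is a hypothetical helper that forwards one frame out of every `n`, so a 30 fps feed with `n = 30` produces about one API call per second.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_every_nth(frames: Iterable[T], n: int) -> Iterator[T]:
    """Yield frames 0, n, 2n, ... and skip everything in between."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, frame in enumerate(frames):
        if index % n == 0:
            yield frame


def calls_per_second(fps: float, n: int) -> float:
    """Approximate API request rate after keeping one frame in every n."""
    return fps / n


if __name__ == "__main__":
    # Integers stand in for camera frames here.
    kept = list(sample_every_nth(range(90), 30))
    print(kept, calls_per_second(30, 30))
```

In an OpenCV pipeline the frames would come from `cv2.VideoCapture.read()`, and each forwarded frame could be JPEG-encoded with `cv2.imencode` at a reduced `cv2.IMWRITE_JPEG_QUALITY` before upload. A larger `n` and a lower quality setting cut bandwidth at the cost of slower or less reliable detection.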

🏆 Accomplishments that we're proud of

End-to-End Automation: We successfully built a pipeline that goes from Visual Input → AI Understanding → Physical Alert (SMS) → Digital Record (PDF) without any human intervention.
The "Wow" Factor: We are particularly proud of the UI. It looks and feels like a professional enterprise software product.
Practical Utility: This isn't just a toy; it solves a real, expensive problem that exists in thousands of factories today.

🚀 What's next for VisionOps

While we optimized VisionOps for industrial safety, the underlying architecture is universal.

Scalability: By simply updating the system prompt, this same code can be adapted to detect shoplifters in retail stores, monitor playground safety in schools, or secure bank vaults.
Edge Deployment: We plan to explore distilling the model to run on edge devices (like Raspberry Pi) for environments with poor internet connectivity.
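
Swapping the system prompt per deployment could be organized along these lines. Everything here is hypothetical and for illustration only: the prompt wording, the domain keys, and the `build_system_prompt` helper are assumptions, not the prompts the project actually uses.

```python
# Shared output contract appended to every role (hypothetical wording).
OUTPUT_RULES = (
    "Respond with a single JSON object containing the keys "
    '"status", "confidence", and "violation". Do not add any other text.'
)

# Hypothetical role descriptions, one per deployment domain.
ROLES = {
    "factory": (
        "You are a Factory Safety Officer. Flag missing PPE, chemical spills, "
        "fires, and machinery malfunctions."
    ),
    "retail": "You are a retail loss-prevention officer. Flag suspected shoplifting.",
    "school": (
        "You are a playground safety monitor. Flag situations where a child "
        "may be at risk of injury."
    ),
}


def build_system_prompt(domain: str) -> str:
    """Combine a domain-specific role with the shared JSON output rules."""
    try:
        role = ROLES[domain]
    except KeyError:
        raise ValueError(f"unknown domain: {domain!r}") from None
    return f"{role}\n\n{OUTPUT_RULES}"


if __name__ == "__main__":
    print(build_system_prompt("retail"))
```

Keeping the output rules separate from the role text means every domain returns the same JSON shape, so the alerting and reporting code would not need to change.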

VisionOps isn't just a project; it's the future of autonomous industrial safety.

Built With

css
fpdf
google-gemini
google-gemini-api
html
opencv
plotly
python
streamlit
twilio
vanta.js

Submitted to

Gemini 3 Hackathon

Created by

I designed and built the entire application from scratch as a solo developer.

My contributions included:
- Backend Engineering: Building the Python logic to process video frames and integrate the Twilio API for SMS alerts.
- AI Integration: Configuring the Google Gemini 3 Pro model with specialized system prompts for industrial safety detection.
- Frontend Development: Creating the interactive Streamlit dashboard and integrating the custom Vanta.js 3D background.
- Data Visualization: Implementing the real-time confidence graphing and PDF report generation.

ARSH SHARMA

Updates

ARSH SHARMA started this project — Feb 09, 2026 11:38 AM EST
