OmniSense AI

The Genesis of OmniSight AI: Bringing Reasoning to the Visual World

The Inspiration

The inspiration for OmniSight AI came from a simple observation of modern industrial environments: we have more cameras than ever, yet safety monitoring remains a reactive, human-dependent task. Traditional computer vision can tell you that an object exists, but it cannot explain why that object poses a risk in a specific context. When Google announced the Gemini 3 API with its advanced "Thinking Mode" and multimodal reasoning, I saw an opportunity to move from simple detection to true spatial intelligence.

How I Built It

OmniSight AI was built using a "Vibe Coding" approach, starting with rapid prototyping in Google AI Studio. The architecture is designed to leverage the modularity of the Gemini 3 Pro model:

The Reasoning Core: I utilized Gemini 3's Thinking Mode (High) to act as the primary logic layer. This allows the system to process a video frame and "deliberate" on the scene before providing an output.
Multimodal Tooling: I integrated the Code Execution tool to handle spatial mathematics. For example, if the AI detects a forklift too close to a pedestrian, it calculates the estimated distance between them and renders a risk-level chart using Matplotlib.

Real-World Grounding: To solve the problem of hallucinated regulations, I used Google Search Grounding to pull live safety standards (like ISO or OSHA) based on the specific equipment identified in the image.
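
As a rough illustration of the spatial math described under Multimodal Tooling, here is a minimal sketch of a distance check between two detected objects. It is not the code the model generates at runtime. It assumes both positions are already expressed as (x, y) ground coordinates in metres, and the function names and the 2 m / 5 m thresholds are hypothetical values chosen for the example, not figures from any safety standard.

```python
import math

# Hypothetical thresholds in metres, chosen only for illustration.
DANGER_M = 2.0
WARNING_M = 5.0


def ground_distance(a, b):
    """Euclidean distance between two (x, y) ground positions in metres."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def risk_level(distance_m):
    """Bucket a separation distance into a coarse risk label."""
    if distance_m < DANGER_M:
        return "high"
    if distance_m < WARNING_M:
        return "medium"
    return "low"


if __name__ == "__main__":
    forklift = (1.5, 2.0)      # assumed ground position in metres
    pedestrian = (0.0, 0.0)    # assumed ground position in metres
    d = ground_distance(forklift, pedestrian)
    print(f"{d:.1f} m -> {risk_level(d)}")
```

The chart rendering is left out here; the labels returned by risk_level could be passed to any plotting library.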

What I Learned

Building this project taught me that the future of AI isn't just about faster chat; it's about agency. I learned how to orchestrate Context Caching to keep facility blueprints in the model's "short-term memory," which reduced latency and made the application feel like a real-time monitor rather than a slow analysis tool.

Challenges Faced

The biggest challenge was "Visual Noise." In a busy warehouse, thousands of objects move at once. Initially, the model would trigger too many alerts. I solved this by refining the System Instructions to prioritize "high-consequence anomalies." I also had to navigate the complexity of mapping 2D image coordinates to 3D space for accurate distance calculations, which required iterative prompting and fine-tuning the Code Execution logic.
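
One common way to approach the 2D-to-3D mapping problem is a ground-plane homography, sketched below. This is a swapped-in simplification, not necessarily the method used in OmniSight AI: it only recovers positions for points that lie on the floor (for example, where wheels or feet touch the ground), and it assumes a 3x3 calibration matrix is already known for the camera. The function name and the example matrix are hypothetical.

```python
def pixel_to_ground(H, u, v):
    """Map pixel (u, v) to ground-plane (x, y) using a 3x3 homography H.

    H is a row-major nested list. The result is only meaningful for
    points that physically lie on the ground plane.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        # The pixel lies on the horizon line and has no finite ground position.
        raise ValueError("pixel does not map to a finite ground point")
    return x / w, y / w


# Hypothetical calibration: an overhead camera where 1 pixel = 0.01 m.
H_EXAMPLE = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print(pixel_to_ground(H_EXAMPLE, 300, 400))
```

In practice the matrix would come from calibrating each camera against known floor markers, and the resulting ground coordinates can then be fed into an ordinary distance calculation.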

Built With

css3
fastapi
gemini3
gemini3pro
github
googleaistudio
next.js
python
tailwindcss

Updates

RUSHIT SHAH started this project — Dec 31, 2025 10:53 AM EST
