Inspiration
While modern AI models are excellent at describing what they see (OCR/captioning), they often fail to understand the logic or risks behind a scene. I was inspired by this gap between "seeing" and "reasoning," and wanted to build a "Cognitive Safety Net": a tool that doesn't just list objects, but audits the logical flow of human actions, administrative documents, and physical environments to prevent errors before they happen.
What it does
Omni-Audit is a next-generation multimodal logic engine. It takes visual inputs (images or screenshots) and performs a deep chain-of-reasoning audit across three domains:

- Administrative logic: it detects temporal anachronisms and batch errors in official documents (e.g., catching a 2026 date in a current-year workflow).
- Physical safety: it identifies dangerous configurations, such as daisy-chained electrical outlets, by linking visual setups to their physical consequences (fire hazards).
- UX/design audit: it flags violations of human mental models in design, such as illogical elevator button layouts or accessibility barriers.

The reasoning engine predicts structural failure before it happens: real-time multimodal logic designed to save lives and hardware.
How I built it
The core of Omni-Audit is built on Python and Streamlit for a low-latency web interface. I integrated the Gemini 3 Flash API (gemini-3-flash-preview) through the Google Generative AI SDK, using the v1beta features to tap into Gemini 3's advanced reasoning capabilities. I designed a modular prompt architecture that forces the model to perform chain-of-thought reasoning before providing a risk assessment, which noticeably improves logical accuracy.
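For readers curious about the wiring, here is a minimal sketch of the audit loop, assuming an API key stored in Streamlit secrets under GEMINI_API_KEY; the AUDIT_INSTRUCTION text and prompt wording are illustrative, not the exact production prompts:

```python
import google.generativeai as genai
import streamlit as st
from PIL import Image

genai.configure(api_key=st.secrets["GEMINI_API_KEY"])

# System instruction that forces step-by-step reasoning before the verdict
# (illustrative wording, not the production prompt).
AUDIT_INSTRUCTION = (
    "You are a safety auditor. First reason step by step about the logical, "
    "temporal, and physical relationships in the image. Only then output a "
    "risk assessment with a severity level and a recommended fix."
)

model = genai.GenerativeModel(
    "gemini-3-flash-preview",  # model name used in this project
    system_instruction=AUDIT_INSTRUCTION,
)

uploaded = st.file_uploader("Upload an image to audit", type=["png", "jpg", "jpeg"])
if uploaded:
    image = Image.open(uploaded)
    st.image(image)
    # Multimodal call: a text task plus the PIL image in one request.
    response = model.generate_content(
        ["Audit this scene for logical and physical risks.", image]
    )
    st.write(response.text)
```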
Challenges I ran into
The primary challenge was managing the Gemini 3 API rate limits during high-traffic testing. I solved this by shrinking the multimodal payloads (a sketch follows below) and implementing a "Reasoning-First" system instruction set that kept the model's output concise yet logically dense. I also worked through the spatial reasoning challenge of getting the model to recognize out-of-order numerical sequences on physical objects like elevator panels.
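As one example of the payload optimization, images can be downscaled before upload so each request stays small; the MAX_EDGE cap here is a hypothetical value I chose for illustration, not a documented API limit:

```python
from PIL import Image

MAX_EDGE = 1024  # hypothetical cap; keeps enough detail for dates and panel labels

def shrink_for_upload(image: Image.Image) -> Image.Image:
    """Downscale the longest edge to MAX_EDGE, preserving aspect ratio."""
    if max(image.size) > MAX_EDGE:
        image.thumbnail((MAX_EDGE, MAX_EDGE))  # resizes in place
    return image
```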
Accomplishments that I am proud of
I am incredibly proud that Omni-Audit successfully identified a temporal anachronism in a university circular, a task that requires understanding batch numbers, graduation years, and the current date simultaneously. I also achieved cross-image synthesis: the model linked a dangerous electrical setup to its eventual physical consequence (melting/fire) without being explicitly told the images were related.
What I learned
Building with Gemini 3 taught me that the future of AI is "System 2" thinking. I learned how to leverage a low-latency model for high-stakes reasoning, and I discovered that Gemini 3's multimodal context window can follow human mental models and spatial logic in a way that previous generations of models simply could not.
What's next for Omni-Audit: Next-Gen Multimodal Logic Engine
My next step is moving from static images to real-time video auditing. I envision Omni-Audit as a wearable "assistant eye" for industrial technicians and medical professionals, providing a live reasoning overlay that flags procedural errors in real time. I also plan to integrate enterprise safety standards (OSHA/WCAG) directly into the reasoning kernel so it can produce certified safety audits automatically.