Our Inspiration

In 2023, more than 2.6 million workplace injuries were reported in the United States, and more than 5,200 workers were killed on the job. Many of these incidents were preventable, caused by overlooked hazards such as exposed wires, unsafe electrical equipment, or sharp objects on the floor.

On busy construction sites, warehouses, and factory floors, workers are hyper-focused on finishing tasks and getting home safely. In that rush, hazards can be missed, even when they are in plain sight. Too often, these conditions represent clear violations of OSHA standards, yet incidents frequently go unreported, leaving workers to shoulder the costs themselves. Compounding the issue, 33% of U.S. workers have never received proper safety training, leaving them without the awareness to recognize imminent dangers.

To address this gap, we developed AnomalAI: a real-time risk detection and notification system. AnomalAI runs on Meta smart glasses or any device that captures a video feed, continuously scanning the environment for safety violations, alerting workers immediately, and notifying supervisors. It also produces clear reports on workers’ rights and safety requirements, helping organizations stay aligned with OSHA standards and strengthening their overall safety culture.

With AnomalAI, auditors and managers can ensure compliance, reduce incident rates, and foster safer, more productive workplaces — without relying solely on workers to spot every risk in the middle of a busy shift.


What does AnomalAI do?

AnomalAI is a computer vision–powered safety intelligence system designed for high-risk environments such as construction sites, warehouses, and factories. Workers wear Meta smart glasses or use any device with a video feed while carrying out their daily tasks. In the background, AnomalAI uses advanced zero-shot modeling with the Segment Anything Model (SAM) and the Depth Anything Model (DAM) to map out each worker’s environment and perform frame-by-frame analysis, identifying hazards and safety violations in real time without needing extensive pre-training on every object.

An agentic AI operates continuously in the background, applying natural language processing to automatically mark, label, and describe unsafe objects or conditions. Hazards are surfaced directly to workers through live notifications, ensuring that risks can be corrected before they escalate. At the same time, AnomalAI employs a retrieval-augmented generation (RAG) model to tag each detected safety violation with the exact OSHA policy it violates, bridging the gap between real-world risks and regulatory compliance.

The system not only provides workers with immediate, on-the-job awareness, but also generates industry-standard safety reports for managers and auditors. These reports include actionable insights and compliance mapping, giving organizations a clear, data-driven record of violations and recommendations. In this way, AnomalAI empowers workers to stay safe in the moment while equipping managers and auditors with the tools they need to reduce incident rates, ensure compliance, and strengthen overall workplace safety culture.


How was AnomalAI built?

AnomalAI is built from five components: computer vision and machine learning models, retrieval-augmented generation, database management, the backend, and the user experience (frontend). Broadly, we used Flask for our backend logic and React to build our frontend UI.

The pipeline works as follows:

  1. The user records a video.
  2. The CV and ML models sample 1 out of every 30 frames and recognize objects deemed potentially dangerous.
  3. The potentially dangerous objects are fed into Gemini, which gives a deeper description of each object's severity level and its implications.
  4. That description is fed into a RAG pipeline, which builds an ultra-specific report on the safety conditions and the relevant OSHA standards.
  5. The results are displayed both as a video flagging dangers in near real time and on a separate frontend where the user can review a report of the safety violations in that workplace environment.
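To make the data flow concrete, here is a minimal structural sketch of the five steps in Python. Every function name here is a hypothetical placeholder for the components described in the next section, not our actual module API.

```python
# Structural sketch only: each helper is a stand-in for a real AnomalAI module.

def sample_frames(video_path):
    """[2] OpenCV keeps 1 out of every 30 frames."""
    raise NotImplementedError

def detect_hazards(frame):
    """[2] SAM + Depth Anything + CLIP flag potentially dangerous objects."""
    raise NotImplementedError

def describe_hazard(hazard):
    """[3] Gemini writes a severity description for one flagged object."""
    raise NotImplementedError

def build_report(descriptions):
    """[4] The RAG turns descriptions into an OSHA-specific report."""
    raise NotImplementedError

def run_pipeline(video_path):
    """[1]-[5] Video in; flagged hazards and a safety report out."""
    hazards = [h for f in sample_frames(video_path) for h in detect_hazards(f)]
    descriptions = [describe_hazard(h) for h in hazards]
    return {"hazards": hazards, "report": build_report(descriptions)}
```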

Computer Vision and Machine Learning

  • OpenCV: parses, analyzes, and processes the input video. We use OpenCV to control which frames we capture and how often we capture them.
  • Segment Anything Model (SAM): separates the components of each video frame, generating masks overlaid on the video in real time. SAM also returns the x- and y-coordinates of every item detected in a frame.
  • Depth Anything Model (DAM): estimates the distance from the camera to each item in the field of view; in other words, DAM returns the z-coordinate of each item.
  • CLIP: recognizes objects in an image by matching them against a curated vocabulary tuned for performance across different environments. Its label predictions are fed into the zero-shot classifier.
  • Zero-Shot Classifier: a custom probabilistic classifier that determines whether each item in a frame is safe, flagging frames that contain potentially unsafe objects (see the sketch after this list).
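Below is a rough, self-contained sketch of how the frame-sampling and zero-shot pieces fit together, using OpenCV and the public Hugging Face CLIP checkpoint. It simplifies our actual system: it classifies whole frames rather than SAM masks, omits the DAM depth step, and the label vocabulary and video filename are illustrative.

```python
import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative safety vocabulary; the real system uses a larger,
# environment-tuned label set.
LABELS = [
    "exposed electrical wiring",
    "sharp object on the floor",
    "wet or slippery floor",
    "properly stored equipment",
    "ordinary background",
]
UNSAFE = set(LABELS[:3])

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_frame(frame):
    """Score one BGR frame against the vocabulary with zero-shot CLIP."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    inputs = processor(text=LABELS, images=rgb, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    label = LABELS[int(probs.argmax())]
    return label, float(probs.max()), label in UNSAFE

cap = cv2.VideoCapture("site_walkthrough.mp4")  # hypothetical input video
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % 30 == 0:  # keep 1 out of every 30 frames, as in our pipeline
        label, confidence, unsafe = classify_frame(frame)
        if unsafe:
            print(f"frame {index}: {label} ({confidence:.2f})")
    index += 1
cap.release()
```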

Retrieval-Augmented Generation

  • LangChain: orchestrates the RAG flow, connecting the vector embedding store, the top-k retrieved contexts, and the language model.
  • Ollama: provides the lightweight embedding model we used to embed 20+ OSHA documents on workplace safety standards and guidelines. Similarity search over these embeddings was fast thanks to Ollama's smaller models.
  • Supabase: the Supabase vector store holds the Ollama embeddings in our main database. Because we maintain a closed feedback loop (new reports are stored as embeddings and then fetched on every new request), keeping everything in Supabase avoids synchronization errors.
  • Google Gemini: after the top-k entries for a query are fetched as context, Gemini uses that context to generate ultra-specific reports on potential OSHA violations for the workplace situation described in the query (see the sketch after this list).
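The sketch below shows one way these four pieces chain together with LangChain. The table name, query name, embedding model, and Gemini model ID are assumptions for illustration; the real values depend on how the Supabase project and API keys (SUPABASE_URL, SUPABASE_KEY, GOOGLE_API_KEY) are configured.

```python
import os

from supabase import create_client
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore
from langchain_google_genai import ChatGoogleGenerativeAI

# Assumed setup: a Supabase project with a pgvector table plus a matching
# SQL match function, and GOOGLE_API_KEY set in the environment for Gemini.
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
embeddings = OllamaEmbeddings(model="nomic-embed-text")  # lightweight local model
store = SupabaseVectorStore(
    client=supabase,
    embedding=embeddings,
    table_name="documents",
    query_name="match_documents",
)
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

def osha_report(hazard_description: str, k: int = 4) -> str:
    """Fetch the top-k OSHA passages and ask Gemini for a grounded report."""
    docs = store.similarity_search(hazard_description, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Using only the OSHA excerpts below, explain which standards the "
        f"following hazard may violate and why.\n\nHazard: {hazard_description}"
        f"\n\nOSHA excerpts:\n{context}"
    )
    return llm.invoke(prompt).content
```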

Database Management

  • Supabase: provides a lightweight interface and endpoint for the different parts of our code to interact with. We use four tables to store different kinds of data, ranging from formal reports to the raw vector embeddings generated by Ollama. We write to Supabase whenever new information is produced (usually by the RAG or an intermediate step) and read from it whenever the user requests a report, as sketched below.
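As a rough illustration of that read/write pattern with the supabase-py client (the table and column names here are hypothetical, not our actual schema):

```python
from supabase import create_client

# Hypothetical project credentials and schema; we use four tables in total.
supabase = create_client("https://<project>.supabase.co", "<service-role-key>")

def save_report(video_id: str, report_text: str) -> None:
    """Write path: called when the RAG or an intermediate step produces a report."""
    supabase.table("reports").insert(
        {"video_id": video_id, "report": report_text}
    ).execute()

def fetch_reports(video_id: str) -> list[dict]:
    """Read path: called when the user opens the report view."""
    response = (
        supabase.table("reports")
        .select("*")
        .eq("video_id", video_id)
        .order("created_at", desc=True)
        .execute()
    )
    return response.data
```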

Backend & Frontend

  • Flask: since most of our ML and CV logic is written in Python, Flask gave us an easy way to expose it to our React frontend (a minimal endpoint is sketched below).
  • React.js: made it straightforward to design the user interface and render the outputs returned by the backend.
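A minimal sketch of the bridge between the two, assuming a hypothetical /api/report route that the React frontend fetches:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def fetch_reports(video_id):
    """Placeholder for the Supabase helper sketched in the previous section."""
    return []

@app.route("/api/report/<video_id>")
def get_report(video_id):
    # The React frontend requests this JSON to render the safety-report view.
    return jsonify({"video_id": video_id, "reports": fetch_reports(video_id)})

if __name__ == "__main__":
    app.run(debug=True)
```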


Our Challenges

One of the primary challenges we encountered was integrating the diverse components of the system — computer vision, machine learning, retrieval-augmented generation, user interface and experience, and database management. Each of these elements was initially developed in a disjointed manner, which made it difficult to align them into a cohesive application. As a result, we had to re-architect significant portions of the system to ensure that the different modules worked seamlessly together. While this restructuring process was time-intensive, it ultimately produced a far more stable, scalable, and unified platform.

Other challenges included trying to stay awake for 36 hours straight on energy drinks or trying to avoid doing the daily Wordle, but hey, we are all human.


Our Accomplishments

Our biggest accomplishment was being able to program and collaborate for 36 hours straight. We faced many logistical challenges on this journey, including scaling and networking issues, and we often felt tired and tempted to quit. Nonetheless, we pushed through, often skipping sleep to fix critical structural issues and help each other out. In the end, it was all worth it: we built an application that addresses a huge real-world problem, all while being exposed to new technologies and learning to become better collaborators.


Our Takeaways

Besides learning how to integrate a variety of technologies (such as a direct CV pipeline that identifies and scores the dangers of objects in real time) to accomplish a goal, we learned how to become better communicators and thinkers with each other. At the start of the hackathon, we all worked on different parts of the application separately; however, after going through integration hell, we started collaborating closely to bring the final product together.


What's next for AnomalAI?

The Meta AI Glasses 2 recently came out with the ability to project external programs onto the lenses themselves. We plan to integrate AnomalAI with these glasses to provide real-time risk detection that workers can see right in front of their eyes, without needing a phone or camera to record their environment.

