Snitch

Inspiration

We were fascinated by the possibility of deploying large models on edge devices using MemryX acceleration, so we decided to port a fine-tuned object detection model to the MemryX format to enhance safety on construction sites. We wanted to see whether we could collect objective data on the use of personal safety equipment on construction sites while using few resources. Once we have data for a certain period of time, we can analyse what equipment is missing and determine whether any laws were actually broken using AI agents, which act as a relay and as a legal expert.

What it does

Our current prototype uses a fine-tuned object detection model based on the YOLO11 architecture to detect classes in the data streamed by a camera. We then apply postprocessing to draw boxes around the detected objects and log whenever a class that corresponds to missing safety equipment appears (class 'NO-hardhat', for example). During runtime we accumulate reports, and when the camera stream session ends these reports are sent to our agents via HTTP. The agents then identify whether any laws are being broken and send a warning to the manager or another stakeholder to review the footage and check for potential safety code violations.
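The logging and upload steps can be sketched roughly as follows. This is a minimal illustration, not our exact code: the `(class_name, confidence)` detection format, the "NO-" prefix rule, the 'NO-vest' class, the confidence threshold, the JSON layout and the URL are all assumptions made for the example.

```python
import json
import urllib.request

# Assumption: classes whose names start with "NO-" mark missing equipment,
# following the 'NO-hardhat' example.
MISSING_PREFIX = "NO-"


def missing_equipment(detections, min_confidence=0.5):
    """Return the sorted missing-equipment class names found in one frame.

    `detections` is a list of (class_name, confidence) pairs, a simplified
    stand-in for the detector output.
    """
    return sorted({
        name for name, conf in detections
        if name.startswith(MISSING_PREFIX) and conf >= min_confidence
    })


def build_report_request(reports, url):
    """Wrap the accumulated reports in a JSON POST request (not sent here)."""
    body = json.dumps({"reports": reports}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical frame: the low-confidence 'NO-vest' detection is ignored.
frame = [("Person", 0.91), ("NO-hardhat", 0.78), ("NO-vest", 0.32)]
reports = [{"frame": 120, "missing": missing_equipment(frame)}]
request = build_report_request(reports, "http://localhost:8000/reports")
# urllib.request.urlopen(request) would send it once the session ends.
```

Building the request separately from sending it keeps the upload step easy to test without a running agent service.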

How we built it

For the fine-tuned object detection model, we both found one available for download and trained our own using PyTorch and the ultralytics package (from the developers of YOLO11). To run detection on a camera stream we use the cv2 and ultralytics Python modules. For report generation we use a state machine approach that bundles similar states of the system together to avoid excessive logging.
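One way to picture the state machine idea is the sketch below. It assumes the "state" is simply the set of missing-equipment classes visible in a frame, and that consecutive frames with the same state collapse into one report; the `Report` and `ReportBundler` names and the 'NO-vest' class are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    missing: tuple      # missing-equipment classes seen during this state
    first_frame: int
    last_frame: int


class ReportBundler:
    """Collapse runs of frames with the same missing equipment into one report."""

    def __init__(self):
        self.reports = []
        self._current = None

    def update(self, frame_index, missing):
        state = tuple(sorted(missing))
        if self._current is not None and self._current.missing == state:
            # Same state as the previous frame: extend the open report.
            self._current.last_frame = frame_index
            return
        # State changed: close the open report and start a new one,
        # unless nothing is missing in this frame.
        self._close()
        self._current = Report(state, frame_index, frame_index) if state else None

    def _close(self):
        if self._current is not None:
            self.reports.append(self._current)
            self._current = None

    def finish(self):
        """Close any open report and return everything collected so far."""
        self._close()
        return self.reports


bundler = ReportBundler()
frames = [[], ["NO-hardhat"], ["NO-hardhat"], ["NO-hardhat", "NO-vest"], [], []]
for i, missing in enumerate(frames):
    bundler.update(i, missing)
reports = bundler.finish()
```

Here six frames produce only two reports, one for each distinct run of missing equipment, instead of one log entry per frame.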

Challenges we ran into

While trying to accelerate the fine-tuned YOLO11 model using the MemryX Neural Compiler, we ran into incompatibility issues. In particular, the compiler did not support the MatMul operator the way it was implemented in the YOLO11 model architecture. To solve this we tried replacing the operator using onnx-graphsurgeon and looked for other, older pretrained object detection models. Unfortunately we did not manage to find, train, or surgically create a model that would satisfy the compiler's requirements, and time constraints forced us to abandon the idea of using MemryX acceleration.
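Locating the problematic nodes in an exported graph can look roughly like the sketch below. It only relies on the `op_type`, `name` and `input` attributes that ONNX graph nodes expose; the stand-in nodes, their names and the `model.onnx` filename are hypothetical, and the actual replacement we attempted with onnx-graphsurgeon is not shown.

```python
from types import SimpleNamespace


def find_nodes(nodes, op_type="MatMul"):
    """Return (index, name, inputs) for every node with the given op_type."""
    return [
        (i, node.name, list(node.input))
        for i, node in enumerate(nodes)
        if node.op_type == op_type
    ]


# Stand-in nodes so the sketch runs without the onnx package; with onnx
# installed, the same call works on onnx.load("model.onnx").graph.node.
nodes = [
    SimpleNamespace(op_type="Conv", name="conv_0", input=["images", "w0"]),
    SimpleNamespace(op_type="MatMul", name="attn_matmul_0", input=["q", "k"]),
    SimpleNamespace(op_type="Softmax", name="softmax_0", input=["qk"]),
    SimpleNamespace(op_type="MatMul", name="attn_matmul_1", input=["p", "v"]),
]
matmuls = find_nodes(nodes)
for index, name, inputs in matmuls:
    print(index, name, inputs)
```

Knowing the index and inputs of each MatMul node makes it easier to decide which parts of the graph would need to be rewritten or swapped for a supported operator.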

Accomplishments that we're proud of

While searching for a model, we also decided to fine-tune our own using YOLO11-nano (due to limited processing resources) and the Construction Site Safety Image Dataset from Roboflow. The resulting model showed poor objectness when applied to a video stream, and we mixed up the class labels during training. Still, we achieved around 70% accuracy on static images.

What we learned

We learned a lot about object detection models and their implementation, and even explored their graph representations to locate problematic MatMul nodes. We also gained experience with the agentic AI approach to solving tasks. But most of all we learned that the technical task we took on was too ambitious for the limited time frame.

What's next for Construction safety equipment monitoring

Ideally we would fine-tune a model that is both compatible with the MemryX Neural Compiler and has enough parameters to provide better objectness and accuracy scores. Once this is done, our approach can be deployed to edge devices, which is arguably the most important part of the project.