In situations involving potential gun violence, such as mass shootings in schools, first responders often aren’t notified quickly enough and are slowed down by a lack of information about the exact location of the threat, for example when a gunman is somewhere in a large building or campus. To address this problem, we propose a fast, accurate, and inexpensive automatic gun detection system that uses computer vision and can be hooked into existing CCTV security systems. When the system detects a gun, security personnel can be notified quickly to verify the potential threat.
What it does
Currently, we achieve gun detection at around 82 FPS on a single NVIDIA RTX 2080 Ti. This means one consumer-level GPU can process 8 simultaneous video streams at around 10 FPS each, which is adequate for detection purposes. We provide audio and visual warnings for security personnel who are monitoring the feeds, and the system is highly configurable: users can control how detections are recorded, how video feeds and detections are displayed, and where detections are streamed or shown. We also packaged the product into a Docker image that can be easily deployed, with support for IP camera sources.
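The stream count above follows from simple division. The sketch below reproduces that arithmetic, assuming the GPU's throughput is shared evenly across streams and ignoring any overhead from video decoding or batching; the function names are illustrative and not part of OpenGD.

```python
def per_stream_fps(total_fps: float, num_streams: int) -> float:
    """Frames per second available to each stream when one GPU is shared evenly."""
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    return total_fps / num_streams


def max_streams(total_fps: float, min_fps_per_stream: float) -> int:
    """Largest number of streams that still get at least min_fps_per_stream each."""
    if min_fps_per_stream <= 0:
        raise ValueError("min_fps_per_stream must be positive")
    return int(total_fps // min_fps_per_stream)


print(per_stream_fps(82, 8))  # 10.25
print(max_streams(82, 10))    # 8
```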
How we built it
Using YOLOv4, we started by training on around 3000 handgun images annotated by a team at the University of Granada. We found that the resulting model produced far too many false positives, so we used a bootstrapping technique to add over 5000 hard negative images: we ran our model on large and diverse datasets of non-gun images and added to our training dataset the images that the model predicted, with high confidence, to contain guns. Next, we added around 2000 synthetic images from Edgecase.ai to further improve the model. These images were rendered from 3D models of people holding guns in a wide variety of environments and lighting conditions. However, this introduced a reality gap, where the model is trained on images that don’t exactly reflect reality. To mitigate this, we pretrained on the synthetic data first and selected the weights with the highest performance before continuing to train on purely real data. We then annotated and trained on additional real-world images from Google Open Images and from video clips found online that contained people holding guns. After around 30 rounds of training and testing with different data combinations, hyperparameters such as network resolution and batch size, and data augmentations such as mosaic and cutout, we achieved acceptable real-world performance, with a mAP of 98.53% on a validation dataset of 299 gun images and 935 hard negative images.
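The hard negative bootstrapping step can be sketched roughly as follows. This is a minimal illustration rather than the actual training pipeline: the `detect` callable is a hypothetical stand-in for running the trained model on one image, and the 0.5 confidence cutoff is an assumed value.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical detector interface: takes an image path and returns the
# confidence score of every gun detection found in that image.
Detector = Callable[[str], Sequence[float]]


def mine_hard_negatives(
    non_gun_images: Iterable[str],
    detect: Detector,
    conf_threshold: float = 0.5,  # assumed cutoff for "high confidence"
) -> List[str]:
    """Return the gun-free images that the model wrongly flags with high confidence."""
    hard_negatives = []
    for path in non_gun_images:
        scores = detect(path)
        # Every detection in a known gun-free image is a false positive.
        if any(score >= conf_threshold for score in scores):
            hard_negatives.append(path)
    return hard_negatives


# Usage with a stand-in detector (made-up scores, for illustration only).
fake_scores = {"drill.jpg": [0.91], "phone.jpg": [0.12], "street.jpg": []}
mined = mine_hard_negatives(fake_scores, lambda path: fake_scores[path])
print(mined)  # ['drill.jpg']
```

The mined images would then be added to the training set as negatives, meaning images with no gun annotations, before the next training round.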
Challenges we ran into
In terms of ongoing challenges, our model still struggles with guns that are farther away. This can be mitigated by increasing the resolution at which the feeds are processed, but we found that the small increase in accuracy wasn’t worth the cost in processing speed. Since cameras are relatively cheap, it makes more sense to add more cameras for greater coverage. We succeeded in greatly reducing false positives, and a mAP of 98.53% may sound impressive on paper. In the real world, however, guns appear so infrequently that the ratio of false positives to true positives will remain quite high. To mitigate this, we proposed the red-alert system. In our testing, we found that in most cases a false positive lasts for only one frame or so. Therefore, by generating a red alert only when there are multiple positive detections in a short period of time, we can significantly reduce the share of alerts that are false positives. Additionally, there are privacy issues: questions about who can access these security streams and the associated data need to be addressed. Finally, a system like this may cause security guards to become more lax, much like people who fall asleep in a self-driving Tesla on a highway. We mitigate this somewhat with our auditory alert feature.
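One way to express the red-alert idea is a sliding window over recent frames. The sketch below is an assumption about how such a filter could look, not the project's actual implementation; the class name, the window size of 10 frames, and the threshold of 3 positive frames are all hypothetical.

```python
from collections import deque


class RedAlertFilter:
    """Raise an alert only when enough recent frames contain a detection."""

    def __init__(self, window_size: int = 10, min_positive_frames: int = 3):
        if not 0 < min_positive_frames <= window_size:
            raise ValueError("need 0 < min_positive_frames <= window_size")
        self.min_positive_frames = min_positive_frames
        # Only the last `window_size` per-frame results are kept.
        self._recent = deque(maxlen=window_size)

    def update(self, gun_detected: bool) -> bool:
        """Record one frame's result and report whether a red alert is active."""
        self._recent.append(bool(gun_detected))
        return sum(self._recent) >= self.min_positive_frames


alert_filter = RedAlertFilter(window_size=10, min_positive_frames=3)
frames = [False, True, False, False, True, True, True, False]
alerts = [alert_filter.update(detected) for detected in frames]
print(alerts)  # [False, False, False, False, False, True, True, True]
```

In this example the isolated detection in the second frame does not trigger an alert, while the run of detections later in the sequence does. A larger threshold suppresses more single-frame false positives at the cost of a slightly delayed alert.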
Accomplishments that we're proud of
We achieved our goal of creating a performant real-time gun detection system that can help lower the response time to potential gun violence and affordably increase the security of soft targets. In our research, we found a few other companies trying to solve this problem. However, these companies charge a great deal of money for their systems, which is self-defeating, since the people who need additional security the most are often the ones who can’t afford it. By creating an open source solution, we hope to bring affordable security to a much larger audience.
What's next for OpenGD
- A GUI to replace the configuration file for better usability
- A companion app or system to notify a security company or the police when a gun is detected
- Multi-GPU support for running this software on servers with more than one GPU
- Support for other firearms like assault rifles, which mainly requires adding more training images, as our dataset was primarily handguns
- tkDNN support, a neural network optimizer compatible with our system that can allow for 2 times the processing speed
- Secondary processing methods, such as detecting arms/hands and doing a more targeted detection pass on the surrounding area
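
The last item above, a targeted second pass around detected arms or hands, would need a way to pick the region to re-examine. As a rough sketch under stated assumptions, the hypothetical helper below grows a hand bounding box around its center and clamps it to the frame; the box format, the function name, and the scale factor of 2.0 are illustrative choices, not part of OpenGD.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def expand_box(box: Box, frame_size: Tuple[int, int], scale: float = 2.0) -> Box:
    """Grow a hand box around its center by `scale`, clamped to the frame."""
    x_min, y_min, x_max, y_max = box
    frame_w, frame_h = frame_size
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    half_w = (x_max - x_min) * scale / 2
    half_h = (y_max - y_min) * scale / 2
    return (
        max(0, int(center_x - half_w)),
        max(0, int(center_y - half_h)),
        min(frame_w, int(center_x + half_w)),
        min(frame_h, int(center_y + half_h)),
    )


roi = expand_box((100, 100, 140, 160), frame_size=(1920, 1080), scale=2.0)
print(roi)  # (80, 70, 160, 190)
```

The resulting region could then be cropped and passed to the gun detector at a higher effective resolution than the full frame.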