In the midst of the current pandemic, we felt that one of the biggest problems was how lightly people treated measures as simple as wearing a mask and keeping their distance. While these measures seem minor, prior data shows that both are effective in helping to flatten the curve and keep an outbreak under control.
With this in mind, we wanted a way to hold people who do not follow the rules accountable. Logistically, however, this is very difficult to do.
So, in areas or complexes where gatherings are not too large, an algorithm can be put in place to automatically keep track of the people not obeying the rules set for the pandemic.
This idea set our project, CoVoid, into motion.
What it does
We built a computer vision application to address both the lack of social distancing and the lack of mask wearing. Our solution analyses a video feed (live or recorded) and uses a machine learning model to determine whether the people in a frame are wearing masks.
Alongside this, our algorithm also detects the humans in the feed, marking a region around each of them. This region is used to determine whether a person is in close proximity to another, and hence whether they are socially distanced.
This information is then captured as annotated frames and numerical counts, so the people responsible have a saved record that can be used to trace back to those who did not follow the rules.
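One simple way to keep the numerical record described above is to append per-frame counts to a CSV file. This is a minimal sketch, not the project's exact format: the file path, column order, and `log_violations` helper name are all illustrative assumptions.

```python
import csv
from datetime import datetime, timezone

def log_violations(path, frame_index, distancing_count, no_mask_count):
    """Append one row of per-frame violation counts to a CSV file.

    NOTE: the file layout (timestamp, frame index, distancing violations,
    unmasked faces) is an assumed, illustrative schema.
    """
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),  # when the frame was logged
            frame_index,                             # which frame it came from
            distancing_count,                        # people too close together
            no_mask_count,                           # faces without masks
        ])
```

Logging counts per frame rather than per person keeps the file small while still letting an administrator jump back to the exact frames where violations occurred.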
How we built it
For the social distancing feature, we initially explored using Haar cascades to detect bodies in a frame. We got this working: it detected human bodies and drew bounding boxes around them. However, it missed many people across the frames of a video, especially those further from the camera. So we switched to a pre-trained YOLO model, which detected the humans in each frame reliably and let us predict whether they were socially distanced or not.
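Given the bounding boxes a YOLO-style detector returns, the distancing check reduces to a pairwise distance test between people. The following is a minimal sketch of that idea; the `(x, y, w, h)` box format, the centroid-based comparison, and the 75-pixel threshold are all assumptions, not the project's calibrated values.

```python
from itertools import combinations
from math import dist

def centroid(box):
    # box = (x, y, w, h) in pixels, as a YOLO-style detector might return
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def flag_violations(boxes, min_distance=75.0):
    """Return the set of box indices whose centroids lie closer than
    `min_distance` pixels to another person's centroid.

    NOTE: a real deployment would calibrate the pixel threshold to the
    camera's position; 75.0 is an arbitrary placeholder.
    """
    violations = set()
    for i, j in combinations(range(len(boxes)), 2):
        if dist(centroid(boxes[i]), centroid(boxes[j])) < min_distance:
            violations.update((i, j))  # both people are too close
    return violations
```

With people detected per frame, the flagged indices can then be drawn in a warning colour, which is roughly what our overlay does.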
For detecting masks, we first used a pre-trained MobileNet SSD model, which detected faces extremely fast with good accuracy. We then used the MobileNetV2 architecture to classify whether each detected face was wearing a mask or not. With OpenCV, we ran the model on images and videos, drew bounding boxes around the faces, and displayed whether each person was wearing a mask.
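The last step of that pipeline, turning the two-class classifier output into an on-screen label and box colour, can be sketched as below. The `mask_label` helper, the 0.5 confidence threshold, and the exact colours are illustrative assumptions rather than the project's precise values.

```python
def mask_label(mask_prob, no_mask_prob, threshold=0.5):
    """Map the two-class output of a mask classifier to a display label
    and a BGR colour tuple suitable for OpenCV box drawing.

    NOTE: the 0.5 threshold is an assumed cutoff; a low-confidence "mask"
    prediction is treated as "No Mask" to err on the side of caution.
    """
    if mask_prob >= no_mask_prob and mask_prob >= threshold:
        return "Mask", (0, 255, 0)    # green box for a masked face
    return "No Mask", (0, 0, 255)     # red box for an unmasked face
```

The label and colour returned here would be passed to OpenCV's rectangle and text drawing calls alongside the face's bounding box.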
Challenges we ran into
In the social distancing feature, high camera angles work much better than front-facing angles, because they make the space between people easier to distinguish. When we tried to compensate using each person's height, it did not work: children were always classified as socially distant.
In the mask detection feature, it was difficult to find a varied dataset with enough images to classify whether someone was wearing a mask. The final dataset included many common masks, like surgical and N95 masks, but few of the different styles and colors of cloth masks. Even so, the final model predicted cloth masks relatively well.
Accomplishments that we're proud of
We ran into a few hiccups along the way, as mentioned above. Alongside this, we kept refining the final idea to make sure it was both feasible in the real world and accurate.
Through focused work and research into the best available datasets, we had a fairly well-functioning model ready well within 24 hours, and we kept improving it afterwards.
Getting this much out of so little time, and building something feasible enough for real pandemic use, is a great achievement for us.
What we learned
Implementing a YOLO model was a big learning experience for the entire team, as it took a lot of time to install and debug on our devices.
Our team had some prior experience with OpenCV, but while developing CoVoid we rapidly expanded our knowledge and gained a lot of confidence in the field of computer vision.
Through the course of the project, our team learned how to collaborate and divide work effectively between members. We ensured that every member worked on one of their strong suits, while constantly helping, and learning from, each other.
Overall, we learned the value of preparedness, and how spending time researching and learning more about the problem at hand can easily help brainstorm and find creative ways to expand on an idea.
What's next for CoVoid
CoVoid is in its very early stages, as it was built in just 24 hours, yet its accuracy is already quite good. The next step is to use the data it collects over time to continually re-train and improve the model.
Alongside this, another addition could be facial recognition: in a small enough region, information such as a person's name could be identified from the feed when they are not wearing a mask. This would require a much more complex build, but in places like schools, colleges, and offices it could help ensure that people respect the rules and are held accountable when they break them.
Furthermore, we could incorporate heat mapping into our application to show administrators and staff where indoor spaces need more sanitization. This could also help staff reorganize their spaces to reduce hotspots and make the environment safer.