We are both part of Make to Innovate, a class through the aerospace department at Iowa State, and we both work extensively in the M2I lab. In this lab, there is an area taped off on the ground where safety glasses are required for anyone working inside it. We thought it would be a great idea to automate that detection.
What it does
First, our script uses TensorFlow to detect faces in frames of a webcam feed. Then, using stereo vision and OpenCV, it computes the distance to each face to check whether it falls within the designated range. Finally, it uses our trained TensorFlow model to determine whether that person is wearing the required eye protection.
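The distance check rests on the standard pinhole-stereo relation Z = f·B/d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity. A minimal sketch of that check, with illustrative focal length, baseline, and zone bounds (the real values come from calibration, not these numbers):

```python
# Stereo depth from disparity: Z = f * B / d.
# All numeric defaults below are illustrative assumptions, not our
# calibrated values.

def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Return depth in meters for a feature with the given disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def in_safe_zone(depth_m: float, near_m: float = 0.5,
                 far_m: float = 2.0) -> bool:
    """Check whether a detected face falls inside the monitored range."""
    return near_m <= depth_m <= far_m
```

A face whose disparity maps to a depth inside `[near_m, far_m]` is treated as being in the taped-off zone and is passed on to the glasses classifier.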
How we built it
We imported pretrained TensorFlow face recognition models, trained our own TensorFlow model to distinguish whether a person is wearing safety glasses, calibrated and configured a stereo vision camera rig to extract depth data, and integrated all of the components into one script: if a person's face can be seen and is inside "the safe zone", it identifies whether or not that person has put on the necessary eye protection.
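The integrated per-frame flow can be sketched as below. The three stages are passed in as callables so the control flow is visible without a camera or trained models; the function names are our own shorthand, not TensorFlow or OpenCV APIs.

```python
# Sketch of the per-frame pipeline: detect faces, check depth,
# then classify eye protection. Stage implementations are assumed
# to be supplied by the caller.

def classify_frame(frame, detect_faces, estimate_depth, wears_glasses,
                   near_m=0.5, far_m=2.0):
    """Return a list of (face, verdict) pairs for one stereo frame.

    verdict is "ok" (glasses on), "violation" (in zone, no glasses),
    or "ignored" (face outside the monitored depth range).
    """
    results = []
    for face in detect_faces(frame):           # stage 1: face detection
        depth = estimate_depth(frame, face)    # stage 2: stereo depth
        if not (near_m <= depth <= far_m):     # outside the taped-off zone
            results.append((face, "ignored"))
        elif wears_glasses(frame, face):       # stage 3: glasses classifier
            results.append((face, "ok"))
        else:
            results.append((face, "violation"))
    return results
```

Keeping the stages as separate callables is also what makes each one swappable, e.g. replacing the glasses classifier without touching the depth code.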
Challenges we ran into
The biggest challenge we encountered was reducing the computation performed for each frame of video. TensorFlow instantiates multiple sessions each time the classification code runs, which makes safety glass recognition slow: about 1.5-3 seconds, roughly 10x the runtime of the rest of the code combined. This results in a low frame rate of about 3 fps.
Accomplishments that we're proud of
Learning to use TensorFlow to recognize objects. Learning how to build a stereo vision setup to determine distance. And getting everything working end to end, even at a low frame rate.
What we learned
How to work with the TensorFlow API, including retraining models, using a model to classify objects, and facial recognition. How to build a stereo vision system to determine distance. All of this was accomplished with two Logitech C270 webcams bought at $20 each.
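Under the hood, stereo distance comes from disparity: the horizontal shift of a feature between the left and right images. OpenCV's `StereoBM` does this matching for real images; the toy pure-NumPy sum-of-absolute-differences matcher below (entirely illustrative, not our production code) shows the idea on a single scanline.

```python
import numpy as np

def disparity_sad(left, right, x, win=2, max_disp=10):
    """Disparity at column x of a 1-D left scanline via SAD matching.

    A feature at column x in the left image appears shifted to x - d in
    the right image; we search d in [0, max_disp] for the best match.
    """
    patch = left[x - win : x + win + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - win < 0:
            break  # candidate window would run off the left edge
        cand = right[x - d - win : x - d + win + 1]
        cost = np.abs(patch - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

With the disparity in hand, depth follows from Z = f·B/d using the calibrated focal length and the baseline between the two webcams.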
What's next for Safety First
Optimization: we would like to figure out why TensorFlow instantiates multiple sessions for every object recognition; fixing this would reduce the per-loop recognition time and raise the frame rate. We would also like to modularize the code by splitting the calibration and depth mapping into their own functions, and to write documentation and post our tutorial online.