COVID-19 has changed the way we work. During the pandemic, many of us work from home and conduct our meetings, presentations, and lectures virtually through video conferencing applications like Zoom, Google Meet, and Skype. During these meetings, we often want to write comments directly on the screen; these are called annotations. Currently, annotations are drawn with the mouse, which is inefficient and error-prone: strokes come out sloppy and mistakes are common. We propose removing the mouse from on-screen annotation and using hand gestures instead.
What it does
Our application uses hand gestures to interact with the user. Annotations drawn with hand gestures help a user communicate their thoughts over a live video feed. Different finger orientations put the user into different interactive modes. There are three finger orientations: the first starts annotation, where the user's gestures are captured and drawn onto the live video feed; the second erases previous annotations; and the third changes the color of the annotations.
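The three modes above can be sketched as a simple mapping from extended fingers to an annotation mode. This is a minimal, hypothetical illustration: the landmark indices follow MediaPipe Hands conventions (tip/PIP joint pairs), but the specific finger-to-mode mapping and function names here are assumptions, not our exact implementation.

```python
# Hypothetical sketch of mode selection from finger orientation.
# Landmark indices follow MediaPipe Hands (fingertip / PIP-joint pairs);
# the mode mapping itself is an illustrative assumption.

FINGER_TIPS = {"index": 8, "middle": 12, "ring": 16, "pinky": 20}
FINGER_PIPS = {"index": 6, "middle": 10, "ring": 14, "pinky": 18}

def extended_fingers(landmarks):
    """landmarks: list of (x, y) in normalized image coords (y grows downward).
    A finger counts as extended when its tip sits above its PIP joint."""
    return [name for name in FINGER_TIPS
            if landmarks[FINGER_TIPS[name]][1] < landmarks[FINGER_PIPS[name]][1]]

def gesture_mode(landmarks):
    """Map the set of extended fingers to one of the three interactive modes."""
    up = set(extended_fingers(landmarks))
    if up == {"index"}:
        return "draw"    # one finger up: start annotating
    if up == {"index", "middle"}:
        return "erase"   # two fingers up: erase previous annotations
    if up == {"index", "middle", "ring"}:
        return "color"   # three fingers up: change annotation color
    return "idle"        # anything else: do nothing
```

In the real application the landmark list would come from MediaPipe's hand tracker on each frame; here it is just a plain list of 21 (x, y) tuples.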
How we built it
We used the MediaPipe framework to identify the orientation of the fingers. After identifying the finger orientation, we extract the Region of Interest (ROI) containing the hands. We then process the frames with contour extraction and background segmentation, and feed them to a CNN model pre-trained on a corpus of annotated gesture data to identify the gesture. Once a gesture is identified, the corresponding action is performed on the live video feed through the OpenCV module in near real-time, and the final processed frame is displayed to the user.
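The ROI-extraction step can be sketched as follows. This is a minimal illustration under the assumption that hand landmarks arrive as normalized (x, y) coordinates in [0, 1] (as MediaPipe Hands provides them); the function name and padding value are hypothetical. The returned pixel box is what would be cropped from the frame before contour extraction and background segmentation.

```python
def hand_roi(landmarks, frame_w, frame_h, pad=0.1):
    """Compute a padded pixel bounding box around the hand landmarks.

    landmarks: list of normalized (x, y) tuples in [0, 1]
    frame_w, frame_h: frame size in pixels
    pad: fractional padding so the crop includes some context around the hand
    Returns (x0, y0, x1, y1), clamped to the frame boundaries.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    x0 = max(0, int((min(xs) - pad) * frame_w))
    y0 = max(0, int((min(ys) - pad) * frame_h))
    x1 = min(frame_w, int((max(xs) + pad) * frame_w))
    y1 = min(frame_h, int((max(ys) + pad) * frame_h))
    return x0, y0, x1, y1
```

In the pipeline, the crop would be taken as `frame[y0:y1, x0:x1]` and handed to the segmentation and CNN stages.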
Challenges we ran into
One major challenge in implementing this application was operating at near real-time speeds, since the application relies on resource-intensive computer vision techniques. Another challenge was making the application responsive enough for a seamless experience. Identifying gestures in a live video feed is a hard research problem because of its dynamic nature: fast hand movements are common, and capturing them accurately in a frame requires better-quality hardware, which was a constraint for us.
Accomplishments that we are proud of
The biggest achievement of our hackathon project is how widely the idea of annotating on-screen with hand gestures can be applied, video conferencing being just one example. Another considerable achievement is that we built a working application and met every goal we set in a very short period. Even though interaction between our team members was limited, we are quite proud of the way we organized, planned, collaborated, and developed the application.
What we learned
One of our aims in participating in this hackathon was to learn new technologies. Every team member took on at least one task that was new to them. Some of our learnings include using OpenCV (cv2) to draw annotations over the video feed, MediaPipe to identify finger positions, and a CNN model to identify users' hand gestures.
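As a concrete illustration of drawing over the feed with cv2, here is a minimal, hypothetical sketch of how a fingertip trail can be turned into line segments; `None` marks a pen-up (the moment the user left drawing mode). The helper name is an assumption, not our exact code.

```python
def trail_segments(points):
    """Turn a fingertip trail into drawable line segments.

    points: list of (x, y) pixel positions over time; None marks pen-up.
    Returns a list of ((x0, y0), (x1, y1)) pairs, skipping gaps at pen-up.
    """
    segments = []
    for a, b in zip(points, points[1:]):
        if a is not None and b is not None:
            segments.append((a, b))
    return segments
```

Each segment would then be rendered on the current frame with `cv2.line(frame, a, b, color, thickness)`, so the annotation persists as the user moves their finger.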
What's next for AirDraw
Integration with video conferencing applications like Zoom, Google Meet, and Skype, and incorporating features like character recognition for drawn annotations, taking a screenshot of the annotations, and saving it to a document.