Covid-19 has turned all of our lives upside down. It's 2021, and we are currently facing the second wave of the pandemic. Because of that, the adoption of contact-less methods is growing quite rapidly. Unfortunately, there are still many places where the option is lacking, for example ATMs, stores, malls, reservation counters, and airports. In the case of ATMs, access to cash remains essential for many in society during the current lockdowns around the globe. The problem is that using one requires touch, be it of the notes or of the machine itself, which is a major vector for spreading Coronavirus in public. This chain needs to be broken before it's too late.
And that's how our project Aerowave jumps in! 😎
What it does
A further explanation of how each of these works can be found in the Engineering section below.
Although Aerowave is currently a POC (proof of concept), it already supports all of the following features!
- Minimalistic UI
- Optimized for low-spec systems.
- More than 10 customizable gestures for different operations.
- Easy to integrate into any web-based app or interface.
- Fully functioning components including buttons, containers, images, and more!
- Firebase for encrypted data logging.
- Face tracking (gender, emotion, age) using the Facemesh API
- Auto font-scaling.
- Customizable Teachable Machine-inspired gesture keyboard
- S2S interaction (removed due to CORS)
- One-click deploy to Heroku
How we built it
So, how does everything work?
• We train our baseline 3D-CNN model until convergence and obtain the softmax output for each training sample.
• Then we train smaller variants (referred to as medium and small) of our joint model, obtained by reducing the number of feature maps (channels) in each layer by a factor of two and four, respectively.
• To supervise the likelihood of each keypoint, we generate a heatmap in which the mean of a Gaussian peak is positioned at the ground-truth joint location.
• We also adopt the mean square error as the loss function L.
• Specifically, we save the trained model at each epoch during training, and then at test time we average the estimated 3D coordinates across all of the trained models. Since we trained for 15 epochs, we used 15 models to obtain the final estimation, which forms the final output of the gesture training.
• The comparisons are made using cosine similarity, a measure of similarity between two non-zero vectors; here, each vector consists of the positions of the person's body parts. After normalizing the vectors and computing the cosine similarity, we derive the Euclidean distance, which takes a user's relative positioning into account. These calculations run in real time to produce a final confidence score, between 1 and 100, that rates the output confidence.
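The heatmap and loss mentioned in the steps above are not written out explicitly here; following the V2V-PoseNet formulation this section draws on (see the reference list), they can be sketched as:

```latex
% Ground-truth heatmap for keypoint n, with a Gaussian peak centered at the
% ground-truth joint location (i_n, j_n, k_n); sigma controls the spread.
H_n^{*}(i, j, k) = \exp\!\left(-\frac{(i - i_n)^2 + (j - j_n)^2 + (k - k_n)^2}{2\sigma^2}\right)

% Mean-square-error loss over all N keypoints and voxel positions,
% comparing the estimated heatmap H_n to the ground truth H_n^{*}.
L = \sum_{n=1}^{N} \sum_{i,j,k} \left( H_n^{*}(i, j, k) - H_n(i, j, k) \right)^2
```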
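The per-epoch ensembling step above can be sketched as a minimal illustration; the function name and data layout are our own for clarity, not taken from the actual codebase, assuming each saved model's prediction is a list of (x, y, z) keypoint coordinates:

```python
def ensemble_estimate(per_epoch_preds):
    """Average the 3D keypoint coordinates predicted by models saved at each epoch.

    per_epoch_preds: list with one entry per epoch; each entry is a list of
    (x, y, z) tuples, one per keypoint. Returns the averaged keypoints.
    """
    n_epochs = len(per_epoch_preds)
    n_keypoints = len(per_epoch_preds[0])
    averaged = []
    for k in range(n_keypoints):
        # Average each coordinate of keypoint k across all epoch models.
        x = sum(p[k][0] for p in per_epoch_preds) / n_epochs
        y = sum(p[k][1] for p in per_epoch_preds) / n_epochs
        z = sum(p[k][2] for p in per_epoch_preds) / n_epochs
        averaged.append((x, y, z))
    return averaged
```

With 15 epochs, `per_epoch_preds` would hold 15 entries, one per saved model.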
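The cosine-similarity comparison can be sketched as below. The exact mapping to a confidence score is our assumption, using the standard relation between cosine similarity and Euclidean distance for normalized vectors (d = sqrt(2(1 - sim))); the real scoring may differ:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def confidence_score(user_pose, reference_pose):
    """Rate how closely a user's pose vector matches a reference pose.

    For unit vectors, Euclidean distance d = sqrt(2 * (1 - sim)) lies in
    [0, 2]; we map it linearly onto a 0-100 style confidence.
    """
    sim = cosine_similarity(user_pose, reference_pose)
    dist = math.sqrt(max(0.0, 2.0 * (1.0 - sim)))
    return 100.0 * (1.0 - dist / 2.0)
```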
Challenges we ran into
A lot! As previously mentioned, the whole execution was done from scratch; the idea itself literally came to us during the opening ceremony, from Sandipan. Initially, we faced issues while training the model on our systems, including underfitting, since we had to reduce the dataset parameters to optimize the model to run seamlessly on low-end devices. It was also a bit difficult to collaborate in a virtual setting, but we managed to finish the project on time.
What We Learned
Proper sleep is very important! :p Beyond that, we learned a lot of things on both the technical and non-technical sides. Not to mention, we sharpened our Googling and Stack Overflow searching skills during the hackathon 😆
Turning a proof-of-concept idea into a working prototype is no easy deal. It was a tad difficult to collaborate in a virtual setting, but we are proud of finishing the project on time, which seemed like a tough task initially, and we were happily able to include most of the concepts we envisioned for the app during ideation. Lastly, we think the potential impact of our project is a significant accomplishment. Especially given the current COVID-19 scenario, this could really be a product that people find useful!
This project was a special achievement for us because the experience was very different from what we have had while building typical hackathon projects: it involved heavy brainstorming, extensive research, and, yes, hitting the final pin on the board.
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map. https://arxiv.org/pdf/1711.07399v3.pdf
Face and hand tracking in the browser with MediaPipe and TensorFlow.js. https://blog.tensorflow.org/2020/03/face-and-hand-tracking-in-browser-with-mediapipe-and-tensorflowjs.html
Confronting COVID-19: An Analysis of Surface Contamination Risks. https://bit.ly/3mZ3vqo
Face landmarks detection with MediaPipe Facemesh. https://towardsdatascience.com/face-landmarks-detection-with-mediapipe-facemesh-555fa2e10b06
Community Transmission of SARS-CoV-2 by Surfaces: Risks and Risk Reduction Strategies. https://pubs.acs.org/doi/10.1021/acs.estlett.0c00966
What's next for Aerowave
A big challenge of the project was detecting a specific gesture and computing the corresponding action directly on the client side. Given more time, we'd add an integrator that can take live input of a combination of gestures and recognize them. We'd also like to make the app more aesthetically intuitive and refactor the existing codebase to make it better suited to low-spec systems.