My team and I wanted to implement some form of computer vision in our project. We were intrigued by the idea of a webpage that allows users to interact with their computers using only hand gestures. No buttons needed.
What it does
Our webpage presents a menu broken up by course. First, five appetizers are shown, each assigned a number between 1 and 5. If the user wants the appetizer labeled 1, they hold up one finger. If they want the appetizer labeled 5, they hold up five fingers. If the user doesn't want an appetizer, they hold up a fist.
A timer counts down from 5 seconds and captures a frame from the webcam when it reaches 0. That image is sent to a Google Cloud AutoML model, which determines how many fingers are being held up. Once the appetizer choice is made, the same steps repeat for drinks, main plates, and desserts. After the entire meal is ordered, the user is shown a detailed receipt listing every item they ordered, and a copy of the receipt is sent to the restaurant's email.
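The ordering flow described above can be sketched as a small state machine. This is only an illustration, not our actual code: the `Order` class, the `MENU` dictionary, and every item name are hypothetical, and the finger count is assumed to arrive as an integer from 0 (fist) to 5.

```python
from dataclasses import dataclass, field

# Course order follows the writeup; the item names are made up for illustration.
COURSES = ["appetizers", "drinks", "main plates", "desserts"]

MENU = {
    "appetizers": ["Bruschetta", "Spring rolls", "Nachos", "Soup", "Salad"],
    "drinks": ["Water", "Lemonade", "Iced tea", "Coffee", "Soda"],
    "main plates": ["Pasta", "Burger", "Curry", "Tacos", "Pizza"],
    "desserts": ["Brownie", "Ice cream", "Cheesecake", "Fruit", "Pie"],
}


@dataclass
class Order:
    """Tracks which course is being shown and what has been ordered so far."""

    course_index: int = 0
    items: list = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return self.course_index >= len(COURSES)

    def record_gesture(self, finger_count: int) -> None:
        """Apply one classified gesture to the current course, then advance."""
        if self.finished:
            raise ValueError("the order is already complete")
        if not 0 <= finger_count <= 5:
            raise ValueError(f"unexpected finger count: {finger_count}")
        course = COURSES[self.course_index]
        if finger_count > 0:  # 0 is a fist, which skips this course
            self.items.append((course, MENU[course][finger_count - 1]))
        self.course_index += 1
```

Calling `record_gesture` once per photo walks the user through all four courses; when `finished` is true, `items` holds everything needed for the receipt.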
How we built it
Our project had two integral parts: a computer vision program to count the number of lifted fingers in an image, and a webpage to display a menu, take photos, and send an email with the user's order. We used Google Cloud's AutoML Vision to train a model on a dataset of hand images. To reduce noise in the dataset, we used OpenCV to draw the edges of the hand in black and turn all other pixels white. Our model took around 6 hours to train, but when finished it was able to determine the number of fingers in an image with around 99% accuracy.
We built our webpage using Flask. It displays the menu to the user and takes a photo with the webcam every 5 seconds. We used a template engine to pass variables into our HTML files. Additionally, we used Notivize to email a receipt of the user's order.
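Passing variables from Flask into an HTML template might look like the sketch below. It assumes Flask's built-in Jinja templating; the `/menu` route, the template text, and the appetizer names are hypothetical, and the template is inlined with `render_template_string` only to keep the example in one file.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical menu items, for illustration only.
APPETIZERS = ["Bruschetta", "Spring rolls", "Nachos", "Soup", "Salad"]

# In a real project this would normally live in templates/menu.html.
MENU_TEMPLATE = """
<h1>{{ course }}</h1>
<ol>
{% for item in items %}
  <li>{{ item }}</li>
{% endfor %}
</ol>
<p>Photo in {{ countdown }} seconds. Hold up a fist to skip this course.</p>
"""


@app.route("/menu")
def menu():
    # Keyword arguments become variables inside the template.
    return render_template_string(
        MENU_TEMPLATE, course="Appetizers", items=APPETIZERS, countdown=5
    )
```

The ordered list numbers the items 1 through 5, which matches the finger counts the user holds up.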
Challenges we ran into
None of us had ever used the Google Cloud platform before, and it took significant trial and error to set it up. Google Cloud authentication and data labeling were initially confusing. Additionally, all our training data had a white background, so our model is easily confused by background noise.
Accomplishments that we're proud of
Despite having no experience with machine learning, our team was able to create a model that classified hand gestures extremely accurately under ideal conditions. It's really satisfying to see an order go through completely correctly!
What we learned
What's next for GestureAI
To prevent wrong orders, we could ask users to confirm that our model classified their hand gesture correctly before submitting their order. It would also be interesting to incorporate a mobile payment service, such as Venmo's API.
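The verification step could be as simple as the loop sketched here. Nothing in it exists in the project yet: `choose_with_confirmation`, its `classify` and `confirm` callbacks, and the retry limit are all hypothetical stand-ins for taking a photo, running the model, and asking the user a yes/no question.

```python
from typing import Callable, Optional


def choose_with_confirmation(
    classify: Callable[[], int],
    confirm: Callable[[int], bool],
    max_attempts: int = 3,
) -> Optional[int]:
    """Return the first prediction the user confirms, or None if none is confirmed.

    classify: takes a photo and returns the predicted finger count (assumed 0-5).
    confirm:  shows the prediction to the user and returns True if it is correct.
    """
    for _ in range(max_attempts):
        predicted = classify()
        if confirm(predicted):
            return predicted
    # No prediction was confirmed; the caller could fall back to manual entry.
    return None
```

Only a confirmed prediction would be added to the order, so a misread gesture costs the user one retake instead of a wrong item.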