As a beginner team, we pooled our knowledge and skills to build a viable CNN model that performs real-time detection of signs and gestures of interest. At the start we were challenged by a lack of hands-on experience and a limited amount of time, but we quickly learned to leverage AI agents both for learning and for building prototypes. Throughout the journey we kept improving and refining the model through continuous ideation and collaboration, and despite various practical challenges we are happy to deliver a humble yet complete integration of a CNN model with a deployable API.

To detect the position of each finger effectively, we took inspiration from the way EEG filters noise out of brainwaves. We compute the 3-dimensional position of the 20 hand landmarks collected by MediaPipe relative to the point on the user's wrist, and write the result to a .csv file as the training data for our CNN (a sketch of this step follows below).

When testing our initial CNN, we found that detection confidence degraded roughly linearly with the distance between the hand and the webcam. To address this, we devised an algorithm that runs our model twice: it first locates the user's palm, then dynamically crops the webcam footage so that the gesture to be classified stays zoomed in and centered in the frame (as shown in hand_to_text.py; a simplified version of the cropping step is sketched below).

To deploy the technology reliably, we integrated the predictive model into a hypothetical client/server network, and we attempted to simplify its usage by providing a small library of potentially useful helper functions.
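The landmark-preprocessing step can be summarized roughly as follows. This is a minimal sketch, assuming MediaPipe Hands as the landmark source; the `GESTURE_LABEL` placeholder and the `landmarks.csv` filename are our illustration, not the project's actual names.

```python
"""Sketch: record wrist-relative hand landmarks to a CSV file."""
import csv

import cv2
import mediapipe as mp

GESTURE_LABEL = "thumbs_up"  # hypothetical label for the gesture being recorded

cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=1) as hands, \
        open("landmarks.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            wrist = lm[0]  # landmark 0 is the wrist in MediaPipe's hand model
            row = []
            for p in lm[1:]:  # the remaining 20 landmarks
                row += [p.x - wrist.x, p.y - wrist.y, p.z - wrist.z]
            writer.writerow(row + [GESTURE_LABEL])
        cv2.imshow("recording", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop recording
            break
cap.release()
cv2.destroyAllWindows()
```

Expressing every point relative to the wrist makes the features independent of where the hand sits in the frame, which is what lets a simple CNN learn the gesture itself rather than its screen position.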
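The dynamic-cropping idea from the second pass looks roughly like the sketch below. The 25% margin and the 224x224 classifier input size are illustrative assumptions, not the exact values used in hand_to_text.py.

```python
"""Sketch: crop the frame to a square region centered on the hand."""
import cv2
import numpy as np

MARGIN = 0.25  # extra border kept around the detected hand (assumed value)

def crop_to_hand(frame, landmarks):
    """`landmarks` is a list of (x, y) points in normalized [0, 1]
    image coordinates, e.g. from the first localization pass."""
    h, w = frame.shape[:2]
    xs = [p[0] * w for p in landmarks]
    ys = [p[1] * h for p in landmarks]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + MARGIN)
    x0 = int(np.clip(cx - side / 2, 0, w - 1))
    y0 = int(np.clip(cy - side / 2, 0, h - 1))
    x1 = int(np.clip(cx + side / 2, x0 + 1, w))
    y1 = int(np.clip(cy + side / 2, y0 + 1, h))
    # Resize so the classifier always sees the hand at a fixed scale
    return cv2.resize(frame[y0:y1, x0:x1], (224, 224))
```

Because the classifier always sees the hand at roughly the same scale, confidence no longer falls off as the hand moves away from the webcam.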
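For the client/server integration, the server side could look something like the following. Flask is an assumed choice for illustration, and `model_predict` is a hypothetical stand-in for the trained CNN, not the project's actual function.

```python
"""Sketch: a minimal prediction endpoint for the client/server setup."""
from flask import Flask, jsonify, request

app = Flask(__name__)

def model_predict(features):
    # Placeholder: a real deployment would load the trained CNN once at
    # startup and run inference on the wrist-relative coordinates here.
    return "unknown"

@app.route("/predict", methods=["POST"])
def predict():
    # The client posts the wrist-relative landmark features as JSON.
    features = request.get_json()["features"]
    return jsonify({"gesture": model_predict(features)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```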
