Inspiration
Our inspiration for this project was the variety of skill sets each team member had to offer. Jorge is a computer engineering major who is good with hardware components. Andrei has recently been learning more about data science and knows how to handle data. Leo and Anthony are good with software like web applications.
These varied skill sets inspired us to build an application that uses both software and hardware, so we started looking at the hardware components we could check out for the hackathon. When we found the Leap Motion Controller, we all got excited and instantly knew we had to incorporate it to make an impactful project. Computer vision was a concept that had come up several times in our conversations and went hand in hand with the Leap Motion device. Another focus of ours was to make a project that would benefit society as a whole. All these factors led us to the idea of Visual Voice, combining ASL, machine learning, computer vision, inclusivity, communication, and learning.
What it does
Visual Voice is an ASL e-learning and online communication tool. Once signed in, users can visit the training section of the website, which prompts them with a randomly selected letter of the alphabet and asks them to match its sign. This training section helps create a more inclusive world, as non-ASL users can learn to better understand ASL users. We also have an online communication platform where ASL users' signs are turned into text and non-ASL users' speech is turned into text, helping both groups communicate and creating a more inclusive environment. By using data generated from an IR device instead of a webcam, we aim to build an ML model with more accurate predictions than existing webcam-based approaches, giving users the best experience.
How we built it
We built the machine learning model by using a Leap Motion Controller (LMC) to generate and collect labeled training data of ASL signs. The hand-tracking data from the LMC was parsed into a CSV file to be fed to the ML model. Additionally, we found a dataset on RoboFlow with labeled images of ASL signs that we also used to train the model. We trained the model using scikit-learn and a Random Forest classifier. We built the web application with React and JavaScript, using React Bootstrap's pre-made components to speed up development given the time constraints of the hackathon.
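For reference, a minimal sketch of the training step: it assumes the hand-tracking parser has already written a CSV (the file name and column names here are placeholders, not the real ones) with each finger's 3D tip position as features and the signed letter as the label.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed column layout: 3D tip position for each finger, plus the signed letter as "label".
FEATURES = [f"{finger}_{axis}"
            for finger in ("thumb", "index", "middle", "ring", "pinky")
            for axis in ("x", "y", "z")]

# CSV produced by the Leap Motion parsing step (hypothetical file name).
df = pd.read_csv("asl_training_data.csv")
X, y = df[FEATURES], df["label"]

# Fit the Random Forest classifier on the hand-tracking features.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)
```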
Challenges we ran into
The biggest challenge we faced was not being able to convert our asl_model.keras file into a TensorFlow.js model that the React JavaScript application could use. This impaired our ability to implement the machine learning feature in the web application. We also were unable to relate the hand-tracking data from the Leap Motion Controller to image data from a webcam to determine ASL signs.
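For context, the conversion we were attempting looks roughly like the sketch below, using the tensorflowjs Python package; it assumes the saved model loads cleanly with tf.keras, which is the step that gave us trouble.

```python
import tensorflow as tf
import tensorflowjs as tfjs  # pip install tensorflowjs

# Load the trained Keras model and re-export it in TensorFlow.js format
# so the React app could run it in the browser.
model = tf.keras.models.load_model("asl_model.keras")
tfjs.converters.save_keras_model(model, "tfjs_model/")
```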
Accomplishments that we're proud of
Learning how to use the Leap Motion Controller (LMC) and successfully parsing its hand-tracking data into a CSV file, creating a model with 100% accuracy when tested on additional data from the LMC. Another achievement was training an ML model on image datasets from RoboFlow, allowing an accurate translation from ASL to text. We are also proud of the web application we built: it has a built-in webcam feed to help users learn ASL, and the online communication feature has a built-in webcam feed too.
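A rough sketch of how we checked the model against additional LMC data and saved it for later use (continuing from the training sketch above; the hold-out file name is a placeholder):

```python
import pickle
import pandas as pd
from sklearn.metrics import accuracy_score

# A separate capture session from the Leap Motion Controller, kept out of training.
test_df = pd.read_csv("asl_holdout_data.csv")
X_test, y_test = test_df[FEATURES], test_df["label"]  # FEATURES and clf from the training sketch
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Persist the trained classifier so it can be loaded later without retraining.
with open("asl_model.pkl", "wb") as f:
    pickle.dump(clf, f)
```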
What we learned
We learned how to train a machine learning model with supervised learning using a Random Forest classifier. We also learned how to use React components and libraries like react-webcam and Socket.IO to handle the webcam feed and online webcam calls, which greatly improved the functionality of our web app. Additionally, we learned how to interpret data from the Leap Motion Controller and parse it into a CSV file.
What's next for Visual Voice
Currently, Visual Voice only supports letter-by-letter ASL recognition. The next step is to use the Leap Motion Controller's frame interpolation features to detect motion across frames, which will allow us to train our ML model to go beyond ASL letters and learn ASL words and phrases. We also want to connect our ML model to our web application so that we can fully achieve Visual Voice's intended functionality. Additionally, we want to relate the training data from the Leap Motion Controller to image data from a webcam. This would involve figuring out how to "translate" between the two datasets, allowing for stronger training and better overall accuracy for our ML model.
Built With
- auth0
- c
- javascript
- leap-motion
- machine-learning
- opencv
- pandas
- pickle
- python
- react
- sklearn
- tensorflow