Inspiration: We aimed to meld our love of solving problems with our excitement for social impact. Prior to attending CalHacks, we connected with the Fellowship of the Physically Handicapped and learned about the hardships of people with hearing and speech impairments. Thus, we resolved to use machine learning and web development to build a product that improves their lives.
What it does: Signum is a web-based communication platform for people with hearing and speech impairments. It takes a gesture input in the form of an image, processes it, and returns the corresponding ASL letter. The current prototype classifies uploaded images; it does not yet provide a two-way video interface.
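A minimal sketch of the browser-side flow is shown below; the `/classify` endpoint name and the response shape are illustrative placeholders, not our exact implementation.

```javascript
// Hypothetical browser-side flow: send the chosen image to the backend
// and display the predicted ASL letter. The endpoint and response fields
// are illustrative placeholders.
async function classifyGesture(fileInput, resultEl) {
  const body = new FormData();
  body.append('image', fileInput.files[0]);          // image picked by the user

  const res = await fetch('/classify', { method: 'POST', body });
  const { letter, confidence } = await res.json();   // e.g. { letter: 'A', confidence: 0.97 }

  resultEl.textContent = `${letter} (${confidence.toFixed(2)})`;
}
```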
Challenges: Our initial plan was to build a Skype plug-in that could translate ASL for those who need it. However, once we realized that Skype does not offer open APIs, we modified the idea to take a single livestream from the user’s webcam instead.
Another major challenge was not being able to use Google’s AutoML Video Intelligence API. It would have been ideal, since gesture input would ultimately come from a webcam stream. However, the only training data we had available were still images, so we used the AutoML Vision API for processing instead.
Accomplishments: We used Google Cloud’s AutoML Vision API to train a custom ML model on a Kaggle dataset of around 18,000 images of ASL characters. We then integrated the API with Firebase to create the backend and linked it to an HTML, CSS, and JavaScript frontend deployed on Google App Engine.
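For reference, a server-side prediction call against a deployed AutoML Vision model looks roughly like the sketch below; the project ID, model ID, and score threshold are placeholders, and the Firebase wiring around it is omitted.

```javascript
// Rough sketch of calling a deployed AutoML Vision model from Node.js.
// The project ID, location, and model ID are placeholders.
const automl = require('@google-cloud/automl').v1;
const client = new automl.PredictionServiceClient();

async function predictAslLetter(imageBytes) {
  const [response] = await client.predict({
    name: client.modelPath('my-gcp-project', 'us-central1', 'ICN0000000000000000000'),
    payload: { image: { imageBytes } },
    params: { score_threshold: '0.5' },   // only return labels above this score
  });

  if (!response.payload.length) return null;

  // Pick the highest-scoring label returned for the image.
  const best = response.payload.reduce((a, b) =>
    a.classification.score >= b.classification.score ? a : b);
  return { letter: best.displayName, confidence: best.classification.score };
}
```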
What we learned: We learned how to use TensorFlow to run machine learning models in a JavaScript front-end, how to use Firebase for web application development, and how to train a machine learning model on large datasets, which we sourced from Kaggle.
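A simplified sketch of what in-browser inference with TensorFlow.js can look like is below; the input resolution, normalization, and label list are assumptions, since the real values come from the exported model.

```javascript
import * as tf from '@tensorflow/tfjs';

// Illustrative label list; the actual class names come from the trained model.
const LABELS = ['A', 'B', 'C', /* ... */ 'Y'];

async function classifyFrame(videoEl, model) {
  const logits = tf.tidy(() => {
    const frame = tf.browser.fromPixels(videoEl);            // current webcam frame
    const input = tf.image.resizeBilinear(frame, [224, 224]) // assumed model input size
      .toFloat()
      .div(255)                                              // assumed [0, 1] normalization
      .expandDims(0);                                        // add batch dimension
    return model.predict(input);
  });

  const probs = await logits.data();
  logits.dispose();
  const best = probs.indexOf(Math.max(...probs));
  return LABELS[best];
}

// Usage, assuming a model exported for TensorFlow.js:
// const model = await tf.loadLayersModel('/model/model.json');
// const letter = await classifyFrame(document.querySelector('video'), model);
```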
What’s next for Signum: Our next goal is to support real-time video streaming with minimal lag and to integrate Signum into video-calling interfaces, making long-distance communication far more inclusive. Beyond that, we envision additional use cases: syncing our plug-in with YouTube videos, translating song lyrics on Spotify, and extending the product to Google Assistant and Apple’s Siri so they can recognize ASL gestures and execute a command, a search request, or any other function. In this way, one core capability extends to a multitude of platforms and makes life more comfortable for people with hearing or speech impairments. With Signum, connections won’t be one way. Perhaps with machines, but never with people.
Built With
- css
- firebase
- html
- javascript
- vision-api