Conversion of the video for the gloss "person" into a path
Example of video data: ASL signing of the word glossed as "person".
Example of the in-app demo. Text was edited in.
All of us believe in using computing for social good. Inclusiveness is one aspect of social good that we especially believe in, and as such we wanted to build an application that widens the horizons of human interaction. While we are separated by the languages we use to bond with our own communities, we are united by English as a common language. However, the disabled community has its own distinct communication methods: blind people use braille, and deaf and mute people in America use American Sign Language (ASL). We wanted to find a way for everyone to interact comfortably without being confined by a language, be it the tactile sign language used by deaf-blind people or just the English that we use.
What it does
Sign2Speech scrapes videos of individuals signing from an ASL dictionary, then uses optical flow to analyse the movement and create a path of the hand and/or head movement. This data would then be used to train a model to recognise sign language in real time and convert it into text. We wanted to have the option of reading this text aloud using Firebase or Azure. However, due to time constraints, we were unable to implement that function.
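To make the "video to path" idea concrete, here is a minimal sketch of how a path could be built from consecutive frames. It is not our actual pipeline: it swaps optical flow for plain frame differencing so that it runs without OpenCV, and the names `motion_centroid`, `motion_path` and the `threshold` value are made up for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: a grayscale frame is a list of rows of 0-255 pixel values.
Frame = List[List[int]]


def motion_centroid(prev: Frame, curr: Frame,
                    threshold: int = 25) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of pixels that changed by more than threshold."""
    total = sum_x = sum_y = 0
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            if abs(c - p) > threshold:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None  # no motion detected between these two frames
    return (sum_x / total, sum_y / total)


def motion_path(frames: List[Frame],
                threshold: int = 25) -> List[Tuple[float, float]]:
    """Chain per-frame centroids into a path, skipping pairs with no motion."""
    path = []
    for prev, curr in zip(frames, frames[1:]):
        point = motion_centroid(prev, curr, threshold)
        if point is not None:
            path.append(point)
    return path
```

With real video, the frame differencing step would be replaced by an optical flow field, but the idea of reducing each frame pair to one point and chaining the points stays the same.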
Challenges We Faced
Resources we wanted to use, like the Video API within the Cognitive API provided by Microsoft, were inaccessible. We also had issues with the Firebase API since it was not entirely Python-friendly, and we were unable to configure its video detection. As such, after spending the night attempting to figure the APIs out, we had to resort to using optical flow through OpenCV. Since our application is an Android application that uses Java, we had to figure out how to integrate the Python scripts into Android Studio. It would be an understatement to say that there was no single clear tutorial on how to do so, and the process was messy. Another issue we faced was the inability to threshold the motion properly with the optical flow.
Accomplishments that we're proud of
Sign2Speech consists of various milestones. We were off to a good start by being able to scrape videos of individuals signing from an ASL dictionary. This was certainly filled with obstacles along the way, such as figuring out how to scrape arbitrary words, since scraping techniques using BeautifulSoup are usually targeted at specific examples instead of generic terms. Initially, the path of the hand and/or head movement was not clear, since even the slightest movement of the head was reflected in the graph of the average brightness derivative, but we managed to solve that issue by introducing a decay constant.
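As a rough illustration of the decay constant idea, the sketch below smooths a brightness derivative with an exponential decay so that a single small jitter is damped instead of showing up at full strength. This is one plausible reading, not necessarily the exact formula we used; the function names and the `decay` value are assumptions.

```python
from typing import List


def brightness_derivative(brightness: List[float]) -> List[float]:
    """Frame-to-frame change in average brightness."""
    return [b - a for a, b in zip(brightness, brightness[1:])]


def decayed_signal(values: List[float], decay: float = 0.8) -> List[float]:
    """Exponentially smooth a signal.

    Each output is decay * previous_output + (1 - decay) * current_value,
    so isolated spikes are damped and then fade over the following frames.
    """
    smoothed = []
    state = 0.0
    for v in values:
        state = decay * state + (1.0 - decay) * v
        smoothed.append(state)
    return smoothed
```

A larger `decay` suppresses small head movements more strongly, at the cost of reacting more slowly to genuine hand motion.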
What's next for Sign2Speech
There are various milestones that we would like to work on in the future, which include but are not limited to:
Download all the videos from the ASL dictionary instead of using a small sample
Tailor an algorithm for the glosses since we would know our requirements best
Use the Firebase API or Azure for the read-aloud feature of the text
Get in touch with associations dealing with sign language so we can find more datasets, if any, and also test our prototype in a real-life situation