People who are deaf and blind face what most of us would call a nightmare. We decided to build this after coming across Haben Girma, a deafblind Harvard graduate who needed an interpreter with her the entire time she appeared on national TV, and a proposed solution that used Morse code to address this. For more than seventy-five years, the National Federation of the Blind has worked to turn the dreams of hundreds of thousands of blind and deaf people into reality using technology such as text-to-Morse translation, text-to-braille translation, and refreshable braille displays. But the freedom a deafblind person has is still very limited. With the current abundance of available technology, we wanted to broaden that horizon and put them on equal ground with everyone else. And so Vibraille was born: a revamped approach to braille.
What it does
Vibraille has three main functions.
It converts speech to text in real time using a GCP speech-to-text service and uses the localized haptic actuators on a phone to vibrate the corresponding braille translation accurately.
It can also parse any PDF into text and convert it into our digitized haptic braille. Users can now read any book, website, or other available text source with ease.
Finally, we used ResNets and other convolutional neural networks to parse YouTube videos and produce braille that combines the transcript with a description of what is going on in the video. So people who are deaf and blind can now watch YouTube videos as well!
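To make the text-to-haptic-braille step concrete, here is a minimal Python sketch of how text could be turned into six-dot braille cells and then into vibration slots. The letter table is standard uncontracted braille, but everything else is our illustration only: the names `text_to_cells`, `cell_to_unicode`, `cell_to_pulses`, and `SLOT_MS`, and the idea of one fixed-length slot per dot, are assumptions and not necessarily how the app encodes vibrations.

```python
# Hypothetical sketch: the letter table is standard braille, but the pulse
# encoding and timing below are assumptions made for illustration.

# Dots for the first decade (a-j). k-t add dot 3; u, v, x, y, z add dots 3 and 6.
_DECADE = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4), "j": (2, 4, 5),
}

SLOT_MS = 200  # assumed length of one dot slot, in milliseconds


def _build_table():
    table = dict(_DECADE)
    for base, letter in zip("abcdefghij", "klmnopqrst"):
        table[letter] = tuple(sorted(_DECADE[base] + (3,)))
    for base, letter in zip("abcde", "uvxyz"):
        table[letter] = tuple(sorted(_DECADE[base] + (3, 6)))
    table["w"] = (2, 4, 5, 6)  # w does not follow the decade pattern
    table[" "] = ()            # a space is an empty cell
    return table


BRAILLE = _build_table()


def text_to_cells(text):
    """Return one tuple of raised dot numbers per supported character."""
    return [BRAILLE[ch] for ch in text.lower() if ch in BRAILLE]


def cell_to_unicode(cell):
    """Map a dot tuple to its character in the Unicode braille block (U+2800)."""
    return chr(0x2800 + sum(1 << (dot - 1) for dot in cell))


def cell_to_pulses(cell):
    """Return one (vibrate, milliseconds) slot for each dot position 1..6."""
    return [(dot in cell, SLOT_MS) for dot in range(1, 7)]


if __name__ == "__main__":
    for cell in text_to_cells("hi"):
        print(cell_to_unicode(cell), cell_to_pulses(cell))
```

Unsupported characters (digits, punctuation) are simply skipped in this sketch; a real mapping would also need number signs, punctuation cells, and possibly contractions.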
How we built it
The front end was built in Flutter with the haptic package, which interacts with the device's localized haptic actuators to produce accurate vibrations. For the backend, we used Google's App Engine to deploy the speech-to-text model and the OCR engine. To build the model that parses YouTube videos, we used ResNet models and CNNs that are prebuilt and deployed by Google through GCP, combined with multiple transformer layers. All the models were built on Google Colab and exported as TensorFlow models. The speech-to-text model, which uses Google's REST API, was hosted as an endpoint in Firebase, while the CNN model runs in a cloud backend function that uses a platform streamer to return text as it parses the keyframes and the video transcript. The two streams are merged on the device based on their timestamps.
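The timestamp-based merge can be sketched roughly as follows. This is a hypothetical Python illustration, not the app's code (the app itself is written in Flutter): the `Segment` type, its fields, and `merge_streams` are names we made up, and we assume each stream arrives already sorted by start time.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class Segment:
    start: float  # start time in seconds (assumed unit)
    source: str   # "speech" for transcript, "visual" for video description
    text: str


def merge_streams(transcript, descriptions):
    """Interleave two time-sorted segment lists into one time-sorted list.

    heapq.merge is stable, so when two segments share a start time the
    transcript segment comes first because it is the first argument.
    """
    return list(merge(transcript, descriptions, key=lambda seg: seg.start))


if __name__ == "__main__":
    transcript = [
        Segment(0.0, "speech", "hello everyone"),
        Segment(4.0, "speech", "see you next time"),
    ]
    descriptions = [Segment(2.0, "visual", "a dog runs across the park")]
    for seg in merge_streams(transcript, descriptions):
        print(f"{seg.start:5.1f}s [{seg.source}] {seg.text}")
```

The merged list can then be fed to the braille conversion in order, so the reader feels speech and scene descriptions in the sequence they occur in the video.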
Challenges we ran into
Hosting the TensorFlow model on App Engine while keeping a secure connection to process our input. Finding the right library to pinpoint our haptic feedback. Tuning the hyperparameters to produce a meaningful video description. Merging the transcript with the video description at the right time. Setting up the GCP ML model to convert speech to text in real time. Manually generating the mapping from text to braille before realizing there was a package for it.
Accomplishments that we're proud of
First, getting all three modes to work. We initially only wanted to incorporate two modes, but we realized that if we never slept, we could go for the third one as well, so getting YouTube-video-to-braille working was a huge accomplishment from our perspective. Second, making something that uses current technology to solve problems that are still being addressed with old technology. Being able to provide a stepping stone so that people who are deaf and blind can stand on equal ground feels great.
What we learned
We learned to foster the mentality that blindness and deafness are not the characteristics that define you or your future. You can live the life you want; blindness is not what holds you back. We want to play our part in leveling the playing field.
What's next for Vibraille
We want to improve the algorithm that parses YouTube videos and add new features, such as parsing music and art in some form. We also want to make Vibraille a platform where you can learn interactively, for example with Quizlet in braille or LeetCode in braille. These are all plans we intend to pursue in order to make an actual impact.