Scrybe was inspired by Live Transcribe, Google's Android-exclusive app that transcribes conversations to text for people who are deaf or hard of hearing.
My grandfather is hard of hearing and, even with hearing aids, has trouble following conversations. When I learned about Live Transcribe, I was excited to see whether it would help him. Unfortunately, Live Transcribe has a very limited range and can't be used effectively in social situations (holding a phone up to someone's face and then reading what they said is awkward, to say the least).
We developed Scrybe to address these shortcomings and add even more functionality! Just say "Alexa, call Scrybe".
What it does
Scrybe uses Alexa to capture voice input, because an Alexa device has a greater range than a phone and can be placed in a strategic location (like the center of a table). Alexa places a phone call to a Twilio number; Twilio transcribes the speech to text, our backend pushes it to the Firebase Realtime Database, and our iOS app displays it.
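To make the pipeline above concrete, here is a minimal TypeScript sketch of one transcribed utterance on its way into Firebase. The `sessions/<sessionId>/lines` layout, the field names, and `DATABASE_URL` are hypothetical, not Scrybe's actual schema, and authentication is omitted. It assumes the Realtime Database REST API, where a `POST` to a `.json` URL appends a child under an auto-generated, time-ordered push key; the real backend may just as well use the Firebase Admin SDK.

```typescript
// Hypothetical shape of one transcribed utterance stored in the database.
interface TranscriptLine {
  text: string;      // text recognized by Twilio
  timestamp: number; // milliseconds since the epoch, used to order lines
}

// A plain description of an HTTP request, so it can be inspected before sending.
interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Placeholder, not Scrybe's real database.
const DATABASE_URL = "https://example-project.firebaseio.com";

// POSTing to a `.json` URL tells the Realtime Database to append the body
// as a new child under an auto-generated push key.
function buildAppendRequest(sessionId: string, line: TranscriptLine): HttpRequest {
  const path = `sessions/${encodeURIComponent(sessionId)}/lines.json`;
  return {
    method: "POST",
    url: `${DATABASE_URL}/${path}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(line),
  };
}

const appendReq = buildAppendRequest("table-7", {
  text: "Could you pass the salt?",
  timestamp: 1700000000000,
});
console.log(appendReq.url); // https://example-project.firebaseio.com/sessions/table-7/lines.json
```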
As we developed this technology, we realized it could serve other use cases too:
- In lecture halls, students often have trouble hearing their professors, even when microphones are used. With Scrybe, students can join a password-protected session to see what their professor is saying, which helps them take better notes.
- Scrybe transcriptions can be translated into other languages in real time using Google's Cloud Translation API, helping international students keep up with fast-paced lectures.
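The translation step might look like the following sketch against version 2 (Basic) of the Cloud Translation REST API, which accepts a JSON body with `q`, `target`, `source`, and `format` and returns `data.translations[].translatedText`. The API key is a placeholder, fixing `source` to `"en"` reflects Scrybe's current English-only input, the helper names are hypothetical, and the sample response is hand-written for illustration rather than real API output.

```typescript
// Request body for the Cloud Translation API v2 ("Basic") translate method.
interface TranslateRequestBody {
  q: string[];    // strings to translate
  target: string; // target language code, e.g. "es"
  source: string; // source language code; input is currently English only
  format: "text"; // plain text rather than HTML
}

// The parts of a successful v2 response that this sketch reads.
interface TranslateResponse {
  data: { translations: { translatedText: string }[] };
}

const TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

// Builds the POST request; apiKey is a placeholder supplied by the caller.
function buildTranslateRequest(texts: string[], target: string, apiKey: string) {
  const body: TranslateRequestBody = { q: texts, target, source: "en", format: "text" };
  return {
    method: "POST" as const,
    url: `${TRANSLATE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify(body),
  };
}

// Translations come back in the same order as the `q` array.
function readTranslations(response: TranslateResponse): string[] {
  return response.data.translations.map((t) => t.translatedText);
}

// Usage with a hand-written sample response (no network call is made).
const translateReq = buildTranslateRequest(["Good morning, everyone"], "es", "YOUR_API_KEY");
const sampleResponse: TranslateResponse = {
  data: { translations: [{ translatedText: "Buenos días a todos" }] },
};
console.log(translateReq.url);
console.log(readTranslations(sampleResponse));
```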
How we built it
The frontend iOS app is built with Swift, using Firebase to manage user authentication and Alamofire to send REST requests to our backend. The app listens for updates to the Firebase Realtime Database and displays the transcription.
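The iOS app presumably does this with the Firebase iOS SDK's observers. To keep the examples in one language, here is the same idea sketched in TypeScript against the Realtime Database REST streaming API, where a client that requests a `.json` URL with `Accept: text/event-stream` receives `put` events whose data is `{"path": ..., "data": ...}`. The sketch only handles `put` at the root or at a single child; `patch` events and deeper paths are left out, and `TranscriptView` is a hypothetical name.

```typescript
// Hypothetical shape of one stored line.
interface TranscriptLine {
  text: string;
  timestamp: number;
}

// Payload of a `put` event from the Realtime Database streaming REST API.
interface PutPayload {
  path: string;
  data: unknown;
}

// Local mirror of one session's lines, keyed by push ID.
class TranscriptView {
  private lines = new Map<string, TranscriptLine>();

  // Applies one server-sent event; returns true if the view changed.
  apply(eventType: string, rawData: string): boolean {
    if (eventType !== "put") return false; // keep-alive, patch, etc. are ignored here
    const payload = JSON.parse(rawData) as PutPayload;
    if (payload.path === "/") {
      // Full snapshot of the node; data is null when the node is empty.
      const all = (payload.data ?? {}) as Record<string, TranscriptLine>;
      this.lines = new Map();
      for (const key of Object.keys(all)) this.lines.set(key, all[key]);
      return true;
    }
    const key = payload.path.slice(1);
    if (key.includes("/")) return false; // updates below a single line are out of scope
    if (payload.data === null) this.lines.delete(key);
    else this.lines.set(key, payload.data as TranscriptLine);
    return true;
  }

  // Transcript text in display order, oldest first.
  render(): string[] {
    return Array.from(this.lines.values())
      .sort((a, b) => a.timestamp - b.timestamp)
      .map((line) => line.text);
  }
}

const view = new TranscriptView();
view.apply("put", '{"path":"/","data":{"-a":{"text":"Hi there","timestamp":1}}}');
view.apply("put", '{"path":"/-b","data":{"text":"How are you?","timestamp":2}}');
console.log(view.render()); // [ 'Hi there', 'How are you?' ]
```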
The backend is a serverless Node.js app running on Cloud Functions. It receives the transcribed text directly from Twilio through a REST webhook, performs any requested translation with Google's Cloud Translation API, and uploads the data to the Firebase Realtime Database.
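As a sketch of the first thing the Cloud Function might do with Twilio's webhook, assume the call is driven by TwiML `<Gather input="speech">`, whose action callback is a form-encoded `POST` that includes `CallSid`, `SpeechResult`, and `Confidence`. The writeup does not say which Twilio feature Scrybe uses, and keying sessions by call SID is likewise an assumption; `parseSpeechWebhook` is a hypothetical name, and translation and the database write would follow this step.

```typescript
// Hypothetical record produced from one webhook request.
interface TranscriptLine {
  sessionId: string;  // assumption: one session per phone call
  text: string;
  confidence: number; // 0.0 to 1.0 as reported by Twilio
  timestamp: number;
}

// Parses the application/x-www-form-urlencoded body that Twilio POSTs to the
// <Gather> action URL. Returns null when no speech was recognized.
function parseSpeechWebhook(rawBody: string, now: number = Date.now()): TranscriptLine | null {
  const params = new URLSearchParams(rawBody);
  const text = (params.get("SpeechResult") ?? "").trim();
  const callSid = params.get("CallSid");
  if (!text || !callSid) return null;
  const confidence = Number(params.get("Confidence") ?? "0");
  return {
    sessionId: callSid,
    text,
    confidence: Number.isFinite(confidence) ? confidence : 0,
    timestamp: now,
  };
}

const sampleBody = "CallSid=CA123&SpeechResult=See+you+at+dinner&Confidence=0.92";
console.log(parseSpeechWebhook(sampleBody, 0));
```

Returning `null` for an empty result lets the handler skip the translation and database write entirely when Twilio reports silence.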
Challenges we ran into
- Alexa Skills are not designed to keep running for extended periods (to protect users' privacy). Our solution was to use Alexa's calling feature to phone a Twilio number and capture voice input over the call, since phone calls don't have that length limit.
- We used FirebaseUI to save time developing phone authentication, but ran into issues that were only resolved once we enabled an obscure iOS app setting.
- Because of internet connectivity problems, we couldn't install some of our dependencies with a package manager, so we had to install them manually!
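The phone-call workaround holds up because a call stays open as long as the TwiML keeps giving Twilio something to do. Assuming Twilio's `<Gather input="speech">` verb is used, a minimal sketch of the response the webhook could return on every request: `<Gather>` listens for one utterance and posts it to the action URL, and if nothing is said Twilio falls through to `<Redirect>`, which requests the same URL again and restarts the loop. `keepListeningTwiml` and the example URL are hypothetical.

```typescript
// Escapes the characters that are special inside XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds the TwiML returned on every webhook hit so the call never ends:
// <Gather> listens for one utterance and POSTs the result to `actionUrl`;
// if nothing is said, Twilio continues to <Redirect>, which requests the
// same URL again and restarts the loop.
function keepListeningTwiml(actionUrl: string, language = "en-US"): string {
  const url = escapeXml(actionUrl);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Gather input="speech" action="${url}" method="POST" speechTimeout="auto" language="${escapeXml(language)}"/>`,
    `  <Redirect method="POST">${url}</Redirect>`,
    "</Response>",
  ].join("\n");
}

console.log(keepListeningTwiml("https://example.com/transcribe"));
```

Escaping matters because an action URL carrying query parameters contains `&`, which must appear as `&amp;` inside XML.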
Accomplishments that we're proud of
- It works!
- Building a viable tool that can greatly help people who are deaf or hard of hearing without being awkward to use, on inexpensive hardware that has already become a common household item (Alexa)
- Because we used Twilio, users can substitute a phone for the Alexa and get the same functionality
- Securing the domain
What we learned
- We learned how to use the Firebase Realtime Database and create a Firebase Phone Authentication flow
- It's not a bug, it's a feature! ( ͡° ͜ʖ ͡°)
What's next for Scrybe
- Creating a web app (and an Android counterpart) so that more people can take advantage of this technology
- Supporting input languages other than English
- Improving speech-to-text accuracy