While the members of UbiVoice were brainstorming potential ideas at the start of this hackathon, we came upon an interesting dilemma. One of our members is an international student from Russia, and the videos she had been looking up were all in Russian. While showing us a video she thought was cool, she quickly realized that the subtitles were neither accurate nor aligned with the video. She frantically tried to translate the whole video herself, but it quickly became a painstaking task. Her struggle stirred up a whole new conversation on breaking language barriers. What if we could make a streaming translator for any media content that could speak to you in a language of your choosing? What if you were in a conference and could understand your co-workers from the other side of the world as they speak, regardless of their language? What if you could provide dubs for any movie, TV show, or YouTube video in every language?
Thus, we introduce Ubi.
What it does
For those that have watched Doctor Who: Ubi imitates the TARDIS.
The TARDIS is Doctor Who's fictional time machine and spacecraft, and it contains a telepathic field that gets inside your brain and translates everything into a language you understand.
Ubi is a transcriber, translator, and dubber. She uses automatic speech recognition to send text to her built-in translator, then reads the translation aloud in the language the user picks. She does this seamlessly in real time with speech-comprehension technology that only improves over time through machine learning. She can dub your movies and shows as the actors speak. She can be your "telepathic field" that translates everything into a language you understand, your TARDIS.
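The three-stage flow described above can be sketched as a streaming pipeline, where each audio chunk moves through recognition, translation, and synthesis in turn. This is a minimal illustration, not Ubi's actual code; the stage functions are stand-ins you would swap for real ASR/MT/TTS calls:

```python
from typing import Callable, Iterable, Iterator

def pipeline(chunks: Iterable[bytes],
             recognize: Callable[[bytes], str],
             translate: Callable[[str], str],
             synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Stream audio chunks through recognize -> translate -> synthesize.

    Yielding dubbed audio per chunk, instead of collecting the whole
    transcript first, is what keeps end-to-end latency low enough
    for real-time dubbing.
    """
    for chunk in chunks:
        text = recognize(chunk)       # speech -> source-language text
        translated = translate(text)  # source text -> target-language text
        yield synthesize(translated)  # target text -> speech audio
```

In practice each stage would be an asynchronous call to a speech service, but the chunk-at-a-time composition is the core idea.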
How we built it
To bring this project to life, we used a combination of three different Amazon Web Services and some hardware:
Amazon Web Services: web services that use machine learning to process language
AWS Transcribe: an automatic speech recognition (ASR) service
AWS Translate: a neural machine translation service that uses deep learning models to deliver accurate and natural-sounding translations
AWS Polly: a service that turns text into lifelike speech
Hardware hacks: Bluetooth technology and loopback audio
Optional: We also tested our application both with and without a server, and demoed without one to show broader accessibility.
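A minimal sketch of how the translate-then-dub step could be wired up with these services, assuming boto3 and configured AWS credentials. The function names (`split_for_polly`, `dub_segment`) and the 3,000-character limit handling are our own illustration, not Ubi's actual code:

```python
import io

POLLY_CHAR_LIMIT = 3000  # Polly caps plain-text SynthesizeSpeech requests

def split_for_polly(text, limit=POLLY_CHAR_LIMIT):
    """Split a transcript into chunks Polly will accept, on word boundaries.

    A single word longer than the limit is passed through as-is.
    """
    words, chunks, current = text.split(), [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def dub_segment(text, source="ru", target="en", voice="Joanna"):
    """Translate one transcribed segment and synthesize speech for it."""
    import boto3  # imported lazily; requires AWS credentials at call time
    translate = boto3.client("translate")
    polly = boto3.client("polly")
    translated = translate.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target
    )["TranslatedText"]
    audio = io.BytesIO()
    for chunk in split_for_polly(translated):
        resp = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice)
        audio.write(resp["AudioStream"].read())
    return audio.getvalue()
```

In the real-time path, segments streamed out of AWS Transcribe would be fed through `dub_segment` as they arrive rather than after the whole recording finishes.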
Challenges we ran into
As we developed UbiVoice, we hit a slew of interesting challenges. The first was designing an efficient application architecture that would support real-time translation. Following this, we realized that the technology we were using was extremely new (only days old, in some cases) and had not yet been properly documented, which forced us to dig deep into what documentation did exist. Our challenges were not limited to software, either: we ran into a very interesting problem finding ways to separate the audio channels in order to create a serverless real-time translation experience. Overall, each of these challenges helped us grow significantly across a wide array of technical skills.
Accomplishments that we're proud of
We are most proud that we were able to assemble and successfully deploy a combination of technologies that had never been put together before. The minimal documentation we had to go on was anywhere from days to months old, and much of it was already outdated by the time the hackathon started. We had to refactor everything and create our own algorithms and strategies for piping one service into another in real time without significant delays. The predictive text was at first very inaccurate and did not stream, but our final solution solved most, if not all, of these problems.
What we learned
During 36 hours spent shoulder to shoulder in a small college apartment, we learned a lot about teamwork and task delegation. At times, we would even get caught up in heated debates about implementation and application, ethical choices in coding, or proper marketing. Since all four of us have different professional and educational backgrounds, it was extremely interesting to come up with the project, its uses, and its impact together.
We also learned a lot about current technology in the language and translation field. It was interesting to test existing products and find ways to improve them. For example, we wanted our service to be able to assist both Fortune 500 businesses and humanitarian initiatives (rescue, health education, etc.). Figuring out how the code and our bigger vision fit together was a true learning adventure.
What's next for UbiVoice
We recognize that our solution has incredibly high potential to grow. In a globalized world, real-time language translation applications will remain in high demand. We believe UbiVoice can be integrated into robot hardware as well as grow into a fully developed hosted service that moves beyond AWS. There is definitely room to work on scalability and faster translation speeds to better mimic human interpreters.