The inspiration was the challenge published by T-Systems at Hack Kosice 2019.
What it does
It grabs the audio stream from the computer microphone and, using Google Cloud Platform services, transcribes the captured audio to text, translates the text into the target language, and sends it to a simple frontend.
fastText embeddings are used to improve the transcription: if a word is transcribed with low confidence, the system tries to replace it with the word that has the highest probability of occurrence in the given context.
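The low-confidence replacement step can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the code from the hackathon: the tiny `EMBEDDINGS` table stands in for real fastText vectors, the `candidates` mapping and the `threshold` value are hypothetical, and "probability of occurrence in context" is approximated here by cosine similarity to the averaged vector of the surrounding words.

```python
import math

# Toy 2-d vectors standing in for fastText embeddings (hypothetical values).
EMBEDDINGS = {
    "the": [0.1, 0.1],
    "weather": [0.9, 0.1],
    "is": [0.1, 0.2],
    "sunny": [0.8, 0.3],
    "funny": [0.1, 0.9],
    "today": [0.7, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def context_vector(words, skip_index):
    """Average the vectors of all known words except the one being repaired."""
    vectors = [
        EMBEDDINGS[w]
        for i, w in enumerate(words)
        if i != skip_index and w in EMBEDDINGS
    ]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def repair(words, confidences, candidates, threshold=0.6):
    """Replace each low-confidence word with its best-fitting candidate.

    `candidates` maps a word position to alternative words (an assumption:
    they could come from the recognizer's alternatives or a vocabulary).
    """
    repaired = list(words)
    for i, confidence in enumerate(confidences):
        if confidence >= threshold:
            continue  # the recognizer was confident enough, keep the word
        ctx = context_vector(words, i)
        options = [c for c in candidates.get(i, []) if c in EMBEDDINGS]
        if ctx is None or not options:
            continue  # nothing to compare against, keep the word
        repaired[i] = max(options, key=lambda c: cosine(EMBEDDINGS[c], ctx))
    return repaired
```

For example, `repair(["the", "weather", "is", "funny", "today"], [0.9, 0.95, 0.9, 0.4, 0.9], {3: ["funny", "sunny"]})` swaps the low-confidence "funny" for "sunny", because "sunny" lies closer to the averaged context vector in this toy table.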
How we built it
We used Google Cloud Platform services. This integration, along with the word embeddings, was handled by Marko Sahan, using the platform through his personal Google Cloud Platform account (there is a free 300-dollar trial for new users).
The integration of the translation into a simple website (shown as a stream of text), with basic Webex integration, was done by Jaroslav Loebl.
Challenges we ran into
Jaroslav spent many hours implementing the non-blocking streaming of translation results to the frontend, mainly because of limited experience with asyncio.
It was Marko's first time using Google Cloud Platform, and working under time pressure was also an intense experience.
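One common way to keep such streaming non-blocking is to run the blocking source in a worker thread and hand results to the event loop through an `asyncio.Queue`. The sketch below shows that pattern only; it is not the project's actual code. `blocking_translations`, `pump`, `stream_to_frontend`, and the `send` callback are hypothetical names, and the sample strings are placeholders for real translation results.

```python
import asyncio
import time


def blocking_translations():
    """Stand-in for a blocking stream of translation results (hypothetical)."""
    for text in ["Hallo", "Welt"]:
        time.sleep(0.01)  # simulates waiting on a network call
        yield text


def pump(loop, queue):
    """Runs in a worker thread and hands each result to the event loop."""
    for text in blocking_translations():
        # asyncio.Queue is not thread-safe, so schedule the put on the loop.
        loop.call_soon_threadsafe(queue.put_nowait, text)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: stream ended


async def stream_to_frontend(send):
    """Forward results to the async `send` callback without blocking the loop."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    worker = loop.run_in_executor(None, pump, loop, queue)
    while True:
        text = await queue.get()
        if text is None:
            break
        await send(text)
    await worker  # surface any exception raised in the worker thread


async def demo():
    received = []

    async def send(text):
        # A real frontend would push over a websocket here (assumption).
        received.append(text)

    await stream_to_frontend(send)
    return received
```

Running `asyncio.run(demo())` collects the placeholder results in order. While the worker thread sleeps, the event loop stays free to serve other coroutines, which is the point of the pattern.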
Accomplishments that we're proud of
A working PoC that transcribes and translates captured audio and shows the result to the user.
What we learned
Jaroslav learned a lot about asyncio and parallel processes in Python, and Marko got his hands dirty with GCP.
What's next for Tapir (speech-to-text-to-translate)
Actually sending the translated text to other Webex users as subtitles.