I got to thinking one day: how can people who have impaired hearing join in the conversations happening around them? No one wants to be left out of a conversation, so I thought one way for them to follow what people around them are saying would be a web or native app that transcribes the conversation happening in a room.
When I found out about the 5G hackathon by AWS and Verizon, I thought it would be the perfect opportunity to leverage the ultra-low latency of 5G to improve the end-user experience while the app is in use.
What it does
The solution is a web app that converts the conversation around the end user into text that they can read.
How we built it
The initial plan was to make use of the AI services offered by AWS. However, those managed services are not available inside AWS and Verizon's 5G Wavelength Zones, so calling them from a Wavelength Zone would have defeated the purpose of reducing latency as much as possible.
To solve this, I searched online for open-source speech recognition software that I could self-host and came across Vosk.
The speech recognition software was installed using Docker on a t3.xlarge instance running Ubuntu 20. The front-end application was built with Angular and is served by a Node.js process. Nginx is used as a reverse proxy to serve the Angular app and to forward the audio to be transcribed from the browser to the speech recognition software. All of these services run on the same server.
The server is hosted in the San Francisco Wavelength Zone, us-west-2-wl1-sfo-wlz-1.
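The single-server setup described above can be sketched as an Nginx config along these lines. This is a minimal sketch, not the project's actual config: the domain, ports, and certificate paths are all assumptions (port 4200 for the Node.js process and 2700 for the Vosk server are illustrative defaults).

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # Certificates from Let's Encrypt (paths assumed)
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    # Serve the Angular app via the Node.js process
    location / {
        proxy_pass http://127.0.0.1:4200;
    }

    # Forward the browser's audio to the Vosk WebSocket server;
    # the Upgrade/Connection headers are required for WebSocket proxying
    location /transcribe {
        proxy_pass http://127.0.0.1:2700;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Running everything behind one Nginx server also keeps the browser-to-transcription hop inside the Wavelength Zone, which is the whole point of the latency-focused design.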
Challenges we ran into
I found out that the web worker setup only works over SSL. The web worker in this application sends the audio to the speech recognition software using WebSockets, so I had to install SSL certificates on the Nginx server.
Wavelength Zones don't allow inbound requests from the internet, which makes it difficult to SSH into a server in the Zone. With some research, I found out that attaching an AWS Systems Manager Agent (SSM Agent) role to your instance grants you access without SSH keys or a bastion host.
I initially faced an issue where the Node.js process went down every time I ended an SSM session. I sorted this out with PM2, which keeps the process running in the background even after the session closes.
I couldn't find a workaround to feed the audio from my laptop as an audio input to the virtual devices on the Nova platform.
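The remote-access and process-management fixes above come down to a few commands. This is a sketch, not the actual deployment steps: the instance ID, entry file, and process name are placeholders.

```shell
# Connect to the Wavelength instance through Systems Manager
# (requires the SSM Agent role on the instance and the Session Manager
# plugin for the AWS CLI on your machine):
aws ssm start-session --target i-0123456789abcdef0

# On the instance, keep the Node.js server alive after the session ends:
pm2 start server.js --name transwave
pm2 save       # persist the current process list
pm2 startup    # generate a boot script so PM2 itself survives reboots
```

Without PM2 (or a similar supervisor such as a systemd unit), any process started interactively inside an SSM session is a child of that session's shell and is terminated when the session closes.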
Accomplishments that we're proud of
Implementing speech recognition software with a web client that transcribes audio to text.
Working with AWS's new Wavelength Zones and gaining experience deploying applications in those zones.
What we learned
Gaining more exposure to VPCs and learning about internet gateways, carrier gateways, and CIDR blocks, which improved my networking knowledge a lot.
How to set up SSL on Nginx using Let's Encrypt
How to install and set up Docker
How web workers only work with SSL
How to set up an Nginx reverse-proxy server
That the getUserMedia Web API (used to access the phone or laptop microphone) only works in certain browsers
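The browser-side audio handling the list above refers to can be sketched as follows. This is a minimal sketch, not the project's actual client code: the function names are my own, and it assumes (as Vosk-style recognizers commonly do) that the server wants 16-bit PCM samples rather than the Float32 samples the Web Audio API produces.

```javascript
// getUserMedia is only available in some browsers and only in secure
// contexts (another reason the app needs SSL), so feature-detect first.
// Takes the navigator object so the check is easy to exercise outside
// a browser.
function supportsGetUserMedia(nav) {
  return !!(
    nav &&
    nav.mediaDevices &&
    typeof nav.mediaDevices.getUserMedia === 'function'
  );
}

// The Web Audio API hands out Float32 samples in [-1, 1]; convert a
// chunk to 16-bit signed PCM before sending it over the WebSocket.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// In the browser the pieces connect roughly like this (not runnable here):
//   if (supportsGetUserMedia(navigator)) {
//     const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//     // ...an AudioWorklet/ScriptProcessor yields Float32Array chunks...
//     socket.send(floatTo16BitPCM(chunk));
//   }
```

The clamp matters because some audio pipelines briefly emit samples just outside [-1, 1], which would otherwise wrap around when stored in the Int16Array.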
Try It Links
https://coraltalks.com/ - (in a Verizon Wavelength Zone network)
https://mageweave.xyz/ - (from any location)
What's next for Transwave
Implementing the solution with an AR/VR twist: imagine putting on an AR headset and, as a person with impaired hearing, being able to read the conversations happening around you as subtitles, with each subtitle visually mapped to the individual speaking.