Inspiration

I got to thinking one day: how can people with impaired hearing join in the conversations happening around them? No one wants to be left out of a conversation, so I thought one way to help them follow what is being said would be a web or native app that transcribes the conversation happening in a room.

When I found out about the 5G hackathon by AWS and Verizon, I thought it would be the perfect opportunity to leverage the ultra-low latency of 5G to deliver a better end-user experience while the app is in use.

What it does

The solution is a web app that, while running, converts the conversation around the end user into text they can read.

How we built it

The initial plan was to use the managed AI services that AWS provides. However, those services are not available inside AWS and Verizon's 5G Wavelength Zones, so calling them would have defeated the purpose of reducing latency as much as possible.

To solve this, I searched online for open-source speech recognition software I could use and came across Vosk, an open-source speech recognition toolkit.

The speech recognition engine was installed with Docker on a t3.xlarge instance running Ubuntu 20. The front-end application was built with Angular and is served by a Node.js process. Nginx acts as a reverse proxy, serving the Angular app and forwarding the audio to be transcribed from the browser to the speech recognition engine. All of these services run on the same server.
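
To make the flow concrete, here is a minimal sketch of the browser side, assuming the stock vosk-server WebSocket protocol (a JSON config message, then raw 16-bit PCM chunks, with results coming back as JSON). The endpoint path is a placeholder, and for brevity the socket lives on the main thread rather than in the web worker the app actually uses:

```typescript
const SAMPLE_RATE = 16000; // rate the recognizer expects (assumption)

async function startTranscription(onText: (text: string, final: boolean) => void) {
  // Hypothetical endpoint behind the Nginx proxy.
  const socket = new WebSocket('wss://coraltalks.com/transcribe');
  socket.onopen = () =>
    socket.send(JSON.stringify({ config: { sample_rate: SAMPLE_RATE } }));
  socket.onmessage = (ev) => {
    const msg = JSON.parse(ev.data as string);
    if (msg.partial) onText(msg.partial, false); // interim hypothesis
    if (msg.text) onText(msg.text, true);        // finalized utterance
  };

  // getUserMedia only works in a secure (HTTPS) context.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: SAMPLE_RATE });
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  ctx.createMediaStreamSource(stream).connect(processor);
  processor.connect(ctx.destination);

  processor.onaudioprocess = (e) => {
    // Convert Float32 samples in [-1, 1] to 16-bit signed PCM before sending.
    const input = e.inputBuffer.getChannelData(0);
    const pcm = new Int16Array(input.length);
    for (let i = 0; i < input.length; i++) {
      const s = Math.max(-1, Math.min(1, input[i]));
      pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    if (socket.readyState === WebSocket.OPEN) socket.send(pcm.buffer);
  };
}
```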

The server is hosted in the San Francisco Wavelength Zone, us-west-2-wl1-sfo-wlz-1.

Challenges we ran into

  1. Found out that the web worker setup only works over SSL. The web worker in this application sends the captured audio to the speech recognition engine over a WebSocket (a sketch of such a worker follows this list). Because of this, I had to install SSL certificates on the Nginx server

  2. Wavelength Zones don't allow inbound requests from the internet, which makes it difficult to SSH into a server in the zone. With some research, I found that attaching an AWS Systems Manager Agent (SSM Agent) role to the instance grants access without SSH keys or a bastion host

  3. I initially faced an issue with the Node.js process going down every time I ended an SSM session. I sorted this out with PM2, which keeps the process running in the background after the session closes.

  4. I was unable to find a workaround to feed the audio from my laptop as an input to the virtual devices on the Nova platform
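
For reference, here is a minimal sketch of the worker from challenge 1, with hypothetical file and message names. The main thread posts PCM chunks; the worker owns the WebSocket and posts recognition results back:

```typescript
// transcribe.worker.ts (hypothetical name): must be loaded from an
// HTTPS origin so the wss:// connection is allowed.
let socket: WebSocket | null = null;

self.onmessage = (e: MessageEvent) => {
  const { type } = e.data;
  if (type === 'start') {
    socket = new WebSocket(e.data.url); // e.g. wss://coraltalks.com/transcribe
    socket.onmessage = (ev) => self.postMessage(JSON.parse(ev.data as string));
  } else if (type === 'audio' && socket?.readyState === WebSocket.OPEN) {
    socket.send(e.data.chunk); // ArrayBuffer of 16-bit PCM samples
  } else if (type === 'stop' && socket) {
    socket.send(JSON.stringify({ eof: 1 })); // vosk-server end-of-stream marker
    socket.close();
  }
};
```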

Accomplishments that we're proud of

  1. Working with speech recognition software and implementing a web client that transcribes audio to text.

  2. Working with AWS's new Wavelength Zones and gaining experience deploying applications in them.

What we learned

  1. Gaining more exposure to VPCs and learning about internet gateways, carrier gateways, and CIDR blocks, which improved my networking knowledge a lot.

  2. How to set up SSL on Nginx using Let's Encrypt

  3. How to install and set up Docker

  4. That web workers, at least in this setup, only work over SSL

  5. How to set up an Nginx reverse-proxy server

  6. That the 'getUserMedia' Web API (used to access the phone/laptop microphone) only works in certain browsers (a small feature check follows this list)
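
Since browser support for microphone capture varies, a quick feature check before starting capture is worthwhile. A minimal sketch (the function name is illustrative):

```typescript
// Older browsers, and any page served over plain HTTP, expose no
// navigator.mediaDevices at all, so check before calling getUserMedia.
async function getMicrophone(): Promise<MediaStream> {
  if (!navigator.mediaDevices?.getUserMedia) {
    throw new Error('getUserMedia is not supported in this browser');
  }
  return navigator.mediaDevices.getUserMedia({ audio: true, video: false });
}
```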

Try It Links

  1. https://coraltalks.com/ - (in a Verizon Wavelength Zone network)

  2. https://mageweave.xyz/ - (from any location)

  3. GitHub repo

What's next for Transwave

Implementing the solution with an AR/VR twist: imagine putting on an AR headset and, as someone with impaired hearing, being able to read the conversations happening around you as subtitles, visually mapped to the individuals speaking.
