Serverless Audio Translation Service for CompassionNPO

Inspiration

The 2018 AWS re:Invent NPO Hackathon for Good connected teams with worthy nonprofit partners. It allowed us to build a scalable framework for a service that Compassion wants to add to its program to enrich the connection between the Sponsor and the Compassion child.

What it does

The solution delivers near-real-time audio streaming at scale. Nearly serverless, it gets native AWS scalability from AWS Lambda, Amazon S3, Amazon API Gateway, Amazon Transcribe, Amazon Translate, and Amazon Polly. Under a pay-as-you-go model, Compassion would pay only a small amount per month, based on usage.

Our audio translation relay service translates audio streams to and from the languages spoken by the Sponsor and the Compassion child. The spoken audio is captured and transcribed to text by Amazon Transcribe, invoked from an AWS Lambda function. The transcription artifact triggers a second Lambda function, which screens the text for profanity. If profanity is detected, an error is passed back to the UI. If the Sponsor's text passes the filter, the artifact triggers a third Lambda function that translates the text into the language of the Compassion child. A final Lambda function encodes the translated text as an audio file in the child's language and streams it to a session.
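The screening step described above can be sketched as a small Lambda-style handler. This is a minimal sketch under stated assumptions, not the filter the team actually deployed: the `BLOCKED_WORDS` set, the `transcript` event key, and the shape of the response are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical placeholder list; a real deployment would load a curated list.
BLOCKED_WORDS = {"darn", "heck"}


def find_blocked_words(text, blocked_words=BLOCKED_WORDS):
    """Return the sorted blocked words that appear as whole words in text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(tokens & set(blocked_words))


def screen_handler(event, context=None):
    """Lambda-style handler: reject the transcript or pass it on.

    Assumes (hypothetically) that the event carries the transcribed
    text under a "transcript" key.
    """
    transcript = event.get("transcript", "")
    if find_blocked_words(transcript):
        # Error payload for the UI; the text is not sent on for translation.
        return {"status": "rejected", "reason": "profanity_detected"}
    # Clean text continues to the translation stage.
    return {"status": "ok", "transcript": transcript}
```

Matching on whole words rather than substrings keeps a word such as "heckle" from being rejected just because it contains a blocked word.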

For this demonstration, we have only enabled English - Japanese translation.

How we built it

We launched a server from an Amazon EC2 AMI, installed Node.js on it, and served a React and Node.js based web application to connect the Sponsor and child. On the backend, we used Amazon API Gateway and AWS Lambda to call Amazon Transcribe to convert audio to text, filter that text in Lambda, translate it into the chosen language with Amazon Translate, encode the translated text as audio with Amazon Polly, and pass the audio to a web application session.
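The translate and text-to-speech stages can be sketched with boto3 as follows. `translate_text` and `synthesize_speech` are existing boto3 client methods, but everything else here is an assumption made for illustration: the function names, the default region, and the choice of the `Mizuki` Japanese voice. The clients are passed in as arguments so the function can be exercised with stand-ins and without AWS credentials.

```python
def translate_and_synthesize(text, translate_client, polly_client,
                             source_lang="en", target_lang="ja",
                             voice_id="Mizuki"):
    """Translate text, then synthesize the translation as MP3 audio bytes."""
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]

    speech = polly_client.synthesize_speech(
        Text=translated,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a file-like streaming body; read() yields the raw bytes.
    return translated, speech["AudioStream"].read()


def make_clients(region_name="us-east-1"):
    """Create real AWS clients; the region is an assumed default.

    boto3 is imported lazily so the function above stays importable
    in environments where boto3 is not installed.
    """
    import boto3
    return (boto3.client("translate", region_name=region_name),
            boto3.client("polly", region_name=region_name))
```

Inside a Lambda function, a call such as `translate_and_synthesize(text, *make_clients())` would return the translated string and the audio bytes, which a later step could hand to the web application session.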

Our audio relay processing architecture was designed to be decoupled and serverless so that the framework scales and is inexpensive to deploy. The microservices can be improved by extending the profanity filter to other languages, applying a sentiment analysis NLP framework across the artifacts to improve content filtering, developing a SageMaker ML endpoint, and perhaps extending the web application to other endpoints such as mobile devices and Alexa-enabled devices.

Challenges

Packaging large Python libraries with their dependencies into a Lambda function. Certificate handling for the HTTPS connection, which proved to be a significant barrier to a viable demo. Audio file encoding.

Accomplishments that we are proud of

Extensive use of serverless technologies and an extensible architecture that Compassion can augment and scale. A React and Node.js web application.

What we learned

The entire team gained wider exposure to API Gateway, Lambda, Translate, Transcribe, and Polly, and to packaging custom Lambda functions with dependencies.

What's next for Serverless Audio Translation Service for CompassionNPO

We are staging the capture and translation artifacts to support Compassion's long-term goal of improving the audio relay service by further screening the exchanges for inappropriate content. In the long term, sentiment analysis of the audio exchanges using a natural language processing machine learning framework will be implemented to provide a safe and rewarding experience that fosters hope and meaningful connections.