We started down the path of creating an Android application that used Amazon Kinesis Video Streams to provide real-time audio translation of a user's native spoken language into the connected party's native language. This exposed several limitations of current technology, including network and bandwidth constraints that would be significant issues in third-world countries.
Our final inspiration was the paradigm used by WhatsApp, where users send short video or audio messages to each other. This paradigm is already popular in many of the countries covered by Compassion's host organizations.
What it does
Our final solution receives text and video SMS/MMS messages from a user (either a sponsor or a sponsored child) at an anonymized phone number (which can be shared in a many-to-many relationship), translates these messages from the user's native language into the target's, and then sends the target a series of messages with content based on the original input. When a video is sent, the target receives two messages: the first is the original video, and the second is a Polly voice audio file with a translated version of the audio from the video. In the case
How we built it
We used Twilio for the input and output endpoints for text messages, which allows for anonymous sourcing for both parties. For the processing, Twilio calls Amazon API Gateway, which in turn calls a Lambda function to upload the SMS/MMS contents to an S3 bucket. A second Lambda function triggers on upload to the S3 bucket and calls the Amazon Transcribe service. This causes a new text document to be uploaded into a staging bucket, which is in turn translated using Amazon Translate. The translated text is inserted into a DynamoDB table and flagged for moderation if keyword analysis requires it. The translated text is then sent to Amazon Polly to create the associated audio files. Finally, a Lambda function tells Twilio to send an SMS/MMS to the target user with the associated audio files and the original video.
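Two steps of the pipeline described above are easy to sketch as plain Node.js functions: reading the inbound Twilio webhook inside the first Lambda, and the keyword check that flags translated text for moderation. This is a minimal sketch under assumptions, not our deployed code. It assumes API Gateway passes the request through in the Lambda proxy event shape (`body`, `isBase64Encoded`) and relies on Twilio's standard form-encoded webhook fields (`MessageSid`, `From`, `To`, `Body`, `NumMedia`, `MediaUrl0`, `MediaContentType0`). The function names and the keyword list are hypothetical.

```javascript
// Hypothetical keyword list; the real moderation terms are not part of this writeup.
const MODERATION_KEYWORDS = ["address", "password"];

// Parse the form-encoded body that Twilio POSTs for an inbound SMS/MMS.
// Assumes the API Gateway Lambda proxy event shape.
function parseTwilioWebhook(event) {
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, "base64").toString("utf8")
    : event.body;
  const params = new URLSearchParams(raw);

  // Twilio reports the attachment count in NumMedia and numbers the
  // attachments MediaUrl0, MediaUrl1, ... with matching MediaContentTypeN.
  const numMedia = parseInt(params.get("NumMedia") || "0", 10);
  const media = [];
  for (let i = 0; i < numMedia; i++) {
    media.push({
      url: params.get(`MediaUrl${i}`),
      contentType: params.get(`MediaContentType${i}`),
    });
  }

  return {
    sid: params.get("MessageSid"),
    from: params.get("From"),
    to: params.get("To"),
    text: params.get("Body") || "",
    media,
  };
}

// Flag text for moderation when any keyword appears as a whole word.
// Matching is case-insensitive; words are split on non-letter, non-digit characters.
function flagForModeration(text, keywords = MODERATION_KEYWORDS) {
  const words = text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter(Boolean);
  const hits = keywords.filter((k) => words.includes(k.toLowerCase()));
  return { needsModeration: hits.length > 0, hits };
}
```

The result of `flagForModeration` could be stored alongside the translated text in the DynamoDB item, so a moderator can see which keywords triggered the flag.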
Challenges we ran into
With the original approach, latency...latency and more latency.
With the updated approach, the main issue was validating that the input and output formats were compatible with Twilio's MMS support.
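A pre-send check along the following lines is one way to catch format problems before handing media to Twilio. It is a sketch under stated assumptions: the limits object (a 5 MB total size, 10 attachments, and the list of content types) reflects our reading of Twilio's MMS documentation and should be verified against the current docs and the carriers involved. The function name and the input shape (`contentType`, `sizeBytes`) are hypothetical.

```javascript
// Assumed limits; verify against Twilio's current MMS documentation.
const ASSUMED_MMS_LIMITS = {
  maxTotalBytes: 5 * 1024 * 1024,
  maxItems: 10,
  allowedTypes: [
    "audio/mpeg", // MP3, one of the formats Polly can produce
    "video/mp4",
    "video/3gpp",
    "image/jpeg",
    "image/png",
    "image/gif",
  ],
};

// Check a list of outbound attachments against the limits and
// return every problem found rather than stopping at the first one.
function checkOutboundMedia(items, limits = ASSUMED_MMS_LIMITS) {
  const problems = [];

  if (items.length > limits.maxItems) {
    problems.push(`too many attachments: ${items.length}`);
  }

  for (const item of items) {
    // Drop any parameters such as "; charset=..." before comparing.
    const type = item.contentType.split(";")[0].trim().toLowerCase();
    if (!limits.allowedTypes.includes(type)) {
      problems.push(`unsupported type: ${type}`);
    }
  }

  const total = items.reduce((sum, item) => sum + item.sizeBytes, 0);
  if (total > limits.maxTotalBytes) {
    problems.push(`total size ${total} bytes exceeds ${limits.maxTotalBytes}`);
  }

  return { ok: problems.length === 0, problems };
}
```

Running this check in the final Lambda, before the Twilio send call, would turn a silent delivery failure into an explicit list of problems that can be logged or used to trigger a re-encode.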
Accomplishments that we are proud of
What we learned
Twilio's API, Serverless Node.js, Polly, Translate, Kinesis Video Streams, Transcribe
What's next for Project Mind Stone
- Extend moderation application to include mapping of each sponsored child to a given phone number
  - Requires data from Compassion; need max count of sponsored children per sponsor
- Extend moderation application to log a child in to a particular device at the host location
- Queue offline messages until the child is assigned a temporary device
- Screen video messages for explicit content in both directions
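The offline-queue item in the list above could work roughly as follows. This is only an in-memory sketch of the idea, since none of it is built yet: messages for a child with no device are held, and assigning a device releases them in order. The class name, method names, and ID formats are hypothetical, and a real version would presumably persist the queue (for example in the existing DynamoDB table) rather than keep it in Lambda memory.

```javascript
// In-memory stand-in for a persistent store of pending messages.
class OfflineQueue {
  constructor() {
    this.pending = new Map(); // childId -> array of queued messages
    this.devices = new Map(); // childId -> currently assigned phone number
  }

  // Returns the deliveries that can be sent right now (possibly none).
  enqueue(childId, message) {
    const device = this.devices.get(childId);
    if (device) {
      return [{ to: device, message }];
    }
    const queue = this.pending.get(childId) || [];
    queue.push(message);
    this.pending.set(childId, queue);
    return [];
  }

  // Assign a temporary device and release everything queued for the child,
  // oldest message first.
  assignDevice(childId, phoneNumber) {
    this.devices.set(childId, phoneNumber);
    const queue = this.pending.get(childId) || [];
    this.pending.delete(childId);
    return queue.map((message) => ({ to: phoneNumber, message }));
  }

  // Called when the child hands the device back; later messages queue again.
  releaseDevice(childId) {
    this.devices.delete(childId);
  }
}
```

Each returned delivery pairs a phone number with a message, so the caller can pass them straight to whatever function sends the outbound SMS/MMS.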