When I was studying in Australia as an international student, I saw many of my friends struggle with their assignments and exams. Many international students face serious learning difficulties in the classroom, and even though they work very hard, the language barrier sometimes holds them back. I decided to use my personal experience as inspiration to build this application.
What it does
The purpose of my application is to help both English-speaking and international students, especially in their first one or two years of study. Students can upload audio or video files of their lectures. With the lectures transcribed, they can focus on the discussion itself rather than on note-taking. Translating the transcription may also help students understand the concepts better until their English reaches the level they want, and having the material as both audio and text accommodates various learning styles. They can also extract the text from photos of lecture slides to make their learning even more efficient.
The application may also be useful for people from non-English-speaking countries who attend international conferences. Recordings of the talks are often available online, so users can transcribe them, translate the text into one of many supported languages, and then incorporate their new knowledge into their work.
How I built it
I started by setting up IAM roles and Cognito authentication. After that, I moved on to uploading audio files to an S3 bucket. Then I set up an S3 trigger and created a Lambda function that sends the files to Amazon Transcribe, formats the response, and saves it in S3, so the application can show the user a list of files and give them the option to download them to their local file system.
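A minimal sketch of that upload-triggered Lambda looks roughly like this. The bucket name and the `en-US` language code are placeholders rather than my actual configuration, and `build_job_params` is just a helper name I use here; `boto3` is imported inside the handler (it is pre-installed in the Lambda Python runtime) so the parameter-building helper can be used without the AWS SDK.

```python
import json
import urllib.parse

def build_job_params(bucket, key, output_bucket):
    """Build StartTranscriptionJob parameters for an uploaded media file.

    The job name is derived from the object key, and the media format is
    taken from the file extension (Amazon Transcribe accepts mp3, mp4,
    wav and flac, among others).
    """
    job_name = key.replace("/", "-").rsplit(".", 1)[0]
    media_format = key.rsplit(".", 1)[-1].lower()
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",            # placeholder; could be user-selected
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,  # Transcribe writes its JSON result here
    }

def lambda_handler(event, context):
    import boto3  # pre-installed in the AWS Lambda Python runtime
    transcribe = boto3.client("transcribe")
    # An S3 put event carries the bucket name and the URL-encoded object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    transcribe.start_transcription_job(
        **build_job_params(bucket, key, output_bucket="my-transcripts-bucket")
    )
    return {"statusCode": 200, "body": json.dumps({"started": key})}
```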
With the transcribed files saved, I could list the contents of the S3 bucket to the user. I set up API Gateway to send the file reference and target-language data to another Lambda function, which sends the files to Amazon Translate, processes the response, and saves the translated text file to S3. The user can again download the file.
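The translation step can be sketched like this. One detail worth showing: Amazon Translate's `TranslateText` call has a per-request size limit, so a full lecture transcript has to be split into pieces first. The `chunk_text` helper and the 9,000-byte default are my illustrative choices, not part of the service API.

```python
def chunk_text(text, limit=9000):
    """Split text into pieces below Amazon Translate's per-request size
    limit, breaking on sentence boundaries where possible."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence.endswith("."):
            sentence += "."
        candidate = (current + " " + sentence).strip()
        if len(candidate.encode("utf-8")) > limit and current:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def translate_transcript(text, target_language, source_language="en"):
    """Translate a transcript chunk by chunk with Amazon Translate."""
    import boto3  # pre-installed in the AWS Lambda Python runtime
    translate = boto3.client("translate")
    parts = []
    for chunk in chunk_text(text):
        response = translate.translate_text(
            Text=chunk,
            SourceLanguageCode=source_language,
            TargetLanguageCode=target_language,
        )
        parts.append(response["TranslatedText"])
    return " ".join(parts)
```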
Once I had translated files, I could repeat the previous steps: show the user a list of files, let them select one to convert to speech and pick a voice ID, set up another API Gateway endpoint, and send the data to Lambda and then to Amazon Polly. StartSpeechSynthesisTask can convert long texts to speech, which is ideal for lectures, and the resulting files are saved directly to S3. The user can then download these mp3 files.
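A sketch of the Polly call, assuming the asynchronous task API as described above. The voice table is illustrative (these are real Polly voice IDs, but in the app the user picks the voice themselves), and `start_synthesis` is my helper name.

```python
# A few of Amazon Polly's standard voices, keyed by two-letter language
# code. Illustrative only; the app lets the user choose a voice ID.
DEFAULT_VOICES = {
    "en": "Joanna",
    "es": "Lucia",
    "fr": "Celine",
    "de": "Marlene",
    "ja": "Mizuki",
}

def voice_for_language(language_code, fallback="Joanna"):
    """Pick a default Polly voice for a language code."""
    return DEFAULT_VOICES.get(language_code, fallback)

def start_synthesis(text, language_code, output_bucket):
    """Start an asynchronous speech-synthesis task.

    Unlike the synchronous SynthesizeSpeech call, StartSpeechSynthesisTask
    accepts much longer input and writes the resulting mp3 straight to S3,
    which suits full lecture transcripts.
    """
    import boto3  # pre-installed in the AWS Lambda Python runtime
    polly = boto3.client("polly")
    task = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_for_language(language_code),
        OutputS3BucketName=output_bucket,
    )
    return task["SynthesisTask"]["TaskId"]
```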
The last bit was similar to the first one. I started with uploading images to an S3 bucket. The upload triggers a Lambda function which sends the image to Amazon Rekognition in order to detect text. The response is processed in the Lambda function and the user receives the extracted text.
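That image Lambda can be sketched as follows. A detail worth knowing about Rekognition's DetectText: it returns each word both as part of a LINE detection and as a standalone WORD detection, so keeping only the LINE entries avoids duplicated text. The `extract_lines` helper is my naming, and error handling is omitted.

```python
def extract_lines(response):
    """Join Rekognition LINE detections in the order returned.

    DetectText reports text twice, as LINE and as WORD entries, so only
    LINE entries are kept to avoid duplicates.
    """
    return "\n".join(
        d["DetectedText"]
        for d in response["TextDetections"]
        if d["Type"] == "LINE"
    )

def lambda_handler(event, context):
    import boto3  # pre-installed in the AWS Lambda Python runtime
    rekognition = boto3.client("rekognition")
    # The S3 put event identifies the uploaded image.
    record = event["Records"][0]["s3"]
    response = rekognition.detect_text(
        Image={"S3Object": {
            "Bucket": record["bucket"]["name"],
            "Name": record["object"]["key"],
        }}
    )
    return {"statusCode": 200, "body": extract_lines(response)}
```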
Challenges I ran into
I found it a bit difficult to find documentation for Amazon Polly and Elasticsearch with examples in Python.
Accomplishments that I'm proud of
I built the application myself in about three weeks, which makes me really proud as this is my first AWS project.
What I learned
I've learnt how to create S3 buckets, configure API Gateway and Cognito, trigger Lambda functions, create user roles and use AWS AI services. I've also learnt how to build a multi-page application in React.js.
What's next for Knowledge Manager
After the user's files (both audio/video and image) are transcribed, the text could be sent to Amazon Comprehend to extract key phrases, and Elasticsearch could be used to search the files. An option to upload text files and extract key phrases from them could also be added. Users could then build their own knowledge database in any domain and search their files regardless of the original format. With the transcribed files, users would effectively have their own multi-format search engine, a personal Google capable of searching the content of text, video, audio and images at the same time and returning the most relevant results.
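The Comprehend step could be sketched like this. This is a future-work sketch, not something in the app today: the `top_phrases` filtering (score threshold, deduplication) is my own illustrative logic on top of the DetectKeyPhrases response, and long transcripts would need the same chunking as the translation step because Comprehend also has a per-request size limit.

```python
def top_phrases(response, min_score=0.9, limit=10):
    """Keep the highest-confidence key phrases from a DetectKeyPhrases
    response, deduplicated case-insensitively."""
    seen, phrases = set(), []
    candidates = sorted(
        response["KeyPhrases"], key=lambda p: p["Score"], reverse=True
    )
    for p in candidates:
        text = p["Text"].strip()
        if p["Score"] >= min_score and text.lower() not in seen:
            seen.add(text.lower())
            phrases.append(text)
        if len(phrases) == limit:
            break
    return phrases

def extract_key_phrases(text, language_code="en"):
    """Send transcript text to Amazon Comprehend and return key phrases."""
    import boto3  # pre-installed in the AWS Lambda Python runtime
    comprehend = boto3.client("comprehend")
    response = comprehend.detect_key_phrases(
        Text=text, LanguageCode=language_code
    )
    return top_phrases(response)
```

The extracted phrases could then be indexed in Elasticsearch alongside a reference to the source file, so one query covers lectures, slides and notes at once.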