Struck down by a severe cold, I was unable to get out of bed and make my way to my Political Economy class, one of my favorite lecture courses. Missing class that day was made worse by the fact that an amazing guest speaker was giving a lecture on what schools don’t teach you about the business world, a topic that had captured my attention ever since I first came across it. Asking a friend to record the whole talk would have solved the immediate problem, but given my health, it would have been better to control a recording device remotely by voice alone. What’s more, if the recording could be automatically transcribed into a text file, or even condensed into an outline of just a few sentences, it would have greatly helped my cold-addled mind make sense of the talk.
What it does
Tachyonic Summarizer is an Internet-connected IoT device that can be controlled remotely through Amazon Alexa. A user miles away from Tachyonic Summarizer can start its voice recording function through an Amazon Echo with the built-in Alexa personal assistant service, which relays the command via our AWS server. Recorded audio is transcribed into text files concurrently with recording, and the files are saved on Tachyonic Summarizer. Once a recording has been transcribed, the user can ask the device, again by voice through Amazon Echo, to retrieve the file, and Amazon Echo will read out an outline of its most relevant content.
How we built it
We set up an Amazon Alexa skill and an AWS EC2 instance that handles requests from Amazon Echo and from Tachyonic Summarizer. Requests from Amazon Echo reach the AWS server, through the Alexa skill, as specially formatted JSON, and the AWS server (which runs a Python Flask service) sends commands to and receives text data from Tachyonic Summarizer (which also runs a Python Flask service) over HTTP and HTTPS. Because we did not have an Intel Edison micro-computer, we used a MacBook Pro to emulate what is meant to be a portable Tachyonic Summarizer; the same setup should carry over with little change to an ARM-based micro-computer.
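The relay role of the AWS server can be sketched as a function that takes one parsed Alexa request body and returns an Alexa response dict. This is a minimal sketch, not the team's code: the intent names (`StartRecordingIntent`, `StopRecordingIntent`, `ReadSummaryIntent`), the `FileName` slot, and the `device` object with `start`/`stop`/`summary` methods are all hypothetical stand-ins for the skill's real interaction model and for the HTTP calls to the device's Flask service.

```python
# Hypothetical intent names; the writeup does not give the skill's real interaction model.
START_INTENT = "StartRecordingIntent"
STOP_INTENT = "StopRecordingIntent"
READ_INTENT = "ReadSummaryIntent"


def speech_response(text, end_session=True):
    """Build a minimal Alexa custom-skill response that speaks plain text."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def slot_value(req, slot_name):
    """Return the spoken value of a slot, or None if it is absent or unfilled."""
    slots = req.get("intent", {}).get("slots") or {}
    return slots.get(slot_name, {}).get("value")


def handle_alexa_request(body, device):
    """Turn one Alexa request body (a parsed JSON dict) into a device command and a reply.

    `device` is any object with start(name), stop() and summary(name) methods; in the
    real system these would be HTTP calls to the Flask service on the recording device.
    """
    req = body.get("request", {})
    kind = req.get("type")
    if kind == "LaunchRequest":
        return speech_response("Tachyonic Summarizer is ready.", end_session=False)
    if kind != "IntentRequest":
        # e.g. SessionEndedRequest: Alexa expects no speech in reply.
        return {"version": "1.0", "response": {}}

    intent = req.get("intent", {}).get("name")
    # "FileName" is a hypothetical slot holding the user's chosen file name.
    name = slot_value(req, "FileName") or "untitled"
    if intent == START_INTENT:
        device.start(name)
        return speech_response(f"Recording {name}.")
    if intent == STOP_INTENT:
        device.stop()
        return speech_response("Recording stopped.")
    if intent == READ_INTENT:
        return speech_response(device.summary(name))
    return speech_response("Sorry, I did not understand that.")
```

In a Flask service this function would sit behind a POST route that passes in `request.get_json()` and returns the result with `jsonify(...)`. Amazon also requires self-hosted skill endpoints to verify each request's signature, which this sketch omits.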
Challenges we ran into
Challenge 1: A long recorded speech cannot be transcribed with a single request to the Google speech-to-text API because of the API's file size limit
Solution: We used an audio recording buffer and multi-threading to process the audio in segments, so that transcription runs concurrently with recording
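The buffering-plus-threads idea is a producer/consumer pattern. A minimal sketch, assuming the recording loop yields fixed-length audio chunks and that `transcribe` is a hypothetical callable wrapping one speech-to-text request (with the Python SpeechRecognition library, for instance, chunks might come from `Recognizer.listen(source, phrase_time_limit=...)` and be passed to `Recognizer.recognize_google`). The function and parameter names are illustrative, not the project's actual code:

```python
import queue
import threading


def transcribe_concurrently(segments, transcribe, workers=2):
    """Transcribe audio segments on worker threads while they are still being produced.

    `segments` is any iterable yielding audio chunks as they are recorded;
    `transcribe` is a hypothetical callable that sends one chunk to a
    speech-to-text service and returns its text. Output is in recording order.
    """
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel: recording has finished
                break
            index, chunk = item
            try:
                text = transcribe(chunk)
            except Exception:
                text = ""  # skip segments the service could not transcribe
            with lock:
                results[index] = text

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # The "recording" loop: each finished chunk is queued immediately, so
    # transcription of earlier chunks overlaps with recording of later ones.
    for index, chunk in enumerate(segments):
        jobs.put((index, chunk))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    return " ".join(results[i] for i in sorted(results) if results[i])
```

Tagging each chunk with its index lets several workers run at once while the final transcript still comes out in recording order, and each request stays well under the API's size limit.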
Challenge 2: Unpredictable environmental noise that reduces speech-to-text accuracy
Solution: We enabled dynamic adjustment of the audio energy ratio and energy threshold on Tachyonic Summarizer, so that speech is separated from background noise as cleanly as possible before the audio is sent to Google's speech-to-text service
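A dynamic energy threshold does not remove noise from the signal; it decides which buffers are loud enough to count as speech and keeps re-estimating that cut-off from the quiet buffers. The sketch below is loosely modeled on the `dynamic_energy_threshold` scheme of the Python SpeechRecognition library, with what we understand to be its defaults (starting threshold 300, damping 0.15, ratio 1.5); the writeup does not name the library used, so treat the function names and constants as assumptions.

```python
import math


def rms(frame):
    """Root-mean-square energy of one buffer of 16-bit PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def next_threshold(threshold, energy, seconds_per_buffer, damping=0.15, ratio=1.5):
    """Move the threshold toward `ratio` times the current noise energy.

    `damping ** seconds_per_buffer` is the share of the old threshold kept per
    buffer, so the adaptation speed does not depend on the buffer length.
    """
    keep = damping ** seconds_per_buffer
    return threshold * keep + energy * ratio * (1 - keep)


def calibrate(noise_frames, seconds_per_buffer, threshold=300.0):
    """Set the threshold from buffers known to contain only background noise."""
    for frame in noise_frames:
        threshold = next_threshold(threshold, rms(frame), seconds_per_buffer)
    return threshold


def speech_frames(frames, threshold, seconds_per_buffer):
    """Keep buffers louder than the threshold; adapt the threshold on quiet ones."""
    kept = []
    for frame in frames:
        energy = rms(frame)
        if energy > threshold:
            kept.append(frame)
        else:
            threshold = next_threshold(threshold, energy, seconds_per_buffer)
    return kept, threshold
```

Calibrating first matters: if the threshold started below the room's noise level, every buffer would be classified as speech and the threshold would never get a chance to adapt.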
Challenge 3: We did not have an SSL certificate, which the Alexa skill setup requires
Solution: We ran ngrok on the AWS server to expose our service at an HTTPS ngrok subdomain covered by ngrok's wildcard certificate, which satisfies the requirement
Accomplishments that we're proud of
- Tachyonic Summarizer works better than we expected on a good Internet connection. Voice control of Tachyonic Summarizer through Amazon Echo has no noticeable latency.
- Users can specify, through Amazon Echo, the file names under which transcriptions are saved.
- The text summarization picks out the most relevant content of a transcription well, even when the speech lasts for hours.
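The writeup does not say how the summaries are produced. One common extractive approach, shown here only as an illustration, scores each unit of the transcript by the average frequency of its non-stop words and keeps the top few in their original order. Because raw speech-to-text output often lacks punctuation, this sketch takes a list of units (for example, the transcribed segments) instead of splitting on sentence punctuation; `summarize` and its tiny stop-word list are hypothetical.

```python
import re
from collections import Counter

# A small stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
    "has", "have", "i", "in", "is", "it", "its", "of", "on", "or", "that",
    "the", "this", "to", "was", "we", "were", "will", "with", "you", "your",
}


def summarize(units, max_units=3):
    """Return the `max_units` highest-scoring units, in their original order.

    A unit's score is the average frequency (across the whole transcript) of
    its non-stop words, so units built from recurring terms rank highest.
    """
    words_per_unit = [
        [w for w in re.findall(r"[a-z']+", u.lower()) if w not in STOP_WORDS]
        for u in units
    ]
    freq = Counter(w for words in words_per_unit for w in words)

    def score(i):
        words = words_per_unit[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Python's sort is stable, so ties keep the earlier unit first.
    top = sorted(range(len(units)), key=score, reverse=True)[:max_units]
    return [units[i] for i in sorted(top)]
```

Adjusting `max_units` is one simple way to vary how much of a long talk is read back.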
What we learned
- How to set up an Alexa skill, its intent templates, and related configuration
- The strengths and weaknesses of the Google speech-to-text API
- The potential of Alexa and Echo in the age of the Internet of Things
What's next for Tachyonic Summarizer
- Letting users adjust the level of summarization
- Supporting languages other than English
- Making the device even more portable, so that it runs on hardware smaller than today's micro-computers