Inspiration
Our initial inspiration for this project was to make note-taking easier for people who struggle with it. This sounded like a natural language processing problem, since automated text summarization is already a well-studied task. However, we couldn't find much information about combining speech-to-text with text summarization, and we found it meaningful to pursue this project in hopes of discovering how these two ideas can be combined effectively.
What it does
Our program takes in an audio file containing a speech or lecture and generates a summary based on its content. It also produces a list of keywords and phrases summarizing the content, along with the option to translate the results into a different language.
How we built it
After reading in an audio file, our program uses the Google Cloud Speech-to-Text API to convert the lecture audio into a plain text document. This document is then passed to the text summarization part of our program, which is handled by the gensim Python library. It uses a variation of the TextRank algorithm to identify and rank the most significant sentences in the corpus, and the result is a string containing a summary of the text we passed in. On top of this, we discovered that gensim also supports keyword extraction, so we decided to display the top keywords alongside the summary. Feeling adventurous, we also decided that translation into other languages would be useful for people who aren't as strong in English but are fluent in another language. Hence, we used the translate Python library, which takes in English text and converts it to the language of the user's choice.
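To give a feel for how TextRank-style ranking works, here is a minimal sketch of the core idea: build a co-occurrence graph over the words and run a PageRank-style iteration on it. This is a simplification for illustration only (real implementations such as gensim's add part-of-speech filtering and more sophisticated edge weighting), and the function name is ours:

```python
import re
from collections import defaultdict

def textrank_keywords(text, window=2, top_n=3, damping=0.85, iters=50):
    """Rank words by running a PageRank-style iteration on a
    co-occurrence graph -- the core idea behind TextRank
    (simplified: no part-of-speech filtering)."""
    words = re.findall(r"[a-z]+", text.lower())

    # Build an undirected co-occurrence graph over a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    # Power iteration of the PageRank update rule.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {
            w: (1 - damping)
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    ranked = sorted(score.items(), key=lambda kv: -kv[1])
    return [w for w, _ in ranked[:top_n]]
```

Words that co-occur with many other highly ranked words accumulate the most score, which is why frequent, well-connected terms surface as keywords.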
Challenges we ran into
We ran into a lot of challenges. We knew that the Google Cloud Platform had much of the functionality we wanted, but none of us were familiar with the API, and we had several issues setting up the Google Cloud SDK. We also spent a lot of time trying to figure out how to train our own model to summarize text. Most of this was in vain, because resources like the Google Natural Language API and the gensim Python library include pre-trained models or much easier and more efficient ways to achieve our goals within the timespan of the event.
Accomplishments that we're proud of
Despite the challenges, we are proud that we were able to push past them and begin our journey into the world of machine learning and artificial intelligence. Every new thing we discovered as a group fascinated us, and none of us regret the hours spent on this project. For some of us, it was our first hackathon, and we rose to the challenge of both learning about artificial intelligence and working within such a short amount of time.
What we learned
We definitely learned a lot working on this project. Most of us came in with little knowledge of artificial intelligence, but we leave the event with an understanding of different methods for training a model (or using a pre-trained model, in this case). Along the way, we had to learn from research papers and from the documentation for the SciPy and NumPy libraries. The AI workshops given by HackGT staff, on semantic search analysis and training a neural network, were insightful and gave us a lot of good ideas that we considered or plan to add to our project.
What's next for Sizo
There is so much for this project that we weren't able to accomplish within this weekend. For example, we wanted to implement some form of text normalization between the speech-to-text and summarization steps: something that cleans up the intermediate text file so that our summarization algorithm works more effectively. This would do things like replace numbers with their word equivalents ('one' in place of '1'), remove verbal "ums" and "ahs", or insert punctuation (or even whole words) where the speech-to-text failed to detect natural breaks. Another thing we couldn't get to was topic modeling. We wanted to extract the main topics from the text and insert them into the summary, so that we effectively generate notes from a class automatically. One possible implementation was to apply Word2Vec to individual sentences and see which ones had the smallest cosine distance (i.e., were most similar in meaning) to the rest of the document, or at least to whole other sections. Regardless of what we work on next, there are many ways to bring Sizo even further beyond what we managed this weekend.
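The sentence-centrality idea above can be sketched with plain bag-of-words vectors standing in for Word2Vec embeddings: the sentence with the highest total cosine similarity to the others (equivalently, the smallest total cosine distance) is the most "central" one. This is an illustrative simplification under our own naming, not the planned implementation:

```python
import math
import re
from collections import Counter

def most_central_sentence(sentences):
    """Return the sentence whose bag-of-words vector is, in total,
    most cosine-similar to the other sentences. A stand-in for the
    same computation over Word2Vec sentence embeddings."""
    vecs = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]

    def cosine(a, b):
        # Cosine similarity between two sparse count vectors.
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    totals = [
        sum(cosine(v, u) for j, u in enumerate(vecs) if j != i)
        for i, v in enumerate(vecs)
    ]
    return sentences[totals.index(max(totals))]
```

With real Word2Vec vectors the same selection would also catch sentences that are similar in meaning without sharing surface words, which is exactly why we wanted embeddings rather than raw word counts.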