NoteDex

Inspiration

The Student Disability Center on campus regularly requests note takers to assist students with disabilities, but these positions often go unfilled. This is detrimental to students with disabilities such as deafness or attention-deficit disorders. We saw this as an opportunity to apply natural language processing, text sentiment analysis, OCR, and state-of-the-art speech-to-text software to help these students.

What it does

NoteDex accepts three types of input: audio, video, and images (handwritten or typed). It outputs a full text transcription of the input as well as a summary of the content.

How we built it

We used Google Cloud services, notably Google speech-to-text and OCR, to process our input data. To handle large audio/video files, we implemented multithreading in Python, which cut the transcription time for a long recording from 55 minutes to under 30 seconds. We then ran a customized TextRank summarization algorithm on the output text, selecting the most important sentences based on a combination of TextRank weights and Rapid Automatic Keyword Extraction (RAKE) outputs. To tie these features together, we used the Flask framework along with HTML templates, and we deployed the application on Google's Compute Engine. We also integrated Twilio to text ourselves analytics about file size and submissions so that we could monitor usage and pricing. Finally, to increase accessibility for international students, we added the ability to view the output transcriptions in a different language using Google Translate.

Challenges we ran into

One major obstacle we ran into with Google speech-to-text was the maximum length of video and audio we could process. Synchronous recognition only allowed us to process one minute at a time, and asynchronous recognition required us to upload already large files to the cloud, reducing efficiency. We solved this by splitting the file into segments and processing them across 100 threads, which all ran independently, vastly decreasing run-time.
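The chunk-and-thread approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 55-second chunk length, the function names, and the placeholder transcribe_chunk (which stands in for one synchronous speech-to-text request) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SECONDS = 55  # assumption: keeps each request under the one-minute limit
MAX_THREADS = 100   # thread count mentioned in the write-up


def chunk_bounds(duration_s, chunk_s=CHUNK_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_chunk(bounds):
    """Hypothetical placeholder for one synchronous speech-to-text request."""
    start, end = bounds
    return f"[text for {start:.0f}s-{end:.0f}s]"


def transcribe(duration_s, worker=transcribe_chunk):
    """Transcribe every chunk on its own thread and join the pieces in order."""
    bounds = chunk_bounds(duration_s)
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        # map() returns results in input order, so the transcript stays ordered
        # even though the requests finish at different times.
        pieces = list(pool.map(worker, bounds))
    return " ".join(pieces)
```

Threads suit this workload because each request spends most of its time waiting on the network, so the chunks can be in flight at the same time. In a real version, transcribe_chunk would cut the audio for its time range and send it to the speech-to-text service.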

Additionally, natural language summarization was a much more difficult problem than we had thought, and we realized that we probably could not obtain enough data to train a dedicated deep learning model. We tried two summarization models: one using GloVe (Global Vectors for Word Representation) and the other using a custom RAKE/bag-of-words representation. We ended up using the latter for efficiency reasons.
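To make the bag-of-words approach concrete, here is a small sketch of extractive summarization that combines TextRank scores with a keyword score. It is not the project's implementation: the stopword list, the equal weighting (alpha), and every function name are assumptions, and the keyword score is a simple word-frequency stand-in rather than real RAKE.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one the project used.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=30):
    """PageRank-style scores over a sentence-similarity graph."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]  # total edge weight leaving each sentence
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n + damping * sum(
            sim[j][i] * scores[j] / out[j] for j in range(n) if out[j])
            for i in range(n)]
    return scores


def keyword_scores(vectors):
    """Frequency-based stand-in for RAKE: mean document frequency of a sentence's words."""
    totals = Counter()
    for v in vectors:
        totals.update(v)
    return [sum(totals[w] for w in v) / len(v) if v else 0.0 for v in vectors]


def normalize(values):
    top = max(values)
    return [v / top if top else 0.0 for v in values]


def summarize(text, k=2, alpha=0.5):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    rank = normalize(textrank(vectors))
    keys = normalize(keyword_scores(vectors))
    combined = [alpha * r + (1 - alpha) * q for r, q in zip(rank, keys)]
    best = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Calling summarize(transcript, k=3) returns the three highest-scoring sentences. Sentences that share few words with the rest of the text get low scores from both signals, so they tend to be left out of the summary.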

Finally, deployment was also a major challenge. We didn't have much experience with deployment, and we started from the ground up. It was a substantial challenge to handle billing issues and install packages in a different environment.

Also, half of the team members were new to using GitHub, and code management was a challenge at times.

Accomplishments that we're proud of

This was the first substantial hackathon for half of the team, and it was awesome to actually create a running product. We had just learned multithreading a week earlier in a computer science class through a more theoretical lens, and it was exciting to use it in a real-world application. Finally, it was cool to learn to use APIs like Twilio's and Google's cloud computing services!

What we learned

We learned to use git and resolve annoying merge conflicts. Also, we hadn't used GCP or APIs in the past, so we had to learn how to use these.

What's next for NoteDex

We would like to implement a flash card feature to help students retain information. We also want to add the ability to merge note sets together (crowd-sourced note taking) and, finally, expand to a university!!

Submitted to

HackDavis 2019

Created by

I worked on summarizing input text using NLTK and a custom TextRank algorithm in Python that ranked the most important words first based on weights and RAKE outputs.

Tiger Sun
I functioned as project manager for our team. I helped design the high-level architecture of the application and assisted my team with individual feature development. I also deployed the application to GCP Compute Engine.

Jordan Carlile
I worked on integrating the Google Cloud Speech-to-Text API to support audio/video files, implementing multithreading to cut the time for long audio transcriptions (55 minutes to under 30 seconds), and adding Twilio API support.

Manav Aggarwal
Roger Fleenor