Inspiration
This semester, we noticed that professors were asking students to volunteer as note-takers for students with disabilities, but few students responded. After looking into it, we learned that students lacked motivation, felt their notes were incomplete, or felt their notes were too messy for someone else to follow. To address these problems, we decided to automate note taking: we transcribe everything the professor says and use Natural Language Processing (NLP) to match that transcript to the professor's slideshow, then condense the result into a PDF file. Any student can get a day's notes as soon as the teacher uploads the PowerPoint and the recorded audio after class. Beyond improving educational accessibility, we give students a condensed version of the slides and everything the teacher said, so they can spend their time reviewing content instead of rewatching long lectures. Finally, we wanted to take a step up from last semester's Hack the Future Hackathon and build something that grows our skills in React on the frontend and in machine learning with NLP on the backend.
What it does
We take a PowerPoint file and a .WAV audio file. We use the Google Speech to Text API to convert the .WAV audio file into a paragraph of text, which we clean and parse to identify keywords. We scrape the PowerPoint for its lines of text and its images, and extract keywords for those parts as well. Afterwards, we further train a pre-trained NLP model on the new keywords we have collected, so that each word is represented by a weighted vector that captures its meaning. Based on the dot products of the word vectors, either between a slideshow line and a sentence from the audio file or between two sentences from the audio file, we assign each sentence in the audio file to a line in the slideshow. We then create the PDF by writing the title of each slide, then the lines from that slide, and under each line the relevant sentences from the .WAV file transcription. The user waits a few seconds and then gets a downloadable PDF of notes that combines the general information from the slides with the in-depth knowledge the professor shared during lecture. We even include images and format everything nicely so the notes are easy to follow.
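The matching step can be sketched roughly as follows. This is a minimal illustration, not our actual code: TOY_VECTORS is a made-up three-dimensional vocabulary standing in for the Word2Vec vectors, the function names are hypothetical, and the score is a normalized dot product (cosine similarity) between averaged word vectors. It only covers the slide-line-to-sentence comparison, not the sentence-to-sentence one.

```python
import re
from math import sqrt

# Hypothetical toy vectors for illustration only; the real vectors
# would come from the trained word-embedding model.
TOY_VECTORS = {
    "cell": [1.0, 0.0, 0.0],
    "membrane": [0.9, 0.1, 0.0],
    "energy": [0.0, 1.0, 0.0],
    "mitochondria": [0.1, 0.9, 0.0],
    "exam": [0.0, 0.0, 1.0],
}


def embed(text, vectors):
    """Average the vectors of the known words in text; None if no word is known."""
    words = re.findall(r"[a-z]+", text.lower())
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_sentences(slide_lines, sentences, vectors):
    """Attach each transcript sentence to the slide line it is most similar to."""
    line_vecs = [embed(line, vectors) for line in slide_lines]
    assignment = {line: [] for line in slide_lines}
    for sentence in sentences:
        s_vec = embed(sentence, vectors)
        if s_vec is None:
            continue  # no known words, so nothing to compare
        best_line, best_score = None, 0.0
        for line, l_vec in zip(slide_lines, line_vecs):
            if l_vec is None:
                continue
            score = cosine(s_vec, l_vec)
            if score > best_score:
                best_line, best_score = line, score
        if best_line is not None:
            assignment[best_line].append(sentence)
    return assignment


slides = ["cell membrane", "mitochondria energy"]
transcript = [
    "The membrane surrounds the cell",
    "Mitochondria produce energy",
    "Hello everyone",
]
notes = assign_sentences(slides, transcript, TOY_VECTORS)
```

In this toy run, the first two sentences land under the slide line that shares their vocabulary, and "Hello everyone" is dropped because none of its words have vectors. The resulting dictionary is already in the order the PDF needs: slide line first, then its matched sentences.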
How we built it
The frontend was built using React JS, with Google Firebase as the database. The backend was coded entirely in Python. We used the Google Speech to Text API to get the transcript of the audio file. The NLTK library was used to clean the text by removing filler words. The pptx (python-pptx) library was used to scrape the slides for text and image descriptions. For our NLP model, which assigns vectors to words, we used Gensim's Word2Vec model pre-trained on Gensim's common_texts. spaCy was used to increase the size of the dataset and to lemmatize words from the slideshow and transcript. Finally, we used ReportLab to write the annotations to a PDF file. We linked the frontend and backend with a Flask API.
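As a rough idea of what the cleaning step does, here is a minimal sketch using only the Python standard library. The FILLER_WORDS set is a short hand-written list assumed for illustration; it stands in for the NLTK word lists used in the project, and extract_keywords is a hypothetical name. Lemmatization is not shown.

```python
import re

# Hypothetical, hand-written filler list for illustration; it stands in
# for the much larger word lists that a library such as NLTK provides.
FILLER_WORDS = {
    "um", "uh", "like", "so", "the", "a", "an",
    "is", "of", "and", "to", "you", "know",
}


def extract_keywords(sentence, fillers=FILLER_WORDS):
    """Lowercase the sentence, split it into words, and drop filler words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in fillers]


keywords = extract_keywords(
    "Um, so the mitochondria is, like, the powerhouse of the cell."
)
```

Here keywords keeps only "mitochondria", "powerhouse", and "cell", which are the words worth comparing against the slide text.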
Challenges we ran into
Uploading files to both the website and Firebase took a lot of debugging, as we had never really used the file system in past projects. Coming up with a clean and elegant design was also difficult, so we spent many hours debating where to place objects and what the color scheme should be. Handling file requests that we had to wait on from Firebase also took a lot of debugging because we hadn't really worked with async functions or "await" before. On the backend, coming up with solutions took lots of drawing and debate because of how extensive the mappings between sentences and their representations would be. Additionally, there was a lot of room for improvement in our model, so we spent a lot of time trying a variety of datasets for pre-training, optimizing how we train the model on new data, and applying NLP concepts to better teach the model the relations between the sentences we provided. Finally, getting used to each library's syntax was frustrating at times and took lots of debugging and reading of documentation.
Accomplishments that we're proud of
We came into this hackathon wanting to achieve more than we did at the last one, and we think we really exceeded our expectations! We learned about the intricacies and theory of NLP and how to connect it to frontend and backend technologies. Additionally, instead of just making a hacky solution like at the last hackathon, we have a fully functioning product that can work with any audio and slides! We wanted to make a product that helps the community around us, and we think NoteNinja will be greatly beneficial to students and professors regardless of major or university. As a matter of fact, NoteNinja can be used anywhere from simple elementary school presentations to corporate meetings at Fortune 500 companies. The possibilities are endless!
What we learned
As mentioned above, we learned a lot about NLP and current models. We also learned about design principles and how to create a minimalist user experience that is accessible to anyone.
What's next for NoteNinja
We hope to expand the NLP model to understand equations and more advanced or theoretical concepts, and to use datasets more tailored to education. We also want this product to be used at universities like UT Austin, at corporate meetings, at elementary schools, and anywhere else that effective and fast note taking is needed.
Built With
- css
- firebase
- flask
- google-speech-to-text-api
- html
- javascript
- machine-learning
- natural-language-processing
- python
- react-js