As university students, we all know that you can never make it to every lecture, despite your best efforts, and even when you're lucky enough to have a professor who uploads their notes online, those notes are usually just an outline of what was covered in class. We wanted to build something that intelligently generates a set of notes during the lecture, documenting what the professor said and wrote or drew on the board, which students could then use to study.
What it does
The program converts the lecture audio to text, highlighting the most important terms and adding Wikipedia links for especially significant concepts. It also records snapshots of the board, providing a reference for diagrams as well as a backup for double-checking that the speech-to-text conversion was correct.
How we built it
Our program converts speech to text using the Google Cloud Speech API and processes the resulting transcript with the Google Cloud Natural Language API. This analysis picks out the most relevant terms in the lecture and annotates the notes with links to Wikipedia when a concept is especially important. At the same time, the camera records a video of the lecture, and we save an image every few frames. Each image is then processed for facial recognition using the Microsoft Azure Cognitive Services Computer Vision and Face APIs. Finally, the snapshots and text are combined into a generated OneNote file. We used Python for all of the above, and the interface was built with Node.js.
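To give a feel for the annotation step, here is a minimal sketch of how transcript terms can be linked to Wikipedia once the Natural Language API has returned entities with salience scores. The function name, the `(name, salience, url)` tuple shape, and the 0.05 threshold are all illustrative assumptions, not our exact code.

```python
SALIENCE_THRESHOLD = 0.05  # hypothetical cut-off for "especially important"

def annotate_transcript(transcript, entities, threshold=SALIENCE_THRESHOLD):
    """Wrap important terms in the transcript with Wikipedia links.

    `entities` is assumed to be a list of (name, salience, wikipedia_url)
    tuples, pre-extracted from the Natural Language API's entity analysis.
    """
    annotated = transcript
    for name, salience, url in entities:
        if salience >= threshold and url and name in annotated:
            # Link only the first occurrence to avoid cluttering the notes.
            annotated = annotated.replace(name, f"[{name}]({url})", 1)
    return annotated
```

The salience score is a convenient proxy for importance here: common words score low and are skipped, while central lecture concepts score high and get linked.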
Challenges we ran into
Most of our team had never worked with APIs before this hackathon, and figuring them out in general, as well as the specific APIs we used, was quite a challenge. We also found it difficult to set up authentication for the Microsoft APIs. On the speech recognition side, it took time to understand the formats the Speech API would accept, in particular the required audio encoding and the one-minute time limit per request. It also lacked built-in punctuation support, which was difficult to work around.
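One way around the roughly one-minute limit on synchronous recognition requests is to split the raw audio into sub-minute chunks and send each one separately. The sketch below assumes 16 kHz, 16-bit mono LINEAR16 audio and a 55-second chunk length; these values are illustrative, not what we shipped.

```python
SAMPLE_RATE = 16000   # samples per second (assumed LINEAR16 mono)
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHUNK_SECONDS = 55    # stay safely under the one-minute request limit

def split_audio(raw, chunk_seconds=CHUNK_SECONDS):
    """Yield consecutive chunks of raw PCM audio, each under the limit."""
    chunk_bytes = chunk_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE
    for start in range(0, len(raw), chunk_bytes):
        yield raw[start:start + chunk_bytes]
```

Each chunk can then be submitted as its own recognition request and the partial transcripts concatenated in order. Splitting on fixed byte boundaries can cut a word in half, so a refinement would be to split on silence instead.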
Accomplishments we’re proud of
We got a working program that does what we wanted it to do and more :^)
What's next for Noter
There are a couple of ways we could take this project forward. One is to develop the user interface further, turning it into a full-fledged website with a well-designed layout. We considered this early on, but chose to focus first and foremost on getting a functional program. Second, we could extend the functionality. For instance, it would be more cohesive if diagrams on the board were converted into OneNote diagrams using some form of shape detection. Another improvement would be in when and how each snapshot of the board is taken: the Microsoft Kinect motion sensor could track the professor's movements more accurately, and more advanced machine-vision algorithms could determine the best moment to take a picture. Those algorithms could also capture only the specific areas of the board where changes have been made. Lastly, we could provide tailored tools to generate notes for PowerPoint presentations.
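The change-detection idea could start as simply as comparing successive grayscale frames and triggering a snapshot only when enough pixels have moved. This is a sketch of that future-work idea, not something we built: frames are flat lists of 0–255 intensity values, and both thresholds are illustrative placeholders.

```python
PIXEL_DELTA = 30       # per-pixel intensity difference that counts as a change
CHANGE_FRACTION = 0.1  # fraction of changed pixels that triggers a snapshot

def board_changed(prev_frame, curr_frame,
                  pixel_delta=PIXEL_DELTA, change_fraction=CHANGE_FRACTION):
    """Return True when the board differs enough to warrant a new snapshot."""
    changed = sum(
        abs(a - b) > pixel_delta for a, b in zip(prev_frame, curr_frame)
    )
    return changed / len(curr_frame) >= change_fraction
```

A real implementation would work on camera frames (e.g. via OpenCV) and would also need to mask out the professor, which is where the facial-recognition step and the Kinect data could feed in.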