Inspiration
Modern university lecture halls are almost universally equipped with recording equipment to capture lectures, a blessing to students who would not otherwise be able to attend class. That said, lecture videos are often time-consuming to watch and cannot be searched for particular content, especially when the instructor does not publish supplementary lecture notes and writes entirely in chalk without using lecture slides. In addition, while visually impaired students can listen to the audio of the lecture, the visual contents of the board are still inaccessible to them.
What it does
Presentr is a system that takes a pre-recorded lecture video (and could be adapted to livestreamed lectures) and uses optical character recognition (OCR) to recognize the handwritten text of chalkboard notes. The on-screen text is checked for spelling errors and then assembled into a transcript of timestamped notes. The transcript can then be output as a polished, searchable Beamer presentation PDF (along with the LaTeX source for making edits), or used to generate a live transcript of board notes that accompanies the video. Because the live transcript is translated into lines of Braille that update in sync with the audio (subject to a small delay), visually impaired students can use a refreshable Braille display to follow the board notes live as they are written, or later alongside the recorded audio.
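To make the transcript-to-Beamer step concrete, here is a minimal sketch of how timestamped notes could be turned into Beamer source. It is an illustration, not our actual code: the function names (escape_latex, format_timestamp, notes_to_beamer) and the input shape (a list of (seconds, lines) pairs) are assumptions made for this example.

```python
# Hypothetical sketch: turn timestamped board notes into Beamer source.
# The data layout and all names here are assumptions for illustration.

# Common LaTeX special characters. Backslash, tilde, and caret are not
# handled in this sketch.
LATEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
}


def escape_latex(text):
    """Escape the special characters listed in LATEX_SPECIALS."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def format_timestamp(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def notes_to_beamer(notes, title="Lecture notes"):
    """Build Beamer source with one frame per (seconds, lines) snapshot."""
    out = [
        r"\documentclass{beamer}",
        rf"\title{{{escape_latex(title)}}}",
        r"\begin{document}",
        r"\frame{\titlepage}",
    ]
    for seconds, board_lines in notes:
        if not board_lines:
            continue  # an itemize environment with no items is an error
        out.append(rf"\begin{{frame}}{{Board at {format_timestamp(seconds)}}}")
        out.append(r"\begin{itemize}")
        for text in board_lines:
            out.append(r"\item " + escape_latex(text))
        out.append(r"\end{itemize}")
        out.append(r"\end{frame}")
    out.append(r"\end{document}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(notes_to_beamer([(65, ["f(x) = x_1", "50% of the grade"])]))
```

The resulting string can be written to a .tex file and compiled with pdfLaTeX, which also gives the user the editable source alongside the PDF.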
How we built it
Optical character recognition is a challenging task, especially for handwriting. We used the Google Cloud Vision API to extract text from video frames. We observed that the instructor, pacing back and forth in front of the chalkboard, inevitably blocked the camera's view, so many frames contained only fragments of sentences. To deal with this, we used a spell checker to isolate sentences with a high proportion of correctly spelled words, and used fuzzy matching to pair sentence fragments from earlier frames with the full sentences captured in later frames. Finally, we used pdfLaTeX to produce the Beamer output, and Flask to produce the live display. The Braille translation prototype is based on LazoCoder's text to Grade-2 Braille translator, freely available under GPL v3.0.
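The spell-check filter and fragment matching described above can be sketched roughly as follows. This is a simplified illustration rather than our actual implementation: it uses Python's standard difflib in place of a dedicated fuzzy-matching library, a tiny hard-coded vocabulary in place of a real spell checker, and the names spelled_ratio, containment, and best_match are made up for this example.

```python
from difflib import SequenceMatcher

# Hypothetical tiny vocabulary; a real spell checker would use a dictionary.
VOCAB = {"the", "derivative", "of", "a", "sum", "is", "derivatives"}


def spelled_ratio(line, vocab=VOCAB):
    """Fraction of the words in `line` that appear in `vocab`."""
    words = [w.strip(".,;:!?").lower() for w in line.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in vocab for w in words) / len(words)


def containment(fragment, sentence):
    """Share of the fragment's characters matched, in order, in `sentence`.

    Dividing by the fragment length (instead of using SequenceMatcher.ratio)
    keeps a short fragment from being penalised for its length.
    """
    if not fragment:
        return 0.0
    matcher = SequenceMatcher(None, fragment.lower(), sentence.lower(),
                              autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(fragment)


def best_match(fragment, sentences, threshold=0.6):
    """Return the sentence that best contains `fragment`, or None."""
    best, best_score = None, threshold
    for sentence in sentences:
        score = containment(fragment, sentence)
        if score >= best_score:
            best, best_score = sentence, score
    return best


if __name__ == "__main__":
    full = ["The derivative of a sum is the sum of the derivatives",
            "Integration by parts"]
    # Keep only well-spelled lines, then attach a fragment to one of them.
    clean = [s for s in full if spelled_ratio(s) >= 0.5]
    print(best_match("derivative of a su", clean))
```

The 0.5 and 0.6 thresholds are arbitrary starting points for this sketch and would need tuning against real OCR output.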
Challenges we ran into
Our initial goal was to stitch together text from a large body of sentence fragments into a single chronological transcript of the entire lecture. Stitching text in this manner turned out to be difficult, so we settled on producing a series of snapshots of the board text visible to the camera at any given time.
Accomplishments that we're proud of
We successfully created a system that (assuming the lecturer has reasonably neat handwriting!) takes in a lecture video and provides four output files for the user to download: a Beamer PDF, its LaTeX source, a plain text file, and a Braille text file (the last, understandably, is just a prototype for what would be sent to a refreshable Braille display).
What we learned
We learned that computer vision APIs have advanced considerably and are extremely easy to use; we were surprised by their accuracy, even on relatively messy handwriting.
What's next for Presentr
Presentr has the potential to make gigabytes of archived lecture videos easily searchable. It could help instructors create lecture notes on the fly simply by writing on the chalkboard, instead of preparing a separate set of notes ahead of time, and help students more easily find the content they need. It could also serve as a very useful accessibility aid that helps visually impaired students learn more effectively.
Built With
- flask
- fuzzy
- google-cloud