Scribe is an accessibility-focused application that helps students, especially deaf and hard-of-hearing students and ESL students, obtain text transcriptions of lecture recordings.
A student or professor can upload a locally stored MP4 file or provide a web link to a video hosted on YouTube or by the user's school. If the uploaded video includes captions, those are used. Otherwise, a transcription is generated using IBM's Watson AI. Scribe then produces a PDF that interleaves the transcribed text with screenshots of the video taken at user-selected time intervals. The user can then download, print, or e-mail the resulting PDF.
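As a rough illustration of how transcript text and screenshots could be paired up, here is a minimal sketch, assuming the transcript is available as (start time, text) pairs. The names `Section` and `build_sections` are hypothetical and are not taken from Scribe's source; frame extraction and PDF layout are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Section:
    screenshot_at: float  # seconds into the video where a frame would be grabbed
    text: str             # transcript text that starts during this interval


def build_sections(
    segments: List[Tuple[float, str]], duration: float, interval: float
) -> List[Section]:
    """Group timed transcript segments into one section per screenshot interval.

    Hypothetical helper: `segments` holds (start_seconds, text) pairs.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    sections = []
    start = 0.0
    while start < duration:
        end = start + interval
        # Collect every segment whose start time falls inside [start, end).
        spoken = [text for t, text in segments if start <= t < end]
        sections.append(Section(screenshot_at=start, text=" ".join(spoken)))
        start = end
    return sections


# Made-up sample data: a 45-second clip with a screenshot every 30 seconds.
segments = [
    (0.5, "Welcome to the lecture."),
    (12.0, "Today we cover recursion."),
    (31.0, "A base case stops the recursion."),
]
sections = build_sections(segments, duration=45.0, interval=30.0)
for s in sections:
    print(f"[{s.screenshot_at:>5.1f}s] {s.text}")
```

Each `Section` would then become one screenshot followed by its text in the PDF. Grouping by start time keeps a segment whole even when it runs past the interval boundary.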
Optional features include translation into a language of the user's choosing and user editing of the transcription to make the final PDF as accurate as possible.
Scribe also has many other possible uses for thoroughly documenting any type of lecture or presentation.
The inspiration for Scribe came from reviewing notes from a professor who said many important things during lectures and shared the PowerPoint slides with the class, but whose slides lacked the detail of the spoken portion. In addition to being potentially useful for those with accessibility concerns, we thought something like Scribe could be used by anyone to search for keywords in long videos and quickly zero in on desired topics during review. The Scribe PDF could be reviewed when watching the video is not an option, or the notes could be used to navigate a lecture video, much like the index of a book.
🌎 View on the web:
Scribe is hosted by Heroku.
💾 View the source code:
🔨 Scribe is built with:
- Python3 w/Flask, back end
- jQuery, front end
- Bootstrap, front end
- Heroku, hosting
- IBM Watson, AI based voice to text
- MoviePy, video and audio editing
- SendGrid, email
- googletrans, text to text language translation
- FPDF, PDF document generation
- Bitstream Cyberbit, foreign character support
- MDB, dynamic refresh brokering
Scribe was created for the Summer 2020 BeaverHacks Hackathon.
Challenges we ran into
- AI transcriptions are not as accurate as we hoped, and finding the ends of sentences or the ends of slides requires many heuristics.
- Supporting non-Latin characters in PDF documents was not straightforward and required custom fonts, switching fonts, and even changing character encoding schemes.
- Updating the processing page during file processing was not straightforward because standard HTTP requests time out; a database broker was necessary to keep the status updates flowing.
- Heroku's request timeout is set at 30s, so although we were able to run the app locally, we had to find a way to process the videos in the background.
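The last two challenges can be illustrated with a minimal sketch: the request handler starts the work and returns right away, the worker records its progress in a shared store, and the processing page polls that store. This is not Scribe's actual implementation. SQLite and a thread stand in for the database broker and the background worker, and all names (`start_job`, `process_video`, `set_status`, `get_status`) are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading
import time
from contextlib import closing

# Assumption: a throwaway SQLite file stands in for the real status broker.
DB_PATH = os.path.join(tempfile.mkdtemp(), "jobs.db")


def _execute(sql: str, params: tuple = ()) -> list:
    """Open a fresh connection per call so each thread uses its own."""
    with closing(sqlite3.connect(DB_PATH)) as conn:
        with conn:  # commits on success, rolls back on error
            return conn.execute(sql, params).fetchall()


def init_db() -> None:
    _execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT)")


def set_status(job_id: str, status: str) -> None:
    _execute("INSERT OR REPLACE INTO jobs (id, status) VALUES (?, ?)", (job_id, status))


def get_status(job_id: str) -> str:
    rows = _execute("SELECT status FROM jobs WHERE id = ?", (job_id,))
    return rows[0][0] if rows else "unknown"


def process_video(job_id: str) -> None:
    """Hypothetical long-running job; each stage records its progress."""
    for stage in ("extracting audio", "transcribing", "building PDF"):
        set_status(job_id, stage)
        time.sleep(0.01)  # placeholder for the real work
    set_status(job_id, "done")


def start_job(job_id: str) -> threading.Thread:
    """Return immediately so the web request ends well inside the timeout."""
    set_status(job_id, "queued")
    worker = threading.Thread(target=process_video, args=(job_id,), daemon=True)
    worker.start()
    return worker


init_db()
worker = start_job("lecture-01")
worker.join()  # a real page would poll get_status() instead of joining
print(get_status("lecture-01"))
```

In a web app, one route would call `start_job` and a second, fast route would return `get_status(job_id)` for the page to poll, so no single request comes close to the 30s limit.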
Accomplishments that we're proud of
- The entire project should be very helpful for people wanting to review class lectures and find topics in videos during studying.
- The feature to support languages besides English was particularly exciting to implement, especially with the diverse student body of the university.
- Combining AI transcription and translation services with the other libraries gave a surprising amount of power to a relatively simple application built over a weekend.
- Learning together and working as a team provided a large amount of growth opportunity, especially finding ways to put everyone's unique skills to use in elegant and efficient ways.
What we learned
- When it comes to putting together a weekend software project, Python is amazing.
- Apart from the web interface, every component was something we had not used before. If anything, we learned that taking advantage of APIs and libraries is not as difficult as it might seem on the surface.
What's next for Scribe
- Custom speech models trained on domain jargon.
- Retrieve existing captions from Kaltura videos.
- Text detection in images, so that snapshots are also searchable.
- Punctuation/sentence detection for transcriptions.
- Scene change detection for automatic next-slide.
- User-submitted slide PDFs (for appending transcription) and/or caption files.
- Additional languages.
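One possible starting point for the punctuation/sentence detection item is to break the transcript wherever the speaker pauses. The sketch below assumes the transcription comes with word-level start and end times. The function name `split_on_pauses` and the 0.6 second threshold are illustrative guesses, not values from Scribe.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def split_on_pauses(words: List[Word], min_pause: float = 0.6) -> List[str]:
    """Start a new sentence whenever the gap between two words reaches min_pause."""
    sentences: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        long_gap = prev_end is not None and start - prev_end >= min_pause
        if long_gap and current:
            sentences.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        sentences.append(" ".join(current))
    # Capitalize the first letter and add a period as crude punctuation.
    return [s[0].upper() + s[1:] + "." for s in sentences]


# Made-up sample timings with one long pause after "case".
words = [
    ("recursion", 0.0, 0.6), ("needs", 0.7, 1.0), ("a", 1.0, 1.1),
    ("base", 1.1, 1.4), ("case", 1.4, 1.8),
    ("without", 2.6, 3.0), ("one", 3.0, 3.2), ("it", 3.2, 3.3),
    ("never", 3.3, 3.6), ("stops", 3.6, 4.0),
]
result = split_on_pauses(words)
for sentence in result:
    print(sentence)
```

A pause threshold is only a heuristic: speakers pause mid-sentence and run sentences together, so the results would still need review in the transcript editing step.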