Most students know the feeling of falling behind in classes, and Stanford really tries to help us out. Many classes provide both lecture slides and videos, but neither alone is sufficient. Slides are useful for quickly looking up information, but are often too dense to interpret without explanation. Videos, on the other hand, fill in those gaps in understanding but are full of extraneous material that can take hours to sift through. What if we could combine these two resources into a fully integrated visual and auditory learning experience?
What it does
Slip establishes a two-way mapping between class videos and slides, allowing seamless transitions between the two. Watch a few slides until the material gets too dense, then click the slide to jump to the exact point in the video where that concept is being explained. By fully integrating classroom resources, Slip lets students navigate between class notes, slides, and videos with a single click.
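To make the "two-way mapping" idea concrete, here is a minimal sketch of one way such an index could work once each slide has been assigned a start time in the video. It assumes (an assumption, not something stated above) that every slide has a single start time in seconds and that these times increase with slide number. The class `SlideVideoMap` and its methods are hypothetical names: slide-to-video is a direct lookup, and video-to-slide is a binary search over the sorted start times.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SlideVideoMap:
    """Hypothetical two-way index between slide numbers and video timestamps.

    Assumes slide i first appears at start_times[i] (seconds) and that the
    start times are sorted, i.e. the slides are presented in order.
    """
    start_times: list[float]

    def time_for_slide(self, slide: int) -> float:
        # Slide -> video: jump to the moment this slide first appears.
        return self.start_times[slide]

    def slide_at_time(self, t: float) -> int:
        # Video -> slide: the last slide whose start time is <= t.
        i = bisect_right(self.start_times, t) - 1
        return max(i, 0)  # clamp times before the first slide to slide 0
```

For example, with `SlideVideoMap([0.0, 95.5, 240.0, 610.2])`, clicking slide 2 seeks to 240.0 seconds, and a viewer at 300 seconds is on slide 2. Binary search keeps the video-to-slide lookup logarithmic in the number of slides.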
How we built it
We collected our source data by extracting every slide from the lecture notes with ImageMagick and key frames from the class video with ffmpeg. After extraction, we use SIFT (scale-invariant feature transform) to detect the slide, if present, in every frame, and optical character recognition (OCR) to measure how closely the text on each slide matches the text in each frame. By combining these two metrics, we compute the optimal mapping between slides and frames for the entire lecture with 90-95% confidence.
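Below is a minimal sketch of how the two signals might be merged into a single slide-by-frame score matrix. It assumes the OCR text for every slide and frame is already available and that a SIFT score in [0, 1] has been precomputed for each (slide, frame) pair, for example the fraction of keypoints that survive a ratio test. The names `ocr_similarity`, `combined_score`, `score_matrix`, and the weight `w` are hypothetical, and `difflib.SequenceMatcher` stands in for whatever text-comparison metric the project actually used.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # OCR output is noisy: lowercase it and keep only alphanumeric words.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def ocr_similarity(slide_text: str, frame_text: str) -> float:
    """Similarity in [0, 1] between the OCR text of a slide and of a frame."""
    a, b = normalize(slide_text), normalize(frame_text)
    if not a or not b:
        return 0.0  # nothing to compare, e.g. an image-only slide
    return SequenceMatcher(None, a, b).ratio()


def combined_score(sift_score: float, ocr_score: float, w: float = 0.5) -> float:
    # Hypothetical linear blend; the weight w is an assumption, not a tuned value.
    return w * sift_score + (1.0 - w) * ocr_score


def score_matrix(slide_texts, frame_texts, sift_scores, w=0.5):
    """scores[i][j] = how well frame j matches slide i (higher is better)."""
    return [
        [combined_score(sift_scores[i][j], ocr_similarity(s, f), w)
         for j, f in enumerate(frame_texts)]
        for i, s in enumerate(slide_texts)
    ]
```

A linear blend lets text-heavy slides lean on the OCR term while diagram-heavy slides, which yield little OCR text, fall back on the SIFT term.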
Challenges we ran into
Accuracy is extremely important, but videos often don't capture the slides well. Neither image processing nor OCR alone was enough to reach the accuracy we wanted, but the two complement each other very well: OCR works best on text-heavy slides, while image matching is more effective on the rest. Even using both together, the algorithm still produced incorrect mappings much of the time. The key to high accuracy was exploiting the fact that the slides are presented in order. Rather than picking the best frame for each slide independently, we search for the best set of frames for all slides at once, subject to the constraint that the slides stay in order. Solving this in a reasonable amount of time required a dynamic programming solution, but it greatly increased accuracy.
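The ordering constraint can be expressed as a classic dynamic program. The sketch below is one possible formulation, assumed rather than taken from the project: given a score matrix `scores[i][j]` for slide `i` and frame `j`, it picks exactly one frame per slide so that the chosen frame indices strictly increase and the total score is maximized. The function name `align_in_order` is hypothetical.

```python
def align_in_order(scores):
    """Choose one frame per slide, with frame indices strictly increasing,
    maximizing the total score. scores[i][j] scores frame j for slide i.

    Runs in O(n_slides * n_frames) time by keeping a running prefix maximum.
    """
    if not scores:
        return []
    n, m = len(scores), len(scores[0])
    if m < n:
        raise ValueError("need at least as many frames as slides")
    NEG = float("-inf")
    # best[i][j]: best total for slides 0..i with slide i placed at frame j.
    best = [[NEG] * m for _ in range(n)]
    # back[i][j]: frame used for slide i-1 in that best partial solution.
    back = [[-1] * m for _ in range(n)]

    best[0] = list(scores[0])
    for i in range(1, n):
        run_max, run_arg = NEG, -1  # max of best[i-1][k] over all k < j
        for j in range(m):
            if j > 0 and best[i - 1][j - 1] > run_max:
                run_max, run_arg = best[i - 1][j - 1], j - 1
            if run_arg != -1:  # some earlier frame can hold slide i-1
                best[i][j] = scores[i][j] + run_max
                back[i][j] = run_arg

    # Start from the best frame for the last slide and walk back.
    j = max(range(m), key=lambda k: best[n - 1][k])
    frames = [0] * n
    for i in range(n - 1, -1, -1):
        frames[i] = j
        j = back[i][j]
    return frames
```

With `scores = [[0.9, 0.2, 0.1, 0.0], [0.95, 0.6, 0.5, 0.1], [0.1, 0.3, 0.8, 0.7]]`, picking the best frame per slide independently would put slides 0 and 1 on frame 0, while `align_in_order` returns `[0, 1, 2]`. Note that this formulation assumes each slide is shown once and never revisited; a lecture that jumps back to an earlier slide would need a different model.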
Accomplishments that we're proud of
Accuracy, above all. We came in knowing very little about image processing and still ended up with highly accurate mappings. We also built a seamless front end that makes it simple for users to switch between the video and the lecture slides, maximizing productivity.
What we learned
If it can go wrong, it will go wrong. We ran into plenty of issues along the way, from poorly documented libraries, to hard-to-find bugs, to being completely unsure how to proceed, but together we powered through.
What's next for Slip
Improving the speed of the image-processing algorithm.