Inspiration
A few weeks ago I thought it would be cool to splice together short clips from movies and TV shows to "spell out" one of my favorite songs of all time, Self Control by Frank Ocean. As far as I could tell, it was a medium that no one had explored extensively. The project reminded me of writing I had seen before where people cut words and letters out of magazines, which I discovered is called ransom-note-style writing. So naturally, the name had to have something to do with it, hence Ransomify.
What it does
Ransomify takes a string as input and gives a video as output. It breaks the input into blocks of up to 4 words and, for each block, searches for scenes from movies and TV in which that phrase is spoken. It then saves those clips and splices them all together to play your message back to you.
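To give a rough idea of the first step, here is a minimal sketch of splitting a message into blocks of up to 4 words. The function name `split_into_blocks` and the `max_words` parameter are made up for illustration and are not necessarily what the project uses.

```python
def split_into_blocks(message, max_words=4):
    """Split a message into consecutive phrases of at most max_words words.

    Hypothetical helper for illustration only.
    """
    words = message.split()
    # Step through the word list max_words at a time; the last block may be shorter.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


blocks = split_into_blocks("I know you gotta leave leave leave take down some summer time")
```

Each resulting block is then treated as one search phrase.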
How I built it
Ransomify was built in Python with a number of libraries and APIs. The most important are Selenium (a web driver, used to gather movie clips), Whisper (OpenAI's Automatic Speech Recognition (ASR) neural net, used to transcribe scenes), and ffmpeg (through the moviepy library).
Challenges I ran into
A big challenge was transcribing the movie scenes I was downloading. I had heard about Whisper from a friend who was working on a video-editing project of his own months ago, and he had high praise for its .srt file generation. However, while Whisper is great for longer files, the clips I was dealing with were short, and the captions needed to be precise enough to splice out specific words and phrases. My research led me to another Python package named "whisper-timestamped", which builds upon Whisper to predict word-level timestamps accurately using a technique called "Dynamic Time Warping".
My woes didn't end there, though. As good as Whisper is, it still makes mistakes in difficult situations, and those mistakes led to errors and crashes. I eventually came up with the solution of gathering multiple video links containing the same phrase. If one clip is too difficult for the network to decipher, Ransomify moves on to the next one. When all else fails, or when no clip containing the phrase is found, a new search is done on the phrase minus its last word.
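The fallback logic described above could be sketched roughly as follows. This is only an illustration under assumptions: `search_clips` and `locate_phrase` are hypothetical callables standing in for the Selenium search and the word-level transcription step, and the real project may structure this differently.

```python
def find_clip(phrase, search_clips, locate_phrase):
    """Try to find a clip in which `phrase` is spoken, shortening it on failure.

    search_clips(phrase)        -> iterable of candidate clips (hypothetical)
    locate_phrase(clip, phrase) -> (start, end) in seconds, or None if the
                                   phrase is not in the transcript (hypothetical)
    Returns (matched_phrase, clip, start, end), or None if nothing matched.
    """
    words = phrase.split()
    while words:
        candidate = " ".join(words)
        for clip in search_clips(candidate):
            try:
                span = locate_phrase(clip, candidate)
            except Exception:
                # Transcription failed on this clip; move on to the next one.
                continue
            if span is not None:
                start, end = span
                return candidate, clip, start, end
        # No usable clip for this phrase: retry without its last word.
        words.pop()
    return None
```

Returning the phrase that actually matched lets the caller see how many words were covered, so any dropped words can be searched for separately.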
Accomplishments that I'm proud of
Building something that fairly consistently produces a listenable result.
What we learned
Skills that I strengthened this weekend include web scraping/crawling (I originally wanted to use the beautifulsoup library, but the mp4 files I wanted were not static), problem-solving, Python, and file manipulation.
What's next for Ransomify
First of all, I want to improve the performance of Ransomify. As it is right now, it takes upwards of 30 minutes to generate a video that is only a few seconds long. Next, I would make it more robust, because it is prone to crashing when it encounters something unexpected. I'd also love to make it prettier, and maybe even turn it into a web app as a kind of novelty that people can discover and have fun with.
Note
The black screens are intentional. They space the clips apart a little to make the message easier to understand.