VidSmash

This project was made by the ScottyLabs TartanHacks team, consisting of Scott Krulcik, Emily Newman, Bryan Yan, Ian Lo, and Ajay Jain (MIT '20) at a16z's Battle of the Hacks v3 in June 2016.

VidSmash parses a collection of YouTube videos into re-usable clips that can be smashed together to make entirely new videos. It was inspired by videos of politicians singing pop songs, which must be meticulously stitched together by hand. VidSmash makes it easy to stitch together similar videos for songs, as well as other projects, such as Markov-chain-generated TED talks.

How It Works

Caption Retrieval

First, we pick a source for a list of YouTube videos. For our demo, we scraped the TED channel. Next, we try to download transcripts for those videos (which are not always accessible). YouTube transcripts give the start times of phrases, but not individual words.
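A caption track from this step looks roughly like the hypothetical sample below (the write-up doesn't name the retrieval tool, so the exact entry shape is an assumption): each entry times a whole phrase, not its individual words.

```python
# Hypothetical sample of a fetched YouTube transcript: each entry
# covers a whole phrase, with a start time and duration in seconds,
# but no per-word timing.
transcript = [
    {"text": "ideas worth spreading", "start": 1.2, "duration": 1.8},
    {"text": "so here is my talk", "start": 3.0, "duration": 2.1},
]

def phrase_spans(entries):
    """Return (phrase, start, end) tuples for each captioned phrase."""
    return [(e["text"], e["start"], e["start"] + e["duration"])
            for e in entries]

for phrase, start, end in phrase_spans(transcript):
    print(f"{start:5.1f} - {end:5.1f}  {phrase}")
```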

We use Natural Language Processing (NLP) to approximate the number of syllables in each word, allowing us to estimate the speaking time of individual words given the speaking time of an entire phrase.
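That estimation step could be sketched as follows. The vowel-group syllable counter is a crude stand-in for whatever NLP tooling the team used (the write-up doesn't name a library), and the phrase's duration is split across words in proportion to their syllable counts:

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")

def estimate_syllables(word):
    """Crude syllable count: runs of vowels, with a floor of 1."""
    return max(len(VOWEL_GROUPS.findall(word.lower())), 1)

def estimate_word_times(phrase, phrase_start, phrase_duration):
    """Split a phrase's duration across its words, proportional to
    estimated syllable counts, yielding (word, start, end) tuples."""
    words = phrase.split()
    syllables = [estimate_syllables(w) for w in words]
    total = sum(syllables)
    t = phrase_start
    timed = []
    for word, syl in zip(words, syllables):
        dur = phrase_duration * syl / total
        timed.append((word, t, t + dur))
        t += dur
    return timed
```

For example, `estimate_word_times("ideas worth spreading", 1.2, 1.8)` gives "ideas" two of the phrase's five estimated syllables, so roughly 0.72 of its 1.8 seconds.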

Clip Merging

These words are then intelligently split into re-usable clips, preferring the most accurately timed words wherever possible. One heuristic we used to predict which words would be sampled most accurately was the word's position within its phrase: words at the beginning of a phrase have an exact start time, so they can be clipped more precisely than words occurring later in the phrase.
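That heuristic could be sketched as a confidence score that decays with a word's position in its phrase; the exact scoring function below is our assumption, not the team's:

```python
def clip_confidence(word_index, phrase_length):
    """Score how reliably a word's clip boundaries can be trusted.
    The first word of a phrase starts exactly at the caption's start
    time, so it gets full confidence; later words rely on accumulated
    syllable-based estimates, so confidence decays with position."""
    if word_index == 0:
        return 1.0
    return 1.0 / (1 + word_index)

def pick_best_clip(candidates):
    """Given candidate clips of the same word as
    (word_index, phrase_length, clip) tuples, prefer the one with
    the highest position-based confidence."""
    return max(candidates, key=lambda c: clip_confidence(c[0], c[1]))
```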

Arbitrary text, song data, or Markov-generated TED talks can be entered into our web application and split into the words that need to be merged. Using ffmpeg, we merge our small snippets of individual words together. Additionally, we overlay each word's text at the bottom of the video to aid interpretation.
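The merge step might build ffmpeg invocations like these (a sketch; the exact filters and flags the team used aren't given). The drawtext filter overlays each word's text, and the concat demuxer joins the per-word snippets:

```python
def drawtext_command(clip_in, word, clip_out):
    """Build an ffmpeg command that overlays `word` near the bottom
    of a clip using the drawtext filter."""
    vf = (f"drawtext=text='{word}':x=(w-text_w)/2:y=h-40:"
          "fontsize=36:fontcolor=white")
    return ["ffmpeg", "-y", "-i", clip_in, "-vf", vf, clip_out]

def concat_command(list_file, output):
    """Build an ffmpeg command that concatenates the clips listed in
    `list_file` (one `file 'path'` line each) via the concat demuxer.
    `-c copy` avoids re-encoding the already-rendered snippets."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```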

TED Talk Generation

Markov chains are a tool from probability theory for modeling state transitions. If we treat a sequence of words as a state and the next word as a transition, we can use a large database of TED talk text to generate a TED-esque piece of text. This text can then be fed into our video generator to create an original TED video.
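A minimal word-level Markov chain generator in that spirit (chain order and dead-end handling are our choices, not necessarily the team's):

```python
import random

def build_chain(text, order=2):
    """Map each `order`-word state to the words observed following it."""
    words = text.split()
    chain = {}
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain.setdefault(state, []).append(words[i + order])
    return chain

def generate(chain, length=30, seed=None):
    """Random-walk the chain to produce a `length`-word piece of text."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: restart from a random state
            state = rng.choice(list(chain))
            followers = chain[state]
        word = rng.choice(followers)
        out.append(word)
        state = state[1:] + (word,)
    return " ".join(out)
```

Trained on a corpus of TED transcripts instead of this toy string, the same walk yields the TED-esque text fed into the video generator.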
