YouTube Query Enhancer

What does it do?

Our project automatically parses the captions of YouTube videos and outputs them as a single plain string, which is then tokenized. Once the captions are tokenized, a user can enter a search query, which we match against each video's tokens. We then recommend the video whose caption tokens share the most matches with the query.
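The matching step can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function names (`tokenize`, `match_score`, `recommend`), the regular expression used for tokenizing, and the sample captions are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_score(query_tokens, caption_tokens):
    """Count how often each distinct query token occurs in the captions."""
    caption_counts = Counter(caption_tokens)
    return sum(caption_counts[token] for token in set(query_tokens))


def recommend(query, captions_by_video):
    """Return the id of the best-matching video, or None if nothing matches."""
    query_tokens = tokenize(query)
    best_id, best_score = None, 0
    for video_id, caption_text in captions_by_video.items():
        score = match_score(query_tokens, tokenize(caption_text))
        if score > best_score:
            best_id, best_score = video_id, score
    return best_id


# Hypothetical caption strings standing in for parsed YouTube captions.
captions = {
    "video_a": "Today we bake sourdough bread with a simple starter.",
    "video_b": "This tutorial covers Python regular expressions and tokenizing text.",
}

print(recommend("python regular expressions", captions))
```

Counting raw matches favors videos with long captions, which is one reason a normalized measure such as cosine similarity can be a better fit.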

How did we put it together?

We built it using Python and the YouTube API.

What challenges did we run into?

We ran into a steady stream of challenges. For most of Saturday, we struggled with the YouTube API and OAuth 2.0. We eventually learned that the API's caption download function is grouped with its delete and upload functions, so we could only download captions for videos we had edit permissions on. This limited us greatly.

What we learned

We learned a great deal from this experience: cosine similarity, a range of Python libraries to import, and more use cases for regular expressions. These last 36 hours have been a flood of knowledge.
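For readers new to cosine similarity, here is a small sketch of how it can compare two token lists. It treats each list as a vector of word counts; the function name `cosine_similarity` and the choice of raw counts (rather than a weighting such as TF-IDF) are assumptions for illustration and may differ from what we used during the hackathon.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both lists contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty token list has no direction to compare
    return dot / (norm_a * norm_b)


query = ["python", "regular", "expressions"]
caption = ["python", "tutorial", "on", "regular", "expressions"]
print(cosine_similarity(query, caption))
```

Because the score is divided by the length of both vectors, a long caption does not win simply by containing more words.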

What's next for YouTube Query Enhancer

Another use case we believe could work is video recommendation: comparing the content of the current video against other videos and showing the most closely related ones, drawn either from a preset list or from the website as a whole. This would require more calculations and much more time.

Built With

Submitted to

sunhacks

Created by

I worked on integrating the YouTube API to download the captions. This proved particularly difficult because the public interface for downloading captions is paired with the ability to modify them, so you must own the video to retrieve its captions, regardless of whether you want to edit them. I eventually stumbled upon a mostly undocumented part of YouTube's API called TimedText and used it to get around our limitations.

Dayton B
CraigIggy
Brett Bargay

Updates

Brett Bargay started this project — Nov 11, 2018 09:32 AM EST
