Everyone in the world craves time for one simple reason: time is one of the few things we can’t get back. After it’s spent, it’s gone.
Our project, reBlock, is built with this idea in mind. Using cutting-edge machine learning and natural language processing technologies, reBlock automatically blocks YouTube sponsors, saving you precious time in real time.
⏭ What it does
reBlock is a web application that uses deep learning to block sponsor segments in YouTube videos. Whereas traditional ad blockers (e.g. uBlock Origin) only block the ads that YouTube inserts into videos, reBlock detects sponsored segments within the video itself in real time and skips them.
🛠️ How we built it
The user enters a video URL in our front end, which is built with React and Next.js. The front end sends a request to our backend server (FastAPI), which scrapes the transcript of the YouTube video. The server then passes the transcript through a large transformer model that we trained.
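As a rough illustration of the last step, the per-word predictions from the model have to be turned into time ranges that the player can skip. The sketch below is not our actual backend code: the `(word, start, end)` transcript format, the `sponsor_segments` helper, and the `max_gap` smoothing threshold are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical transcript format: (word, start_seconds, end_seconds).
Word = Tuple[str, float, float]


def sponsor_segments(
    words: List[Word],
    is_sponsor: List[bool],
    max_gap: float = 2.0,  # assumed threshold, in seconds
) -> List[Tuple[float, float]]:
    """Merge sponsor-flagged words into (start, end) skip ranges.

    Flagged words whose start is within `max_gap` seconds of the previous
    range's end are merged into it, so a single unflagged word in the
    middle of a sponsor read does not split the range in two.
    """
    segments: List[Tuple[float, float]] = []
    for (_, start, end), flagged in zip(words, is_sponsor):
        if not flagged:
            continue
        if segments and start - segments[-1][1] <= max_gap:
            # Extend the current range to cover this word.
            segments[-1] = (segments[-1][0], end)
        else:
            # Start a new range.
            segments.append((start, end))
    return segments


words = [
    ("welcome", 0.0, 0.5), ("back", 0.5, 1.0),
    ("this", 1.0, 1.5), ("video", 1.5, 2.0), ("is", 2.0, 2.2),
    ("sponsored", 2.2, 2.8), ("by", 2.8, 3.0), ("acme", 3.0, 3.6),
    ("anyway", 10.0, 10.4),
]
flags = [False, False, True, True, True, True, False, True, False]
print(sponsor_segments(words, flags))  # [(1.0, 3.6)]
```

The front end could then compare the player's current time against these ranges and seek to the end of a range whenever playback enters it.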
To detect sponsors, we fine-tuned RoBERTa-base, a state-of-the-art neural network with 125 million parameters, on data that we scraped from YouTube and SponsorBlock (a database of sponsored-segment timestamps in videos). We trained the network on over 31,000 YouTube videos.
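To give a sense of how timestamped captions and SponsorBlock segments can be combined into token-level training labels, here is a minimal sketch. It is an illustration under stated assumptions, not our exact preprocessing: the `(word, start, end)` caption format, the `label_words` helper, and the midpoint rule for deciding whether a word falls inside a segment are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical caption format: (word, start_seconds, end_seconds).
Word = Tuple[str, float, float]
# Sponsored time range in seconds, e.g. taken from SponsorBlock.
Segment = Tuple[float, float]


def label_words(words: List[Word], sponsor_ranges: List[Segment]) -> List[int]:
    """Return 1 for words inside a sponsored range, 0 otherwise.

    A word counts as sponsored when the midpoint of its time span lies
    inside any range (an assumed rule; other overlap rules would work too).
    """
    labels: List[int] = []
    for _, start, end in words:
        midpoint = (start + end) / 2
        inside = any(lo <= midpoint <= hi for lo, hi in sponsor_ranges)
        labels.append(int(inside))
    return labels


captions = [
    ("hello", 0.0, 0.4),
    ("today's", 5.0, 5.5), ("sponsor", 5.5, 6.0), ("is", 6.0, 6.2),
    ("back", 30.0, 30.4),
]
print(label_words(captions, [(4.8, 6.5)]))  # [0, 1, 1, 1, 0]
```

Before fine-tuning, word-level labels like these would still need to be aligned with the subword tokens produced by the model's tokenizer.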
🚧 Challenges we ran into
One major challenge was training the model within the time constraints: we ended up using four GeForce GTX 1080 Ti GPUs to fine-tune RoBERTa-base over a period of 7 hours.
😊 Accomplishments that we're proud of
Given the tight time constraints, we are extremely proud of creating a deep-learning-powered tool that detects sponsors in real time! From a more technical standpoint, we are proud of fine-tuning the RoBERTa language model on YouTube sponsor data that we scraped, and of adding a user-friendly web interface.
🧐 What we learned
- How to prepare a token-level classification dataset & process captioning data
- How to interact with the YouTube player in a web app
- How to use ngrok to expose the backend (and ML model) that we host ourselves
⏩ What's next for reBlock
In the future, we plan to:
- Turn reBlock into a Chrome extension
- Support other video platforms (e.g. Vimeo)