Inspiration

Our inspiration for The Clip Curator came from the increasing prominence of AI-generated content and the need to efficiently filter for the highest-quality text-video pairs.

What it does

The Clip Curator automates the selection of top-tier text-video pairs. It downloads videos from YouTube, cuts them into clips with ffmpeg, samples frames with OpenCV, and scores each pair with our CLIP zero-shot scorer (ViT-B/32). The output is a JSON file containing the top-ranked pairs, sorted by model score.
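The pipeline steps above can be sketched end to end. This is a minimal illustration rather than our exact code: the clip boundaries, filenames, and `score` values are hypothetical, and in the real pipeline the scores come from the CLIP model instead of being hard-coded.

```python
import json

def ffmpeg_cut_cmd(src, start, end, dst):
    """Build an ffmpeg command that trims [start, end] seconds from src
    without re-encoding (stream copy keeps cutting fast)."""
    return ["ffmpeg", "-y", "-ss", str(start), "-to", str(end),
            "-i", src, "-c", "copy", dst]

def rank_pairs(scored_pairs, top_k=2):
    """Sort (text, video, score) records and keep the highest-scoring pairs."""
    return sorted(scored_pairs, key=lambda p: p["score"], reverse=True)[:top_k]

# Hypothetical scores; in the real pipeline these come from the CLIP scorer.
pairs = [
    {"text": "a dog catching a frisbee", "video": "clip_01.mp4", "score": 0.31},
    {"text": "a city street at night",   "video": "clip_02.mp4", "score": 0.27},
    {"text": "an unrelated caption",     "video": "clip_03.mp4", "score": 0.12},
]

print(ffmpeg_cut_cmd("full_video.mp4", 10, 20, "clip_01.mp4"))
print(json.dumps(rank_pairs(pairs), indent=2))
```

The stream-copy flags (`-c copy`) avoid re-encoding, which matters when cutting many clips from long source videos.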

How we built it

We built The Clip Curator using Python, leveraging libraries such as ffmpeg, OpenCV, and PyTorch for video processing and scoring. We also utilized CLIP, an AI model developed by OpenAI, for scoring the text-video pairs.
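Under the hood, CLIP's zero-shot score reduces to a cosine similarity between the text embedding and each frame embedding. The sketch below shows that math in plain Python with tiny hypothetical vectors; the real model (ViT-B/32 via PyTorch) produces 512-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, the measure CLIP
    uses to compare its text and image embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def clip_score(text_emb, frame_embs):
    """Average the text-frame similarities over the frames sampled from one clip."""
    return sum(cosine_similarity(text_emb, f) for f in frame_embs) / len(frame_embs)

# Tiny hypothetical embeddings (real CLIP ViT-B/32 vectors are 512-d).
text = [0.6, 0.8]
frames = [[0.6, 0.8], [0.8, 0.6]]
print(round(clip_score(text, frames), 2))  # → 0.98
```

Averaging over several sampled frames makes the score reflect the whole clip rather than a single frame.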

Challenges we ran into

One of the main challenges we faced was optimizing the pipeline for efficiency, given the resource-intensive nature of video processing and scoring. Additionally, integrating the various components of the pipeline seamlessly posed its own technical challenges.

Accomplishments that we're proud of

We're proud to have developed a fully functional pipeline that automates the selection of top-quality text-video pairs. We successfully integrated multiple tools and technologies to achieve our goal.

What we learned

Through building The Clip Curator, we gained valuable experience in video processing, AI scoring, and pipeline optimization. We also learned how to effectively leverage existing models and libraries to solve real-world problems.

What's next for The Clip Curator

In the future, we plan to further optimize the pipeline and explore additional features such as cloud hosting and fine-tuning the scoring model on larger datasets. We also aim to enhance the user experience and scalability of the platform.
