KEEPITREEL

Inspiration

As an artist myself (Noomb00m), I find it incredibly challenging to make music and then spend extra time making videos to market that music on social media. For an artist, this marketing is critical to getting your music to the masses to enjoy.

What it does

KEEPITREEL takes an artist's music as input and generates a video by analyzing the melodic and lyrical content of the song. The video is in 9:16 format, which makes it perfect for short-form social media platforms like TikTok or Instagram Reels.

How we built it

We used Whisper to extract lyrics and MusicCap to identify the key musical components of the song. These outputs were then fed into a Large Language Model (GPT3.5) to generate a caption used as the prompt for video generation.
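
As a rough illustration of that hand-off, the sketch below combines a lyric transcript and a music description into a single text prompt for the language model. The function name, the prompt wording, and the `max_lyric_chars` limit are assumptions made for this example, not the prompt we actually used, and the call to the model itself is left out.

```python
def build_caption_prompt(lyrics: str, music_description: str,
                         max_lyric_chars: int = 500) -> str:
    """Combine lyrics and a music description into one LLM prompt (hypothetical format)."""
    # Collapse line breaks and extra spaces, then truncate long lyrics
    # so the prompt stays short.
    cleaned_lyrics = " ".join(lyrics.split())[:max_lyric_chars]
    return (
        "You write short visual captions for text-to-video models.\n"
        f"Music description: {music_description.strip()}\n"
        f"Lyrics: {cleaned_lyrics}\n"
        "Reply with one sentence describing a scene that matches the song."
    )


# Example inputs are made up for illustration.
prompt = build_caption_prompt(
    lyrics="Driving all night\n  under neon lights",
    music_description="upbeat house track with warm synth pads",
)
print(prompt)
```

The returned string would then be sent to the language model, and the model's one-sentence reply becomes the caption for the video step.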

We curate the video content in two ways: 1) fully AI-generated text-to-video using Zeroscope, and 2) database retrieval of relevant stock videos using Instructor Embeddings. Once the videos are generated or selected, we stitch them into a "reel"-style video that cuts between video snippets at the transitions within the song.
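
The retrieval path can be pictured as a nearest-neighbor lookup: embed the caption, embed each stock video's description, and pick the closest matches. The sketch below shows that idea with cosine similarity over tiny hand-written vectors. The file names and vectors are invented for this example; in the real system the vectors would come from the embedding model, and the ranking metric we used may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_videos(query_vec, video_index, k=2):
    """Return the k video names whose embeddings are closest to the query."""
    ranked = sorted(
        video_index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy index: names and 3-dimensional vectors are placeholders.
video_index = {
    "city_night.mp4": [0.9, 0.1, 0.0],
    "beach_sunset.mp4": [0.1, 0.9, 0.2],
    "forest_fog.mp4": [0.0, 0.2, 0.9],
}
caption_vec = [0.8, 0.3, 0.1]  # stand-in for the embedded caption
best = top_k_videos(caption_vec, video_index, k=2)
print(best)
```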

The whole system was built in Python to run on CPU, except for the calls to API endpoints for GPT3.5 and Zeroscope. We self-hosted Zeroscope on a Brev.dev server.

Challenges we ran into

The primary challenge we faced was running Zeroscope. Initially, we attempted to run the model as part of the pipeline on a dev machine with a GPU. However, this added a lot of latency. We decided to host the Zeroscope container on a Brev.dev machine and to interface with it through a REST API.
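
To give a feel for what talking to a self-hosted model over REST looks like, here is a minimal client sketch using only the Python standard library. The URL, the `/generate` route, and the JSON fields (`prompt`, `num_frames`) are all assumptions for illustration; the actual server's route and payload may look different.

```python
import json
import urllib.request

# Hypothetical address of the self-hosted video generation server.
ZEROSCOPE_URL = "http://localhost:8000/generate"


def build_generation_request(prompt, num_frames=24, url=ZEROSCOPE_URL):
    """Build (but do not send) a JSON POST request for one video clip."""
    payload = json.dumps({"prompt": prompt, "num_frames": num_frames})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_video(prompt, timeout_s=300.0):
    """Send the request and return the raw response bytes (assumed to be the video)."""
    request = build_generation_request(prompt)
    # Generation is slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()


# Building the request needs no running server; generate_video() does.
request = build_generation_request("neon city at night, cinematic")
print(request.get_method(), request.full_url)
```

Keeping the model behind an endpoint like this means the rest of the pipeline can stay on CPU and only needs network access to the GPU machine.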

We also faced challenges with the prompt engineering to create aesthetically pleasing videos.

Accomplishments that we're proud of

The system works! We've generated incredible videos for several artists, such as Justin Jay (EMPIRE), Noomb00m (team member), and StoneAgeTC (another hackathon participant).

We were also able to build a relatively complex system with many individual components.

What we learned

Model serving is key for large models! We also learned a lot about new video generation technologies, such as Zeroscope and ControlNet. When stitching videos together, we learned about musical structure and how to identify transitions.
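
Once transition times are known, turning them into clip slots is mostly bookkeeping. The sketch below assumes the transitions are already available as timestamps in seconds (how they are detected is not shown) and merges cuts that fall too close together. The function name and the `min_length` threshold are assumptions for this example.

```python
def plan_segments(transitions, song_length, min_length=1.0):
    """Turn transition timestamps (seconds) into (start, end) slots for video clips."""
    # Keep only unique cut points that fall strictly inside the song.
    cuts = sorted(t for t in set(transitions) if 0.0 < t < song_length)

    boundaries = [0.0]
    for t in cuts:
        # Skip a cut that would create a clip shorter than min_length.
        if t - boundaries[-1] >= min_length:
            boundaries.append(t)

    # If the final clip would be too short, merge it into the previous one.
    if len(boundaries) > 1 and song_length - boundaries[-1] < min_length:
        boundaries.pop()
    boundaries.append(song_length)

    return list(zip(boundaries[:-1], boundaries[1:]))


# Made-up transition times for a 30-second snippet.
segments = plan_segments([8.0, 8.4, 16.0, 29.5], song_length=30.0)
print(segments)  # [(0.0, 8.0), (8.0, 16.0), (16.0, 30.0)]
```

Each slot then gets one generated or retrieved clip, trimmed to the slot's duration.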

What's next for KEEPITREEL

We plan to keep building and expanding on the product. Here are some next steps:

Optimize and parallelize the system
Increase the size of the video lookup database. This could come from videos that an individual artist has already recorded!
Experiment with other video or image generation models
Use sequential information to generate a visual "story"

The key to what we're building is to continue working with artists to learn their needs, and build for them.