We're passionate about AI, especially large models, which continually astound us by doing things we thought they never could. In this hackathon, we wanted to challenge ourselves to push the boundaries of what can be done in the ML and deep learning space, and to build something that not only has potential real-world use cases but also shows that ML can be useful to anyone these days, not only data scientists.
What it does
We’ve built a web app that generates a fully featured music video, including background music, lyrics, speech, and visuals, all procedurally generated from a single given title. We’ve also included the ability to try it yourself and to watch previously generated music videos that others have queued up!
How we built it
We begin with OpenAI’s GPT-3, which takes in a title and generates the lyrics for a full-length rap song. We then feed these lyrics to a text-to-speech generator, Uberduck, to synthesize realistic rap audio. Next, we perform sentiment analysis on the lyrics to choose the rapper voice and the type of background beat that best fit them, and we run OpenAI’s Jukebox on the selected beat to extend and improvise on it. Finally, we use CLIP to Music Video to generate a video that quite literally visualizes the lyrics, producing a visual representation of every verse in the audio. For example, if the lyrics mention a “dog”, the music video will show dog-like imagery. The video then interpolates between these images, producing an uncanny psychedelic experience.
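To give a rough idea of the sentiment-driven selection step, here is a minimal sketch. The lexicon, voice names, and beat names below are illustrative assumptions for this writeup, not our actual implementation (which uses a proper sentiment model):

```python
# Hypothetical sketch of the sentiment-driven selection step.
# The word lists, voice names, and beat names are illustrative only.

POSITIVE = {"love", "shine", "win", "gold", "dream"}
NEGATIVE = {"pain", "dark", "lost", "cold", "fight"}

def sentiment_score(lyrics: str) -> float:
    """Crude lexicon-based polarity score in [-1, 1]."""
    words = lyrics.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def pick_voice_and_beat(lyrics: str) -> tuple:
    """Map the lyrics' sentiment to a (voice, beat) pair that the
    TTS stage and the Jukebox stage then consume."""
    score = sentiment_score(lyrics)
    if score > 0.2:
        return ("upbeat-rapper", "bright-boom-bap")
    if score < -0.2:
        return ("gritty-rapper", "dark-trap")
    return ("neutral-rapper", "lo-fi-loop")
```

The point is only the shape of the interface: downstream stages receive a voice identifier and a beat identifier, so swapping in a stronger sentiment model doesn't change the rest of the pipeline.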
Challenges we ran into
As a team of two with many large, difficult models to set up, this was a big challenge to pull off. A majority of our time was spent tuning the various models to work harmoniously. Designing the right prompts to coerce GPT-3 into robustly generating rap lyrics (and not just repeating itself over and over) was harder than expected, and quite different from traditional programming. Jukebox also gave us trouble: it proved especially difficult to generate good music with and required a lot of experimentation. One particular challenge was that neither of us has extensive front-end development skills, so most of the front end was built on the fly during the hackathon!
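To give a flavor of the prompt-design problem, a simplified version of the kind of few-shot-style prompt we iterated toward might look like the sketch below. The wording and parameters here are illustrative, not our production prompt:

```python
def build_rap_prompt(title: str) -> str:
    """Assemble a hypothetical prompt nudging GPT-3 toward
    structured, non-repetitive rap lyrics for a given title."""
    return (
        "Write an original rap song with two verses and a chorus.\n"
        "Do not repeat lines. Keep a consistent rhyme scheme.\n\n"
        f"Title: {title}\n"
        "Lyrics:\n"
    )

# The prompt is then sent to the Completions API; a frequency penalty
# helps discourage GPT-3 from repeating itself. Illustrative call:
#
# openai.Completion.create(
#     engine="davinci",
#     prompt=build_rap_prompt("Midnight City"),
#     max_tokens=400,
#     temperature=0.9,
#     frequency_penalty=0.8,
# )
```

Raising `frequency_penalty` (and keeping `temperature` fairly high) was the kind of knob-turning that made the difference between a song and the same line echoed twenty times.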
Accomplishments that we're proud of
We are extremely proud to have built a pipeline with so many moving parts that is nonetheless fully functional. We are also very proud of figuring out the stack well enough to wrap our project in a working web app that takes as little as a single word and extrapolates it into a fully-fledged psychedelic music video in as little as 10 minutes.
What we learned
We learned a lot about what is possible with current state-of-the-art AI models, as well as how to set them up. We also learned a lot about front-end development and the stack in general, though we realize we still have a long way to go there. Above all, this experience made us believe that with perseverance and dedication, anyone can build something they’re proud of.
What's next for RapBox
Due to time constraints and a shortage of teammates, there were many ideas we wanted to pursue but could not. With more front-end experience, we would have liked to build a more unified front-end experience: showing how the lyrics were generated and displaying them in a more organized way. We would also like to expand this project into a platform that users everywhere can interact with and contribute to, similar to Uberduck.ai, but doing so will require us to learn a lot more about scalable cloud infrastructure.