We're passionate about AI, especially large models, which continually astound us by doing things we thought they never could. In this hackathon, we wanted to challenge ourselves to push the boundaries of what can be done in the ML and deep learning space, and to build something that not only has potential real-world use cases but also shows that ML can be useful to anyone these days, not only data scientists.
What it does
We’ve built a web app that generates a fully featured music video, including background music, lyrics, speech, and visuals, all procedurally generated from a single given title. We’ve also included the ability to try it yourself and to watch previously generated music videos that others have queued up!
How we built it
We begin with OpenAI’s GPT-3, which takes in a title and generates the lyrics for a full-length rap song. We then feed these lyrics to a text-to-speech generator, Uberduck, to synthesize realistic rap audio. Next, we perform sentiment analysis on the lyrics to choose the rapper voice and the type of background beat that best fit them, and we run OpenAI’s Jukebox on the selected beat to extend and improvise on it. Finally, we use CLIP to Music Video to generate a video that quite literally visualizes the lyrics, producing a visual representation of every verse in the audio. For example, if the lyrics mention a “dog”, the music video will show dog-like imagery. The video then interpolates between these images, producing an uncanny psychedelic experience.
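To give a rough idea of the sentiment-driven selection step, here is a minimal sketch. The lexicon, voice names, and beat names below are illustrative assumptions for this writeup, not our actual implementation (which uses a proper sentiment model):

```python
# Hypothetical sketch of the sentiment-driven selection step.
# The word lists, voice names, and beat names are illustrative only.

POSITIVE = {"love", "shine", "win", "gold", "dream"}
NEGATIVE = {"pain", "dark", "lost", "cold", "fight"}

def sentiment_score(lyrics: str) -> float:
    """Crude lexicon-based polarity score in [-1, 1]."""
    words = lyrics.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def pick_voice_and_beat(lyrics: str) -> tuple:
    """Map the lyrics' sentiment to a (voice, beat) pair that the
    TTS stage and the Jukebox stage then consume."""
    score = sentiment_score(lyrics)
    if score > 0.2:
        return ("upbeat-rapper", "bright-boom-bap")
    if score < -0.2:
        return ("gritty-rapper", "dark-trap")
    return ("neutral-rapper", "lo-fi-loop")
```

The point is only the shape of the interface: downstream stages receive a voice identifier and a beat identifier, so swapping in a stronger sentiment model doesn't change the rest of the pipeline.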
Challenges we ran into
As a team of two with many large, difficult models to set up, this was a big challenge to pull off. A majority of our time was spent tuning the various models to work harmoniously. Designing the right prompts to coerce GPT-3 into robustly generating rap lyrics (and not just repeating itself over and over) was harder than expected, and quite different from traditional programming. Jukebox also gave us trouble: it proved especially difficult to generate good music with and required a lot of experimentation. One particular challenge was that neither of us has extensive front-end development skills, so most of the front end was built on the fly during the hackathon!
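To give a flavor of the prompt-design problem, a simplified version of the kind of few-shot-style prompt we iterated toward might look like the sketch below. The wording and parameters here are illustrative, not our production prompt:

```python
def build_rap_prompt(title: str) -> str:
    """Assemble a hypothetical prompt nudging GPT-3 toward
    structured, non-repetitive rap lyrics for a given title."""
    return (
        "Write an original rap song with two verses and a chorus.\n"
        "Do not repeat lines. Keep a consistent rhyme scheme.\n\n"
        f"Title: {title}\n"
        "Lyrics:\n"
    )

# The prompt is then sent to the Completions API; a frequency penalty
# helps discourage GPT-3 from repeating itself. Illustrative call:
#
# openai.Completion.create(
#     engine="davinci",
#     prompt=build_rap_prompt("Midnight City"),
#     max_tokens=400,
#     temperature=0.9,
#     frequency_penalty=0.8,
# )
```

Raising `frequency_penalty` (and keeping `temperature` fairly high) was the kind of knob-turning that made the difference between a song and the same line echoed twenty times.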
Accomplishments that we're proud of
We are extremely proud to have built a pipeline with so many moving parts that is nonetheless fully functional. We are also very proud of figuring out the stack well enough to wrap our project in a working web app that takes as little as a single word and extrapolates it into a fully-fledged psychedelic music video in as little as 10 minutes.
What we learned
We learned a lot about what is possible with current state-of-the-art AI models, as well as how to set them up. We also learned a lot about front-end development and the stack in general, though we realize we still have a long way to go there. Above all, this experience made us believe that with perseverance and dedication, anyone can build something they’re proud of.
What's next for RapBox
Due to time constraints and a shortage of teammates, there were many ideas we wanted to pursue but could not. With more front-end experience, we would have liked to build a more unified front-end experience: showing how the lyrics were generated and displaying them in a more organized way. We would also like to expand this project into a platform that users everywhere can interact with and contribute to, similar to Uberduck.ai, but doing so will require us to learn a lot more about scalable cloud infrastructure.