With the recent advances in artificial intelligence, it seems like computers with the right AI can do anything. Sometimes they even know humans better than we know ourselves. In that case, can we make a computer rap? Hip-hop as a genre is filled with human emotion and art, but even so, there are many patterns within the fluidly spoken words. With a recurrent neural network, we might be able to detect those patterns and build a machine learning model that not only speaks English, but raps like a rapper.
What it does
The project generates rap in a style similar to the rapper it was trained on. In our case, we chose Eminem as our training data. After generating lyrics, the project can then procedurally generate a random beat and synthesize a crude spoken demo of the rap.
How we built it
We built the project using a variety of tools, the most important one being a recurrent neural network. A recurrent neural network is special in that its output depends not only on the current input but also on what it has seen before. The network we used reads in ASCII characters and outputs the ASCII characters it thinks should come next in the rap. It has a total of 2 hidden layers with 512 nodes in each layer.
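We trained the actual model in Torch, but as a rough illustration only, here is a minimal NumPy sketch of one forward step of a 2-layer character RNN like the one described above. The weights are random stand-ins for trained parameters, and it uses plain tanh cells rather than the LSTM cells Torch char-rnn implementations typically use:

```python
import numpy as np

np.random.seed(0)
VOCAB = 128          # ASCII characters
HIDDEN = 512         # hidden units per layer, as in our model
LAYERS = 2

# Randomly initialized weights stand in for trained parameters.
W_in = [np.random.randn(HIDDEN, VOCAB) * 0.01,    # layer 1: reads the input char
        np.random.randn(HIDDEN, HIDDEN) * 0.01]   # layer 2: reads layer 1's output
W_rec = [np.random.randn(HIDDEN, HIDDEN) * 0.01 for _ in range(LAYERS)]
W_out = np.random.randn(VOCAB, HIDDEN) * 0.01

def step(ch, hidden):
    """One forward step: consume one ASCII character, update both hidden
    layers, and return a probability distribution over the next character."""
    x = np.zeros(VOCAB)
    x[ord(ch)] = 1.0                            # one-hot encode the input byte
    for layer in range(LAYERS):
        x = np.tanh(W_in[layer] @ x + W_rec[layer] @ hidden[layer])
        hidden[layer] = x                       # each layer feeds the next
    logits = W_out @ x
    probs = np.exp(logits - logits.max())       # softmax over the vocabulary
    return probs / probs.sum(), hidden

hidden = [np.zeros(HIDDEN) for _ in range(LAYERS)]
probs, hidden = step('a', hidden)
print(probs.shape)   # a distribution over all 128 ASCII characters
```

To generate text, you would sample a character from `probs`, feed it back in as the next input, and repeat; the carried-over `hidden` state is what lets earlier characters influence later ones.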
In order to train the model, we also had to gather training data. We did this by scraping all Eminem lyrics from 2011 and before from this site: http://www.ohhla.com/all.html. The data was downloaded with wget and parsed using Mathematica.
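Our actual scraping and parsing was done with wget and Mathematica, but the cleaning step looked roughly like the following Python sketch. The header field names and `[Chorus]`-style markers are assumptions about the site's plain-text lyric format:

```python
import re

# Hypothetical cleaner for OHHLA-style lyric files; the header field
# names below are assumptions about the site's plain-text format.
HEADER = re.compile(r"^(Artist|Album|Song|Typed by):", re.IGNORECASE)

def clean_lyrics(raw: str) -> str:
    """Strip metadata headers, section markers, and blank lines,
    keeping only the lyric lines themselves."""
    lines = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or HEADER.match(line):
            continue                            # drop metadata and blank lines
        line = re.sub(r"\[.*?\]", "", line).strip()  # drop [Chorus]-style markers
        if line:
            lines.append(line)
    return "\n".join(lines)

sample = "Artist: Eminem\nSong: Example\n\n[Verse 1]\nreal lyric line\n"
print(clean_lyrics(sample))   # only the lyric line survives
```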
The neural network was implemented using the Torch machine learning library for Lua, while the beat generation, data crawling, and speech synthesis were all done in Mathematica.
Challenges we ran into
We ran into many challenges:
- Installing Torch on our computers took forever and there were many dependency errors.
- Cleaning the data scraped from the website was difficult: we could never be sure we had cleaned out everything, especially since we couldn't manually scan 10,000+ lines character by character.
- Finding feasible hyperparameters for the neural net was also a hard problem. Make the network too large and training would not finish even in 24 hours; make it too small and it would not be able to learn properly.
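To give a sense of the hyperparameter trade-off above, here is a back-of-the-envelope parameter count for a vanilla 2-layer character RNN at our chosen sizes. This is an illustrative estimate only; an LSTM variant (as in typical Torch char-rnn setups) would be roughly 4x larger per layer:

```python
# Rough parameter count for a vanilla 2-layer character RNN with
# 512 hidden units and a 128-symbol ASCII vocabulary.
vocab, hidden = 128, 512

layer1 = vocab * hidden + hidden * hidden + hidden    # input + recurrent + bias
layer2 = hidden * hidden + hidden * hidden + hidden   # layer-to-layer + recurrent + bias
output = hidden * vocab + vocab                       # projection back to ASCII

total = layer1 + layer2 + output
print(total)   # just under a million parameters
```

Every doubling of the hidden size roughly quadruples the recurrent weight matrices, which is why slightly-too-large settings blew past our 24-hour training budget.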
Accomplishments that we're proud of
We are proud of:
- Creating a model that can produce some coherent English words, even though it does not have access to any sort of English database.
- Almost being able to generate coherent rap, although the lines do not quite rhyme.
- Creating a way to demonstrate the rap by procedurally generating beats and speech synthesizing the lyrics.
What we learned
- RNNs are really powerful
- Rhyme is a really subtle pattern to learn in English
- If you train your model on any explicit rapper, the first word it will likely learn is f*ck
What's next for RaaSBerry
Originally, the plan was to have the program run on a Raspberry Pi. However, since we could not get our board to work, we were not able to implement that part of the project. Future steps might include tuning the model's hyperparameters for better performance, training on better hardware (such as actual distributed GPUs), collecting more training data, and deploying the resulting models to a Raspberry Pi.