--> demo video password: LAHACKS <--

The demo video (above) and demo song (below) are each about a minute long - roughly two minutes of demo in total. Enjoy!

-- // -- // -- // -- // --


AI voices spittin' AI bars over AI beats.

I like to write small songs when I attend hackathons. At LA Hacks this year, I challenged myself to incorporate songwriting into my project.

I wanted to see if I could create an original rap track using artificial intelligence. Here's how I did it.


After spending hours researching different kinds of convolutional neural networks, I stumbled across an open-source project called DeepFire, which uses Keras to encapsulate much of the ML heavy lifting that goes into text analysis and synthesis. The program generates fresh lyrics word by word using a Markov chain-based prediction model after "learning" a corpus (the raw text of any given rapper's lyrics). I had to fix some old dependencies and minimally edit the scripts so the program could run via the CLI. When my terminal finally spat out ~100 lines of rap at 5 am, I knew I was ready for the next phase.
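The word-by-word idea can be sketched in a few lines of Python - this is a toy Markov chain, not DeepFire's actual code (the real project layers a neural model on top):

```python
import random
from collections import defaultdict

def build_chain(corpus, order=1):
    """Map each run of `order` words to the words that follow it in the corpus."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, seed, n_words=20, rng=None):
    """Walk the chain from `seed`, picking a random learned successor each step."""
    rng = rng or random.Random(0)
    out = list(seed)
    for _ in range(n_words):
        key = tuple(out[-len(seed):])
        successors = chain.get(key)
        if not successors:
            break  # dead end: this key never appeared mid-corpus
        out.append(rng.choice(successors))
    return " ".join(out)
```

Every generated word pair has been seen somewhere in the training text, which is why the output sounds like the source rapper while still recombining lines in new ways.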

sources: Stanford DeepFire Paper, DeepFire Repo

song structure

DeepFire used a Long Short-Term Memory (LSTM) model to "learn" song structure from the corpus. Though the model had a basic grasp of rhyme scheme, English's unwritten pronunciation rules left many of the generated lines with imperfect rhymes. Without editing the content of any line, I rearranged some of them so their end rhymes aligned. (Is it just me, or did that sound like it'd fit right into a rap?)
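I did the rearranging by ear, but the idea can be sketched as a crude heuristic - grouping lines by the last letters of their final word, a rough spelling-based stand-in for real phonetic rhyme detection:

```python
from collections import defaultdict

def rhyme_key(line, n=2):
    """Crude rhyme proxy: the last n letters of the line's final word."""
    word = line.rstrip(".,!?").split()[-1].lower()
    return word[-n:]

def pair_by_end_rhyme(lines):
    """Reorder lines so those sharing an ending sit next to each other."""
    groups = defaultdict(list)
    for line in lines:
        groups[rhyme_key(line)].append(line)
    ordered = []
    for group in groups.values():  # preserves first-seen order of each ending
        ordered.extend(group)
    return ordered
```

Spelling is an unreliable guide to English pronunciation ("cough" vs. "though"), which is exactly why the model's rhymes needed a human pass in the first place.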


I started excitedly reading about Restricted Boltzmann machines and their applications in algorithmic music composition. Given the constraints on time and computing power, I reasoned that to reach my MVP I needed to build on an existing technology (maybe RBM composition will be my next hack!). Enter CodeParade's NeuralComposer - an open-source tool that uses a variational autoencoder (VAE) and principal component analysis to generate original music in real time. The tool was trained on over 4,000 chiptune video game tracks for at least 2,000 epochs. A graphical interface lets the user control each of the 40 (unlabeled) principal components with sliders, modifying an original 16-bar piece as it is composed before the user's eyes! I adjusted the sliders until the program yielded a somewhat catchy melody. I was elated but also daunted: now I had to find a way to combine the lyrics with the music.
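Conceptually, each slider scales one principal component of the learned latent space. Here's a toy sketch of that mapping in plain Python - not NeuralComposer's code; in the real tool the resulting latent vector is fed through the VAE decoder to produce the actual notes:

```python
def decode_sliders(sliders, mean, components):
    """Combine slider positions into a latent vector:
    mean + sum_i (slider_i * principal_component_i)."""
    vec = list(mean)
    for weight, component in zip(sliders, components):
        for j, value in enumerate(component):
            vec[j] += weight * value
    return vec
```

Because the components are orthogonal directions of greatest variance, each slider tends to change one "musical quality" of the piece somewhat independently - which is what makes real-time knob-twiddling feel intuitive.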

source: NeuralComposer Repo

TTS rapping + assembly

At this point, I made a design choice: a text-to-speech service would rap the lyrics that the computer wrote. After removing the swears and struggling with TTSReader's monotonous voices, I found Google Cloud's Text-to-Speech API refreshingly simple to use. In particular, the WaveNet voices felt much more authentic - an important discovery for the final demo. I fed the AI-generated rap lyrics into the text-to-speech service, recorded the raw audio, and pulled everything into a GarageBand project so I could add a drum preset and sync up the rhythm and words. Interestingly enough, this was the most time-consuming and frustrating phase of the entire project. Eventually, it worked. I exported the final song directly to SoundCloud - and that's a "wrap" ;)
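For reference, a call to the Text-to-Speech REST API (v1 `text:synthesize`) boils down to a small JSON body like the one built below. The voice name and encoding are just example values, and actually sending it requires an authenticated POST to the Google endpoint:

```python
def tts_request_body(text, voice_name="en-US-Wavenet-D"):
    """Build the JSON body for a Google Cloud TTS v1 text:synthesize request.
    Send via authenticated POST to
    https://texttospeech.googleapis.com/v1/text:synthesize."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
```

The response contains base64-encoded audio, which you can decode to an MP3 and drop straight into a GarageBand track.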

Check out "Pragma Once (demo)" on SoundCloud!

I learned so much from this solo project - practical applications of AI for music, effective time management, how to use GarageBand, and so much more. LA Hacks was my favorite hackathon last year, and even held virtually, y'all did not disappoint :^)

(All the code and other assets live on the GitHub repo!)

Thanks for your time :~)

coming soon:

  • more ambient AI-powered tracks - stay tuned!

  • bundle all these tools into one application?

  • automate the rhythm/vocal syncing process... so no one ever has to do that by hand again :'(
