Inspiration
I was listening to some '70s and '80s country music the other day when it occurred to me that those days are never coming back, so perhaps we'd never get music like that back either. One of the songs was "American Pie" by Don McLean, which has the lines:
"A long, long time ago...
I can still remember
How that music used to make me smile
And I knew if I had my chance
That I could make those people dance
And, maybe, they'd be happy for a while"
The song goes on to talk about the day the music died, when a plane crash ended the lives of McLean's musical heroes all at once.
Perhaps, like McLean's, the genre of music I love is hopelessly stuck in the past, its icons musically dead.
Or not. Deep learning is unbelievably powerful, and one day the machine may take up the mantle of the musician.
What it does
Deep Player is, at its core, a very simple music player. It has a library of songs. A user can select which song to play by its index, and add their own songs via the update button.
However, the power of Deep Player is the recurrent neural network that crawls through the library. As you add music, the recurrent neural network begins to learn what kind of music you enjoy enough to add to the library. With enough training iterations, the network becomes able to generate music in the style of the songs you've added. This is accessed through the synthesis button, which plays original, unique music generated by the neural network from scratch.
What's even cooler is that the network doesn't start out with the concept of musical notation. It learns to output valid musical notation as it trains!
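At synthesis time, a character-level model like this outputs a probability distribution over the next character, and a "temperature" parameter controls how conservatively it samples. Here is a minimal sketch of that sampling step in plain NumPy; the function name `sample_char` is hypothetical, not from the project's code:

```python
import numpy as np

def sample_char(probs, temperature=1.0):
    """Sample an index from the network's output distribution.

    Lower temperature makes sampling more conservative (closer to
    argmax); higher temperature makes it more adventurous.
    """
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-12) / temperature
    exp = np.exp(logits - logits.max())      # stable softmax
    reweighted = exp / exp.sum()
    return int(np.random.choice(len(reweighted), p=reweighted))
```

Repeatedly feeding the sampled character back into the network, one character at a time, is what lets the model emit an entire ABC tune from scratch.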
How we built it
We began by setting up a simple Char-RNN in Keras. Everything else was built around the Char-RNN and designed to interface with it.
Training the neural network was a different story. Since neural networks are really slow to train on CPUs, training was done on an Amazon Web Services EC2 p2.xlarge instance with one GPU. This let us get the model to a reasonable level of accuracy.
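A Char-RNN trains on fixed-length character windows, each paired with the character that follows it. Here is a minimal sketch of that preprocessing step in plain NumPy (the function name and the `seq_len`/`step` parameters are illustrative assumptions, not the project's actual values):

```python
import numpy as np

def make_training_data(text, seq_len=40, step=3):
    """Slice a corpus into (window, next_char) one-hot pairs for a char-RNN."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    windows, targets = [], []
    for start in range(0, len(text) - seq_len, step):
        windows.append(text[start:start + seq_len])
        targets.append(text[start + seq_len])
    # One-hot encode: X is (samples, seq_len, vocab), y is (samples, vocab).
    X = np.zeros((len(windows), seq_len, len(chars)), dtype=bool)
    y = np.zeros((len(windows), len(chars)), dtype=bool)
    for i, window in enumerate(windows):
        for t, c in enumerate(window):
            X[i, t, char_to_idx[c]] = True
        y[i, char_to_idx[targets[i]]] = True
    return X, y, chars
```

The resulting `X` and `y` arrays can be fed directly to a Keras LSTM model with a softmax output over the character vocabulary.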
Challenges we ran into
Training the RNN was hard, as that required learning how to interface with remote servers, which was entirely new to me.
Another difficult challenge was figuring out the format of the data to feed into the neural net. We settled on ABC notation because it breaks music down into a 1D time series of characters.
This created the problem of translating music between different formats. ABC notation isn't directly playable, so we had to convert it into MIDI using music21.
Accomplishments that we're proud of
Creating a working neural network is an accomplishment that we are proud of. Neural nets made by newcomers are often plagued with issues such as exploding/vanishing gradients, non-converging loss, and mode collapse. I'm just proud to have made something that works.
Another accomplishment that we are proud of was figuring out how to translate music between ABC notation and MIDI. This was a challenge at first, but once we solved it, using and playing the music we generated with the neural net was much simpler.
What we learned
Deep learning is really powerful but also really hard. The model we trained is unreasonably effective. The fact that it was able to learn valid ABC notation from the training data alone blows my mind.
However, it's also super easy to screw everything up. There are so many choices that need to be made mostly off of intuition (e.g. one-hot encoding vs. embeddings, number of hidden layers/nodes, etc.). It's important to maintain a strong theoretical basis when diving into deep learning, which lets us solve non-bug problems (the code throws no errors, but the network is awful).
What's next for Deep Player
Honestly, at this stage we mainly just have to focus on continuing to train it. The longer it trains, the better it gets at folk music. However, in the long, long term, you can expect a shift from Long Short-Term Memory (LSTM) layers to causal convolutions as the model grows more complex. Recurrent layers are simply more difficult to train, and when dealing with huge tensors for your input and output (such as when using frequency data instead of ABC notation), they can't train in a reasonable amount of time.
Those more involved with deep learning will recognize that this is a shift toward Google's WaveNet architecture as the model grows more complex and tackles more complex problems, such as generating human speech.
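The key idea behind a causal convolution is that the output at time step t depends only on inputs at or before t, so the model never "peeks" into the future. A minimal single-channel sketch in plain NumPy (the function name and signature are illustrative, not WaveNet's actual API):

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """1-D causal convolution: y[t] = sum_j kernel[j] * x[t - j*dilation].

    Left-pads with zeros so the output has the same length as the input
    and never depends on future samples.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * x_padded[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```

Stacking such layers with exponentially increasing dilations (1, 2, 4, 8, ...) is what gives WaveNet-style models a large receptive field over raw audio while staying fully parallelizable at training time, unlike an LSTM.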