The Gist

We will train a generative adversarial network to produce classical music. Using transcriptions from a database of classical works, the generator will create music, a discriminator will compare the output against real (test) works, and the two will be trained together to better produce music and to better distinguish generated from real music, respectively. Note: this was our initial plan, and we eventually pivoted to GRUs instead.

Our Team

  • dlauerma - David Lauerman
  • mburke15 - Mason Burke
  • ssungun - Serdar Sungun

Introduction

The creation of music is not so much a problem to be solved as it is a creative expedition. At the end of the day, having a machine produce music of any quality is a success for this project. Our reference is a paper from Stanford that used an end-to-end learning model to generate classical music. We're not experienced with end-to-end learning, as it involves significantly more difficult mathematical formulation than the traditional neural networks we've learned about and implemented. Instead, we will be using a generative adversarial network to achieve the same goal, which we hope will produce results comparable to those observed in the paper. The three of us are all quite musically inclined, with a particular interest in piano and guitar music.

Related Work

We found a great article about an attempted implementation of a GAN to generate monophonic music, which includes some useful insight from the authors' experience with GANs. Essentially, they first tried an n-gram model, with little success, then moved on to a SeqGAN with much better results. By their own measure, they ultimately failed to produce real music, though there were recurring stretches of ordered semi-musicality. They see great potential in a more finely tuned implementation with more layers, specifically recommending trying various filter sizes to optimize results and cleaning the dataset so that it contains only songs with the same tempo and time signature.

Data

The MusicNet dataset we plan to use is decently large, and it will likely not need much preprocessing. If we can figure out a way to do it, we may try to transpose everything into the same key to simplify the training process; a sketch of the mechanics follows.
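Should we attempt that transposition, the shift itself is simple once each piece's key offset is known; below is a minimal NumPy sketch, assuming a per-timestep pitch-vector (piano-roll) encoding. Detecting the key offset is the harder problem and is not shown.

```python
import numpy as np

def transpose_roll(roll, semitones):
    """Shift a (timesteps, n_pitches) piano roll up (+) or down (-) by a number
    of semitones, dropping notes that fall off either end of the pitch range."""
    out = np.zeros_like(roll)
    if semitones >= 0:
        out[:, semitones:] = roll[:, :roll.shape[1] - semitones]
    else:
        out[:, :semitones] = roll[:, -semitones:]
    return out
```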

Methodology

As mentioned previously, we'll be using a generative adversarial network to build a generator for classical music. We will train our model on the MusicNet data, a database of 330 classical works. These works have been transcribed into a CSV file, with each note carrying information about pitch, duration, and other important characteristics, such as the instrument used. This information is encoded chronologically and preprocessed so as to be ready for input into a neural network. We will train using alternating epochs of generator training and discriminator training, to stabilize the equilibrium-style game that a generator and discriminator play when pitted against each other; a sketch of this schedule follows. The paper we reference does not use a GAN, but we are hoping that the generative capabilities of GANs (combined with the inscrutability of their internal representations) will lend themselves well to a subjective mode of evaluation.
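To make the alternating schedule concrete, here is a minimal training-loop sketch, assuming a PyTorch implementation; the toy fully connected networks, layer sizes, NOISE_DIM, and the flattened note-window input are illustrative assumptions, not a final architecture.

```python
import torch
import torch.nn as nn

SEQ_LEN, N_PITCHES, NOISE_DIM = 64, 128, 100  # hypothetical sizes

# toy fully connected generator and discriminator over flattened note windows
G = nn.Sequential(nn.Linear(NOISE_DIM, 512), nn.ReLU(),
                  nn.Linear(512, SEQ_LEN * N_PITCHES), nn.Sigmoid())
D = nn.Sequential(nn.Linear(SEQ_LEN * N_PITCHES, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1))  # one logit: real vs. generated
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_epoch(loader, train_generator):
    """Alternate whole epochs: each epoch updates only one of the two networks."""
    for real in loader:  # real: (batch, SEQ_LEN * N_PITCHES) windows of real music
        noise = torch.randn(real.size(0), NOISE_DIM)
        fake = G(noise)
        if train_generator:
            opt_g.zero_grad()
            # the generator wins when the discriminator labels its output "real"
            loss = bce(D(fake), torch.ones(real.size(0), 1))
            loss.backward()
            opt_g.step()
        else:
            opt_d.zero_grad()
            loss = (bce(D(real), torch.ones(real.size(0), 1)) +
                    bce(D(fake.detach()), torch.zeros(real.size(0), 1)))
            loss.backward()
            opt_d.step()

# usage: train_epoch(loader, train_generator=(epoch % 2 == 0))
```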

Metrics

It is difficult to evaluate a music-creating model from a quantitative or classification-based standpoint, since the value of a piece of classical music is inherently subjective. So, we propose a system of evaluation based on several observational and subjective criteria:

  • Is there a coherent melody?
  • Are there motifs used throughout the work that reinforce the melody?
  • Is there an underlying chord structure that complements the melody, and does it follow an intelligible pattern?
  • Is there a clear use of harmony?
  • Does the piece have development?
  • Is there a clear beginning, middle, and end?
  • Is the music pleasant to listen to?
  • Does the generated sample diverge well enough from the training dataset?

For our model to be considered “successful,” it should meet all of these criteria. Most importantly, the music should be nice to listen to.

Our base goal is to make music that is coherent, something bearing some semblance to classical music. The target goal is to make something listenable, preferably with some identifiable attention to structure. In the best-case scenario, we would make music that is genuinely pleasant to listen to. Additionally, we could compare our reactions to our generated music with our reactions to classical music generated by others in a similar fashion.

Ethics

What broader societal issues are relevant to your chosen problem space?

Music and art pieces are products of creativity and inspiration, which creates a thin line between merely drawing inspiration and failing to produce an original work. The potential ethical problem here is marketing another musician's work as your own after making minor changes to it. Our project relates to this problem because, at the end of the day, we will be training our model on other tunes to generate new tunes, and there will most probably be similarities. We don't think we should worry about this too much: a lot of music shares chord progressions, note sequences, and melodic harmonies, and since we will be drawing on information from many pieces, the generated tunes will have their own touch.

Why is deep learning a good approach to this problem?

We will be focusing specifically on classical music. We have access to the data of 330 classical pieces, along with tens of underlying metrics such as musical patterns, melody-completing chord progressions, and the notes themselves. This gives us hundreds of thousands of data points. Considering the size of our data, deep learning is a good method to employ for this project.

Division of Labor

We are suitemates, and we will mainly share the work equally, if not work on every step together.

Updates

posted an update

ZimmeRNN

Mason Burke, David Lauerman, Serdar Sungun

Introduction

The goal of our project was to generate classical music given a database of classical music transcriptions. This was not so much about solving a problem as it was a creative pursuit.

Methodology

We first wanted to use a GAN to accomplish this task, which may ultimately have worked, but we opted for an RNN because of the model's simplicity. Much like the MNIST image datasets we used in class assignments, there is a MusicNet library available online. It contains 330 song files from Baroque/Romantic-period composers, primarily Bach and Chopin. While useful, these files contained a wealth of information that required a lot of paring down for our project: at each point in time, a song has a list of the notes currently playing, along with their pertinent information, like duration, instrument, and volume. We decided that the best way to process this data for a deep neural network was to extract just the note values, so that each sampled timestep becomes a multi-one-hot vector of the notes playing at that moment. We then took windows of these samples and trained the network to predict the next timestep given that sequence of notes; a sketch of this encoding and windowing appears below.

After training, we passed a sequence of notes from somewhere in one of our input songs, in array form, into the fully trained RNN to act as a seed for generation. Then, we had the RNN predict timesteps for a total of a minute's worth of music. The result was a functional multi-one-hot vector for each timestep.
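A minimal sketch of that encoding and windowing, in NumPy; the sizes (N_PITCHES, WINDOW, the sampling step) and the (start, end, pitch) layout of the note events are assumptions for illustration.

```python
import numpy as np

N_PITCHES = 128   # assumed pitch-vector size (full MIDI range)
WINDOW = 64       # assumed number of timesteps the RNN sees per prediction

def encode_song(note_events, n_steps, step_size):
    """note_events: iterable of (start_sample, end_sample, pitch) rows, with
    pitch already mapped to a 0..N_PITCHES-1 index. Returns an
    (n_steps, N_PITCHES) multi-one-hot piano roll."""
    roll = np.zeros((n_steps, N_PITCHES), dtype=np.float32)
    for start, end, pitch in note_events:
        roll[start // step_size:end // step_size + 1, pitch] = 1.0
    return roll

def make_windows(roll):
    """Slide a WINDOW-long context over the roll; each target is the next step."""
    xs = np.stack([roll[i:i + WINDOW] for i in range(len(roll) - WINDOW)])
    ys = roll[WINDOW:]
    return xs, ys  # xs: (n, WINDOW, N_PITCHES), ys: (n, N_PITCHES)
```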

Results

So, we have our output array of notes at each timestep. What now? Our first attempt was to iterate through these arrays, adding each note individually to a MIDI file using one library, and then use another library to actually play the output file. Somewhere in this chain, a major miscommunication occurred: when we played the resulting MIDI file, all we got was an audio blip lasting about 0.1 seconds. We later found out that this was an issue with the MIDI-playing library, and that our MIDI file was in fact being created correctly.
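For reference, a hedged sketch of that array-to-MIDI conversion using the mido library; the library choice, TICKS_PER_STEP, and the pitch offset are assumptions, not necessarily what our code used.

```python
import numpy as np
import mido

TICKS_PER_STEP = 120  # assumed length of one sampled timestep, in MIDI ticks
LOWEST_PITCH = 21     # assumption: column 0 is MIDI note 21 (A0), an 88-key roll

def roll_to_midi(roll, path):
    """roll: (timesteps, 88) binary array of the notes active at each step."""
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    prev = np.zeros(roll.shape[1], dtype=bool)
    delta = 0  # ticks elapsed since the last message written
    for step in roll.astype(bool):
        for p in np.where(step & ~prev)[0]:   # notes newly pressed this step
            track.append(mido.Message('note_on', note=LOWEST_PITCH + int(p),
                                      velocity=64, time=delta))
            delta = 0
        for p in np.where(prev & ~step)[0]:   # notes released this step
            track.append(mido.Message('note_off', note=LOWEST_PITCH + int(p),
                                      velocity=0, time=delta))
            delta = 0
        prev = step
        delta += TICKS_PER_STEP
    for p in np.where(prev)[0]:               # close any notes still sounding
        track.append(mido.Message('note_off', note=LOWEST_PITCH + int(p),
                                  velocity=0, time=delta))
        delta = 0
    mid.save(path)
```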

Unfortunately, when we plotted the loss for each batch, we did not get good results: the model wasn't really learning, and we never had an effective loss function. Our best guess is that there's a programming issue somewhere in our code, compounded by the fact that we never found a ready-to-use loss function for the way we structured our input data.
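In hindsight, one standard off-the-shelf choice for a multi-one-hot target is elementwise binary cross-entropy, which treats each pitch as an independent on/off label; we did not try this, so it is offered as a suggestion rather than a verified fix. A minimal PyTorch sketch with stand-in tensors:

```python
import torch
import torch.nn as nn

N_PITCHES = 128                                       # assumed pitch-vector size
logits = torch.randn(8, N_PITCHES)                    # stand-in for RNN outputs
target = torch.randint(0, 2, (8, N_PITCHES)).float()  # multi-one-hot next step

# each pitch contributes an independent binary cross-entropy term
loss = nn.BCEWithLogitsLoss()(logits, target)
```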

Challenges

The difficulty in this project was heavily concentrated in the preprocessing of our input data and the postprocessing of our output song. Preprocessing took the majority of the project's time and required the most attention. Getting the MIDI output to be functional was also quite difficult, and we ended up abandoning that approach after it failed. Overall, this project would have been much simpler if we had found and used a dataset directed toward how we ended up implementing the project; that would have given us a better chance of picking an output style that was straightforward to turn into a MIDI file, or whatever the new format called for.

Reflection

Our main takeaway from this project is that music generation and processing is inherently a very difficult process, filled with necessary sacrifices. A potentially better approach would have been to use data already in the form of tokenized MIDI files: every time a data structure passes through some transformation, there is a substantial risk of data loss, as we found with our first approach. We also had a few ideas we hoped to implement but could not finish for lack of time. One was to have another network learn the number of notes to be played at a given timestep. Our output at each timestep was not strictly ones and zeros but floats between 0 and 1, so we needed some way of picking which notes would be played, and how many. We settled on simply playing the top 3 notes at each timestep (or, if fewer than 3 notes had meaningful activations, just as many as there were); a sketch of this selection step follows. By pairing our network with one that predicts how many notes to select at each timestep, we could output a variable number of notes more faithfully, rather than a predetermined fixed number.
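A minimal NumPy sketch of that fixed top-3 fallback; the activation threshold that realizes "just as many as there were" is an assumed cutoff.

```python
import numpy as np

def pick_notes(probs, k=3, threshold=0.1):
    """probs: (n_pitches,) float activations in [0, 1].
    Returns a binary vector with at most k notes switched on."""
    top = np.argsort(probs)[::-1][:k]   # indices of the k largest activations
    top = top[probs[top] > threshold]   # drop near-zero picks
    out = np.zeros_like(probs)
    out[top] = 1.0
    return out
```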

Another potential spot for improvement would be to learn separate instruments, or perhaps to focus on monophonic, single-instrument music. Different instruments play different roles, and collapsing every instrument in the training set into a single instrument's output may muddy the learned note structure.

On the whole, our project was a success in terms of data processing, but our ability to produce an audio output file was limited by the difficulty of interfacing with available libraries.

This brings us to our biggest takeaway from this project: the value of in-depth planning. We took on a hard project, so we needed to pick a route and commit to it to finish in time. With more time, though, we would have devoted more of it to planning ahead, so that deeper problems could have been uncovered earlier and key pivots made sooner. That being said, we are moderately happy with how it turned out, and we gained a lot of familiarity with RNNs and with data handling in Python.

posted an update

Challenges

What has been the hardest part of the project you've encountered so far?

The data we are using is a NumPy distribution of MusicNet. It came in npz format, and it took us a little time to figure out how to extract the data we needed from it; a loading sketch follows. Additionally, while working with the data and contemplating how to use it to generate our own tunes, we realized that our previously planned GAN model wouldn't be the best fit. Instead, we have decided to employ an LSTM-based RNN model.
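For reference, a minimal loading sketch. The allow_pickle/encoding flags are typically needed because the archive stores Python-2-era pickles; the filename and the (audio, labels) layout per entry are assumptions based on the NumPy release of MusicNet.

```python
import numpy as np

data = np.load('musicnet.npz', allow_pickle=True, encoding='latin1')
song_ids = sorted(data.files)            # one entry per recording id
audio, note_labels = data[song_ids[0]]   # assumed layout: waveform + note labels
```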

Insights

Are there any concrete results you can show at this point?

We are currently saving each song to its own .npz file for easy loading in the future; a sketch of this caching follows. This will take a long time, but we don't need too many song files to properly train our model. Along the same lines, we now have some songs encoded as multi-one-hot vectors available to view.
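A minimal sketch of that per-song caching; the filename and array name are hypothetical.

```python
import numpy as np

roll = np.zeros((6000, 128), dtype=np.float32)   # stand-in for one encoded song
np.savez_compressed('songs/some_id.npz', roll=roll)
roll = np.load('songs/some_id.npz')['roll']      # fast reload during training
```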

How is your model performing compared with expectations?

Our model is not yet being trained, but we are not concerned.

Plan

Are you on track with your project?

Even though we haven't finalized our model, we have agreed on its details: we will be employing an RNN based on LSTM. We may not be on track with the entirety of the expectations for this deliverable, but we can see our roadmap to the due date and feel quite confident about the next steps. As a result, we are happy with the progress we've made.

What do you need to dedicate more time to?

We need to dedicate more time to building our model and testing how well it performs; then we can work on methods to improve it.

What are you thinking of changing, if anything?

We have already decided to switch from a GAN to an LSTM-based approach; a sketch of the planned model follows.
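For concreteness, a minimal PyTorch sketch of the kind of LSTM-based model we have in mind; the hidden size and pitch-vector size are assumptions, and this is a sketch of the plan rather than our final code.

```python
import torch
import torch.nn as nn

class NoteLSTM(nn.Module):
    """Window of multi-one-hot timesteps in, scores for the next step's notes out."""
    def __init__(self, n_pitches=128, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_pitches, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_pitches)

    def forward(self, x):             # x: (batch, window, n_pitches)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # logits for the timestep after the window
```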
