Great Musicians

Poster

Great Musicians: Generate Classical and Jazz Music with LSTM and Transformer

Who

Haidong Qi(hqi9), Ruigang Yao(ryao9), Yuan Fang(yfang31), Yucheng Ma(yma150)

Introduction

Classical music is the treasure of human culture. Can we generate music based on the masterpieces left by the great musicians? Music is fundamentally a sequence of notes, which is a perfect data format for deep learning. For this project, we plan to propose a music generation network capable of generating different styles of music, including baroque, romantic and jazz.

Related Work

Music is basically a sequence of notes. Thus sequence models have been the canonical choice for modeling music, from RNNs to Long Short Term Memory networks, to bidirectional LSTMs. Piano rolls are also image-like and can be modeled by CNNs trained either as generative adversarial networks or as orderless NADEs.

We will break down this project into multiple steps. We will try to employ RNNs such as GRU and LSTM to build up the network first. And then we will try to incorporate state-of-the-art Transformer as the backend sequential processor. Our work will follow this paper’s idea: link . This paper proposed a Transformer with modified relative attention mechanism can generate minute-long compositions (thousands of steps) with compelling structure.

Data

We will train our network with different music datasets, from classical piano to jazz music, to test the generalization of the model to different styles. The datasets we will use includes classical music dataset http://www.piano-midi.de/, and jazz dataset: https://bushgrafts.com/midi/

Methodology

We will break down this project into multiple steps. We will try to employ RNNs such as GRU and LSTM to build up the network first. And then we will try to incorporate state-of-the-art Transformer as the backend sequential processor. Looking back, the professor introduced sequence to sequence technique in Machine Translation, where we have an encoder gathering context information and a decoder predicting the next word. We also have attention to provide a mechanism when generating a new word, but only pay attention to some specific sequences. That sounds pretty similar to composing music. In well-composed music, notes are divided into different bars. A bar may be relevant to some specific bar before it and there is a pattern. So probably we can apply this technique to music generation.

Metrics

We will use perplexity and SparseCategoricalAccuracy as our metrics.

Ethics

Why is Deep Learning a good approach to this problem?

Music generation using deep learning techniques has been a topic of interest for the past two decades. Music proves to be a different challenge compared to images, among three main dimensions: Firstly, music is temporal, with a hierarchical structure with dependencies across time. Secondly, music consists of multiple instruments that are interdependent and unfold across time. Thirdly, music is grouped into chords, arpeggios and melodies — hence each time-step may have multiple outputs.

However, audio data has several properties that make them familiar in some ways to what is conventionally studied in deep learning (computer vision and natural language processing, or NLP). The sequential nature of the music reminds us of NLP, which we can use Recurrent Neural Networks for. There are also multiple ‘channels’ of audio (in terms of tones, and instruments), that are reminiscent of images that Convolutional Neural Networks can be used for. Additionally, deep generative models are exciting new areas of research, with the potential to create realistic synthetic data. Some examples are Variational Autoencoders (VAEs) and Generative Adversarial Neworks (GANs), as well as language models in NLP.

What broader societal issues are relevant to your chosen problem space?

Society issue: shape of cultures, history heritage, human civilizations: the power of emotion, moral, and culture. Music, as a cultural right, may aid in the promotion and protection of other human rights. It can help in the healing process, dismantling walls and boundaries, reconciliation, and education. Around the world, music is being used as a vehicle for social change and bringing communities together.

Division of labor

Haidong will take charge of dataset collection and preposse and build up an LSTM-based model for the first place. Ruigang will be in charge of building a transformer model. Yucheng will be responsible for doing quantitative ablation studies. Yuan will collect all the information together to produce the final poster and report.