CSCI 2470 Final Project

Team name: AutoMIDI

Team member: Yu Zhong and Xingjian Hao

CS login: yzhong36 and xhao9

Project Category: Music & Audio

Introduction

Musicians have long sought an automatic solution for composition. With the development of Artificial Intelligence, many deep-learning-based composition applications have emerged. Many of them used simple sequential models such as RNNs, but suffered from a scarcity of harmonic notes in their output; as a result, these models failed to be an effective solution for music composition.

In contrast to discriminative models, generative modeling is an unsupervised learning strategy that learns the underlying features of input data so that new samples resembling the real-world dataset can be generated. The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a deep-learning-based neural network architecture that estimates generative models via an adversarial process. The model consists of two components: a Generator and a Discriminator. The Generator aims to produce new, plausible samples from the domain of interest, while the Discriminator is dedicated to classifying examples as real (drawn from the domain) or fake (produced by the Generator). The two are trained in an adversarial manner until generated examples are no longer identified as "fake" by the Discriminator.
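The adversarial losses described above can be illustrated with a minimal toy sketch. This is not our model; the scores, the `bce` helper, and the hand-picked discriminator outputs are hypothetical, chosen only to show how the two players optimize opposite targets of the same binary cross-entropy loss.

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy: the standard GAN classification loss.
    eps = 1e-12  # guard against log(0)
    return -np.mean(target * np.log(pred + eps)
                    + (1 - target) * np.log(1 - pred + eps))

# Hypothetical discriminator scores on one batch:
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real samples
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) on generated samples

# Discriminator objective: push real scores toward 1, fake scores toward 0.
d_loss = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))

# Generator objective (non-saturating form): push D(G(z)) toward 1,
# i.e. fool the Discriminator into labeling fakes as real.
g_loss = bce(d_fake, np.ones(3))
```

With these scores the Discriminator is currently "winning" (low `d_loss`, high `g_loss`); training alternates updates to each network until the scores equilibrate.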

Jazz is an elegant music genre that originated in Louisiana, United States. With roots in blues and ragtime, jazz is characterized by swing and blue notes, complex chords, and improvisation.

In this project, we deploy a GAN-based architecture to compose jazz music. Instead of treating music notes as traditional time-series input, we convert the audio into image-based representations so that we can leverage convolutional neural networks, following the Deep Convolutional Generative Adversarial Network (DCGAN) approach from the image domain. After training for a sufficient number of epochs, our model demonstrates the capability of reconstructing melody-based images. We then follow up with a post-processing step that generates bassline and drum notes according to the rules of music theory.
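The audio-to-image conversion above can be sketched as a piano roll: each note is painted into a binary pitch-by-time matrix that a DCGAN can treat like a single-channel image. The function name, the note-tuple format, and the grid dimensions below are assumptions for illustration, not the project's actual preprocessing code.

```python
import numpy as np

N_PITCHES = 128   # full MIDI pitch range
N_STEPS = 64      # time steps per training image (assumed resolution)

def notes_to_pianoroll(notes):
    """Render notes as a binary piano-roll 'image'.

    notes: list of (pitch, start_step, end_step) tuples,
    with end_step exclusive (a hypothetical encoding).
    """
    roll = np.zeros((N_PITCHES, N_STEPS), dtype=np.float32)
    for pitch, start, end in notes:
        roll[pitch, start:end] = 1.0  # note held over [start, end)
    return roll

# A toy two-note melody: middle C for 8 steps, then E for 8 steps.
roll = notes_to_pianoroll([(60, 0, 8), (64, 8, 16)])
```

The resulting 128x64 matrix can be fed to convolutional layers exactly like a grayscale image, which is what lets DCGAN-style architectures apply here.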

Ethics

Why is Deep Learning a good approach to this problem?

Deep learning has become a promising strategy for music composition because of the variety of architectures it offers, e.g. the generative training procedure of GANs. It is well suited to tasks that come with large amounts of data, and it requires less feature engineering before model fitting than classical approaches.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?

We collected our music dataset from Kaggle, one of the largest machine learning and data science communities, where it is freely downloadable for data science challenges. Since we only aim to compose jazz music, the risk of historical or societal bias appears limited, though the dataset may still over-represent the artists and recordings its collectors chose to include.

Division of Labors

Xingjian is responsible for data processing as well as baseline model building.

Yu is responsible for constructing GAN.

Xingjian and Yu are both responsible for model tuning and result interpretation.

Check-off Documentation

Final Writeup
