Music Genre Classification

Inspiration

Our inspiration came from wanting to push the limits of our knowledge and our coding skills. The theme was Arts and Tech, and we all immediately focused on music. We wondered how well an AI model could learn the distinctions between styles of music and name their genres.

What it does

Our model is a neural network trained on Mel-frequency cepstral coefficients (MFCCs) to distinguish between 8 and 10 genres of music.

The program lets users choose how deeply they want to interact with the classifier:

Users can submit an MP3 file, which is converted to a WAV file and evaluated by our trained model.
Users can download a dataset (GTZAN or FMA), prepare their own data, and train with adjustable learning rates, dropout percentages, and audio transformations.
Users can run the MP3-to-WAV conversion on copyright-free music to populate their own genre list.
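A small fully connected (linear) classifier in PyTorch shows how the adjustable learning rate and dropout percentage fit together. This is a hedged sketch, not the project's actual architecture. The layer sizes, the Adam optimizer, the 8-genre output and the helper names `build_linear_classifier` and `train_step` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn


def build_linear_classifier(n_frames: int, n_mfcc: int, n_genres: int,
                            dropout: float = 0.3, hidden: int = 256) -> nn.Module:
    """Fully connected classifier over a flattened MFCC matrix.
    Layer sizes are illustrative assumptions."""
    return nn.Sequential(
        nn.Flatten(),                           # (batch, frames, mfcc) -> (batch, frames*mfcc)
        nn.Linear(n_frames * n_mfcc, hidden),
        nn.ReLU(),
        nn.Dropout(p=dropout),                  # user-adjustable dropout percentage
        nn.Linear(hidden, hidden // 2),
        nn.ReLU(),
        nn.Dropout(p=dropout),
        nn.Linear(hidden // 2, n_genres),       # one logit per genre
    )


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    """Run one optimization step and return the batch's cross-entropy loss."""
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


model = build_linear_classifier(n_frames=130, n_mfcc=13, n_genres=8, dropout=0.3)
# The learning rate is the other user-facing knob (Adam is an assumed choice).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```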

How we built it

We used Librosa to analyze songs and convert them to MFCCs, which represent each song as a matrix of coefficients that a neural net can take as input. We used PyTorch to create the neural net itself. The front end is built with HTML/CSS/JavaScript and connected to a Flask backend, a micro web framework. The working parts of the project are as follows:

Librosa to manipulate audio, extract features, and augment training data
PyTorch/PyTorch Lightning to create the linear neural network (CNN WIP)
TensorBoard to visualize training progress and evaluate our metrics
Flask to serve a simple front end that abstracts away the complexities of evaluating a genre
h5py to create less bloated files that store MFCC arrays
pandas, NumPy, and scikit-learn for standard machine learning computation
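Here is a minimal sketch of keeping MFCC arrays compact with h5py. It writes the features and labels to one HDF5 file using gzip compression, then reads them back as NumPy arrays. The dataset names `"mfcc"` and `"labels"` and the helper functions are hypothetical. The project's actual file layout is not described here.

```python
import h5py
import numpy as np


def save_features(path: str, mfccs: np.ndarray, labels: np.ndarray) -> None:
    """Write MFCC arrays and integer genre labels to one compressed HDF5 file."""
    with h5py.File(path, "w") as f:
        # gzip compression keeps large feature arrays from bloating the file
        f.create_dataset("mfcc", data=mfccs.astype(np.float32), compression="gzip")
        f.create_dataset("labels", data=labels.astype(np.int64))


def load_features(path: str) -> tuple[np.ndarray, np.ndarray]:
    """Read both arrays back into memory as NumPy arrays."""
    with h5py.File(path, "r") as f:
        return f["mfcc"][:], f["labels"][:]
```

Storing float32 instead of float64 also halves the raw size before compression. MFCC values rarely need double precision.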

Challenges we ran into

Learning machine learning and neural network training, then integrating them into a working project, is a lot for three people to take on in such a short time. Our biggest constraints were general troubleshooting and time management. We spent our first day standardizing our virtual environments and gathering the videos and documentation we would need to study for this task.

Most existing projects we found used different feature extraction methods and relied on neural network libraries that abstracted much of the complexity away. Many also reported inflated results, boasting a high training accuracy while saying little about validation loss.

Once we got some code running, we quickly ran into multiple roadblocks. Our training accuracy was low, our model was overfitting within the first 10 epochs, and the validation accuracy was stuck at approximately 30%. Most discussions of this problem that we found suspected the GTZAN dataset of being too small and outdated. At one point our accuracy stalled at about 50% for almost 30 trials and revisions. We attributed most of this shortfall to not using a convolutional neural network and to being limited to a dated dataset.
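One common way to catch overfitting early is to track validation loss and stop training once it no longer improves. The class below is a minimal, framework-agnostic sketch of that idea. The name `ValLossMonitor`, the `patience` and `min_delta` defaults, and the loss values in the usage example are all assumptions for illustration, not project data. PyTorch Lightning also provides a ready-made `EarlyStopping` callback in `pytorch_lightning.callbacks`, which takes `monitor` and `patience` arguments.

```python
class ValLossMonitor:
    """Signal a stop once validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


monitor = ValLossMonitor(patience=3)
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.72, 0.75, 0.8]):  # illustrative values only
    if monitor.step(epoch, val_loss):
        print(f"stopping at epoch {epoch}; best epoch was {monitor.best_epoch}")
        break
# prints "stopping at epoch 5; best epoch was 2"
```

Saving a checkpoint whenever `best_epoch` changes lets you restore the weights from before the model started memorizing the training set.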

Accomplishments that we're proud of

One of our greatest accomplishments is our perseverance in improving the model's predictions with the knowledge we had. After much digging, we found the Free Music Archive (FMA) dataset, which was much larger (and presented equally challenging hurdles) for us to work with.

Our current model reaches 80% training and validation accuracy while keeping overfitting to the training data as low as we could manage. When tested on royalty-free music, 9 out of 10 predictions were correct!

Machine learning aside, we also challenged ourselves to build a simple front end with HTML, CSS, and JavaScript that hides the details non-data scientists don't need to care about.

What we learned

Preparing a dataset for a neural net is not easy! It is a time-consuming and tedious process. Cleaning, preparing, and balancing data is one of the most challenging aspects of machine learning, and it is not often talked about. A model can only be as unbiased as the dataset it learns from.

What's next for Music Genre Classification

There's still so much to incorporate into this program. We want to build a front-end feature that lets users submit any dataset. We also plan to apply data augmentation to the FMA dataset and to train the model on the FMA medium, large, and full subsets.
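Here is a minimal NumPy sketch of two simple waveform augmentations: adding low-level Gaussian noise and applying a random circular time shift. The function names, the `noise_level` and `max_fraction` defaults, and the choice of these two transforms are all assumptions. Librosa also offers `librosa.effects.pitch_shift` and `librosa.effects.time_stretch` for pitch and tempo changes.

```python
import numpy as np


def add_noise(y: np.ndarray, rng: np.random.Generator,
              noise_level: float = 0.005) -> np.ndarray:
    """Add low-level Gaussian noise; noise_level is an assumed default."""
    noise = rng.standard_normal(len(y)).astype(y.dtype)
    return y + noise_level * noise


def time_shift(y: np.ndarray, rng: np.random.Generator,
               max_fraction: float = 0.2) -> np.ndarray:
    """Circularly shift the signal by up to max_fraction of its length."""
    max_shift = int(len(y) * max_fraction)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, shift)


def augment(y: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a new, randomly perturbed copy of a training clip."""
    return time_shift(add_noise(y, rng), rng)


rng = np.random.default_rng(seed=0)
clip = np.sin(np.linspace(0, 100, 22050)).astype(np.float32)
augmented = augment(clip, rng)
```

Apply augmentation only to the training split, before MFCC extraction. Otherwise the validation score stops reflecting performance on untouched audio.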

We also want to upgrade the neural network to a convolutional model that may be able to classify a wider range of genres.
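A convolutional model could treat each MFCC matrix as a one-channel image. The PyTorch sketch below shows one such design, offered as an illustration rather than a planned architecture. The class name `GenreCNN`, the channel counts, the kernel sizes and the dropout default are all assumptions.

```python
import torch
from torch import nn


class GenreCNN(nn.Module):
    """Small 2-D CNN over an MFCC matrix treated as a one-channel image.
    Channel counts and kernel sizes are illustrative assumptions."""

    def __init__(self, n_genres: int, dropout: float = 0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # halves the time and coefficient axes
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),         # global average pool -> (batch, 32, 1, 1)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=dropout),
            nn.Linear(32, n_genres),         # one logit per genre
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, n_mfcc) -> add a channel axis for Conv2d
        return self.classifier(self.features(x.unsqueeze(1)))


model = GenreCNN(n_genres=10)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(2, 130, 13))
```

The global average pool makes the classifier head independent of segment length, so the network can accept clips of different durations. The cost is that it discards where in time each pattern occurred.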