Elizabeth Chen posted an update — May 02, 2022 05:46 PM EDT

Update - Final Project Reflection

May 2, 2022

Introduction

We are implementing a deep learning system that automatically detects the emotion of a given piece of music. Specifically, we classify each song into one of four mood classes {happy, angry, sad, relax}, making use of both the audio signal and the lyrics. We chose this topic because all of us were already interested in music processing with deep learning, and we were drawn to the way music arouses emotion in the listener. We also considered a classification task to be a feasible one for us to succeed at. Following the paper Multi-Modal Song Mood Detection with Deep Learning, we plan to first implement its model and reproduce the expected outcome. Next, we would like to extend the model by adding more spectral and rhythm features, such as the tempogram and the Fourier tempogram. In addition, to address some of the drawbacks mentioned in the paper, we are considering collecting more data with the Spotify API and modifying the structure of the model to achieve better accuracy.

Challenges

Most of the work put into this project so far has been obtaining the different datasets required for training. This required writing two custom web scrapers — one for downloading song lyrics and one for getting sound files — the legality and effectiveness of which were hotly debated amongst our team.

The lyrics scraper was written first, and it turned out to be the more challenging of the two. In the paper that provided the MoodyLyrics dataset, where our songs and labels come from, the authors did not provide the lyrics for each song directly, but rather suggested a way to scrape them from LyricWiki. We prepared to implement this ourselves, but then found, to our dismay, that LyricWiki was shut down in 2020, throwing that possibility out the window. We then turned to scraping from Google, but Google disallows requests that do not come from its own web page. As a last resort, we turned to Genius, building URLs for all 2595 songs in the dataset and hoping that each one would match a registered set of lyrics on Genius. This mostly worked, although a small set of lyrics is still missing.
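
To illustrate the URL-building step, here is a minimal sketch in Python. It assumes Genius lyric pages follow the pattern https://genius.com/Artist-name-song-title-lyrics; the helper names slugify and genius_url are hypothetical, and the exact slug rules (how punctuation, accents, and "&" are handled) are our guess rather than anything documented, which is one plausible reason some songs would not resolve.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn free text into a hyphen-separated ASCII slug (assumed rules)."""
    # Strip accents: "Beyoncé" -> "Beyonce"
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Assumption: "&" is spelled out in the URL
    text = text.replace("&", "and")
    # Drop punctuation such as apostrophes and slashes
    text = re.sub(r"[^A-Za-z0-9\s-]", "", text)
    # Collapse runs of whitespace or hyphens into a single hyphen
    return re.sub(r"[\s-]+", "-", text.strip())


def genius_url(artist: str, title: str) -> str:
    """Build a candidate Genius URL for one (artist, title) pair."""
    slug = slugify(f"{artist} {title}")
    # Assumed pattern: only the first letter of the slug is upper-case
    return f"https://genius.com/{slug.capitalize()}-lyrics"


if __name__ == "__main__":
    print(genius_url("The Beatles", "Hey Jude"))
```

A URL built this way is only a candidate: a scraper would still need to request it and treat a missing page as "no lyrics found" for that song.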

The challenges that came with extracting 2595 separate audio snippets mostly revolved around the time needed to download them from YouTube using the youtube-dl command line tool. YouTube unfortunately throttles download speeds significantly, so each song took about 1-2 minutes to download. Luckily, this process was parallelizable, so we overcame this challenge by spawning 10 separate youtube-dl crawlers, speeding up the procedure by roughly a factor of 10. Even so, the effort took several days in total, and we now have audio for every one of the 2595 songs in the dataset.
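
The parallel download can be sketched as follows. This is not our exact crawler setup: it swaps the 10 separate crawlers for a single Python thread pool with 10 workers, and it assumes each song is located with a youtube-dl search query (the ytsearch1: prefix) built from the artist and title. The function names and the output layout are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(query, out_dir):
    """youtube-dl invocation that saves the top search hit as a .wav file."""
    return [
        "youtube-dl",
        "--extract-audio",
        "--audio-format", "wav",
        "--output", f"{out_dir}/%(id)s.%(ext)s",
        f"ytsearch1:{query}",  # assumption: take the first search result
    ]


def download_one(query, out_dir, run=subprocess.run):
    """Download one song; returns (query, succeeded)."""
    result = run(build_command(query, out_dir))
    return query, result.returncode == 0


def download_all(queries, out_dir, workers=10, run=subprocess.run):
    """Run up to `workers` downloads at once and report success per query."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda q: download_one(q, out_dir, run), queries))


if __name__ == "__main__":
    # Requires youtube-dl (and ffmpeg for the wav conversion) to be installed.
    status = download_all(["The Beatles Hey Jude"], "audio")
    print(status)
```

Threads are enough here because each worker spends its time waiting on an external process, and the returned dictionary makes it easy to retry only the songs that failed.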

Insights

We do not have any concrete results yet as we just finished downloading all of the audio files over the weekend. We are looking forward to working with the data and gaining interesting insights in the coming days.

Plan

Over the next nine days, our team plans to dedicate time to preprocessing our audio data, which includes creating spectrograms for all the audio files we have. Up to this point, much of our work has gone into scraping the web for audio and lyric samples of songs with mood labels. Now that we have this data, it is time to preprocess it so that we can extract as much information as possible for classifying mood correctly. This means creating spectrograms, which show a Fourier analysis of each song snippet as a 2D representation of the audio. CNNs will be applied to these spectrograms to extract mood information during training and testing. The second set of data comes from the lyrics, for which a vec2vec approach will be used to determine the mood of each song. Our third set of data will be a tempogram created from our .wav files, which gives tempo-based information about our dataset. Again, a CNN should be very useful for applying this data to mood classification. We feel that this plan will allow us to be successful.
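
As a rough illustration of the spectrogram step, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform using only NumPy. The frame length, hop size, and function name are assumptions chosen for the example; in practice we would more likely rely on an audio library such as librosa, and the final features may be mel-scaled rather than linear.

```python
import numpy as np


def log_spectrogram(signal, frame_len=1024, hop=512):
    """Return a (freq_bins, frames) log-magnitude spectrogram in dB."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Fourier transform of each frame; keep magnitudes only
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0); transpose so time runs along the x-axis
    return 20 * np.log10(magnitude + 1e-10).T


if __name__ == "__main__":
    sr = 8000  # assumed sample rate for this toy example
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)  # one second of a 1000 Hz sine
    spec = log_spectrogram(tone)
    print(spec.shape)
```

The resulting 2D array can be treated like a single-channel image, which is what makes it a natural input for a CNN.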
