Blueno Mars: Lyric Generation Poster

Blueno Mars

Who

Cecilia Vogler, cvogler

William Buerger, wbuerger

Star Su, ssu6

Introduction

For our project, we would like to implement a model that learns and generates song lyrics with appropriate wording, line breaks, and verses. We all have a strong interest in natural language processing, so we wanted to put a creative twist on a simple sentence generation model. Song lyrics have a very specific structure compared to regular writing, and we hope to mimic this with our model. As our base goal, we want to generate a few lyrics with some structure that reflects a real song. As our target goal, we hope to generate a complete song, or to autocomplete lyrics given some words. As a stretch goal, we want to explore generating songs of specific genres or in the style of particular artists. This project is a structured text learning and generation problem.

Related Work

In this blog post (https://towardsdatascience.com/how-to-build-and-deploy-a-lyrics-generation-model-framework-agnostic-589f3026fd53), the author used the Genius API to scrape lyrics and create a lyric generation model. Their model was focused on rap lyrics, so their dataset was smaller and they had to use data augmentation. Although their post wasn’t very technical, they used a custom generation function that we can use as prior art.

In this blog post (https://towardsdatascience.com/generating-beatles-lyrics-with-machine-learning-1355635d5c4e), the author explored using GPT-2 for their lyric generation model, an approach that we also plan to try.

This paper (https://www.aclweb.org/anthology/C18-1174.pdf) proposes a sophisticated architecture for detecting the macrostructure of songs. It uses a "self-similarity matrix representation" (SSMR), which combines metrics like phonetic similarity and string similarity to compare words. The architecture applies convolution over the SSMR and then feeds the result into LSTM cells.

Data

We plan to get our data from the Genius API, which gives us access to a wide variety of songs and artists. The API has millions of songs and annotations that we plan on using to train our model. Our biggest task will be preprocessing, since we will need to go through the lyrics and make them uniform and easy for the model to process. Some ideas we had for preprocessing include creating stop tokens for the ends of lines and verses. We also hope to identify repetitive patterns to eliminate, so that our model doesn't get stuck in one particular song pattern during training.
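As a rough illustration of the stop-token idea, here is a minimal sketch of how raw lyrics could be turned into a token sequence with end-of-line and end-of-verse markers. The token names `<eol>` and `<eov>`, the function name `tokenize_lyrics`, and the assumptions that verses are separated by blank lines and that section labels look like `[Chorus]` are ours for illustration; the actual preprocessing may differ once we see the scraped data.

```python
import re

# Hypothetical marker tokens; the names are placeholders, not a fixed choice.
LINE_END = "<eol>"
VERSE_END = "<eov>"


def tokenize_lyrics(raw_lyrics):
    """Turn raw lyrics into a flat token list with line and verse markers."""
    tokens = []
    # Assumption: verses are separated by one or more blank lines.
    verses = re.split(r"\n\s*\n", raw_lyrics.strip())
    for verse in verses:
        verse_has_lines = False
        for line in verse.splitlines():
            # Assumption: section labels such as "[Chorus]" are bracketed; drop them.
            line = re.sub(r"\[.*?\]", "", line).strip().lower()
            if not line:
                continue
            # Keep lowercase words, digits, and straight apostrophes only.
            tokens.extend(re.findall(r"[a-z0-9']+", line))
            tokens.append(LINE_END)
            verse_has_lines = True
        if verse_has_lines:
            tokens.append(VERSE_END)
    return tokens


if __name__ == "__main__":
    sample = "[Verse 1]\nHello world\nIt's me\n\nBye now"
    print(tokenize_lyrics(sample))
```

Keeping the markers as ordinary tokens means the model can learn where lines and verses end in the same way it learns any other word.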

Methodology

We plan to train our model using an RNN on top of pre-trained representations such as word2vec embeddings or GPT-2. For our architecture, we plan to use something similar to what we built in the language modeling assignment. We plan to use Keras to create our layers. We will experiment with the number of dense/feed-forward layers, as well as the parameters of our LSTM. Most of the prior work we found uses RNNs, so this architecture seems like a good place to start for our model.
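A minimal sketch of the kind of Keras architecture we have in mind is shown below: an embedding layer, one LSTM, one dense layer, and an output layer over the vocabulary. The function name `build_lyrics_model` and all layer sizes are placeholder assumptions, and the embedding here is trained from scratch rather than loaded from word2vec or GPT-2; the final design will depend on our experiments.

```python
import tensorflow as tf


def build_lyrics_model(vocab_size, embedding_dim=128, lstm_units=256, dense_units=256):
    """Next-token language model: Embedding -> LSTM -> Dense -> vocabulary logits.

    All sizes are illustrative defaults, not tuned values.
    """
    model = tf.keras.Sequential([
        # Maps token ids to dense vectors (could later be initialized from word2vec).
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        # return_sequences=True gives a prediction at every time step.
        tf.keras.layers.LSTM(lstm_units, return_sequences=True),
        tf.keras.layers.Dense(dense_units, activation="relu"),
        # Unnormalized scores (logits) over the vocabulary for the next token.
        tf.keras.layers.Dense(vocab_size),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    return model


if __name__ == "__main__":
    model = build_lyrics_model(vocab_size=5000)
    # A batch of 2 sequences of 10 token ids produces logits of shape (2, 10, 5000).
    logits = model(tf.zeros((2, 10), dtype=tf.int32))
    print(logits.shape)
```

The number of dense layers and the LSTM size are exactly the knobs we expect to vary during tuning.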

Metrics

Since the quality of a generated song lyric is difficult to quantify, we plan on using other methods of evaluating our model that may better indicate whether it is successful. One idea we had was to run a survey asking people to tell apart lyrics from a real song and lyrics generated by our model, then measure how often they can distinguish the two and how convinced they are by our lyrics. Our base goal will be to generate a few lyrics that we can compare to real ones and that will seem like real lyrics to other people. Our target goal will be to generate a complete song, which, even if it is not as artistic, could be convincing as real song lyrics. Our stretch goal is to be able to generate convincing imitations of songs in the style of certain artists or of certain genres.

In terms of concrete metrics, we plan to use perplexity, comparing its value on the training data and on held-out test data to see whether the model is learning.
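For reference, perplexity is the exponential of the average negative log-probability that the model assigns to each true next token. The sketch below computes it from a list of such probabilities; the function name `perplexity` and the input format are assumptions for illustration, since in practice the value would come from the model's cross-entropy loss.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities assigned to each true next token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


if __name__ == "__main__":
    # A model that always spreads probability evenly over 4 tokens has perplexity 4.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower is better, and a test perplexity far above the training perplexity would suggest overfitting.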

Ethics

What broader societal issues are relevant to your chosen problem space?

Automating a process with deep learning can affect society and cause problems in the relevant industries. For a model that generates song lyrics, the risk lies in its impact on the music industry. Song lyrics are a big part of what makes different musical works unique and distinguishes artists from each other. Automating song lyric generation carries the risk that all songs will begin to sound too similar, or will lose the personal touch that often makes them so popular and so emotional. If the model were to work well, it could also threaten the livelihood of the many artists who write song lyrics, as people might turn to automatically generated songs. Another potential impact is that if everyone has access to a model that writes song lyrics for them, it might become harder for talented young artists to stand out from the crowd with their music.

Why is Deep Learning a good approach to this problem?

Deep learning allows for unsupervised learning, loosely mimicking the behavior of the human brain in that models learn from data and then use that knowledge to make decisions, generate text, recognize speech, and so on. This suits a problem like ours, because song lyrics have a fairly consistent structure, even across different genres. They usually consist of short lines separated by line breaks, and they are sometimes also divided into verses. Even as humans, in order to write songs we listen to different music and subconsciously learn the structure of different songs so we can mimic it ourselves. Similarly, in order to generate song lyrics, our model will have to learn the structure and content of songs from a dataset, and use that to generate its own lyrics.

Division of Labor

We will be working together for the majority of this project. For example, we believe preprocessing will be a big part of our project, so ideally we will all work on that.

Cecilia Vogler: Preprocessing, digital poster, training/testing model

Star Su: Preprocessing, scraping API data, visualizing output

William Buerger: Model architecture, oral presentation, testing, tuning hyperparameters