Find our Digital Poster:

https://docs.google.com/presentation/d/1VdkxnT6MeQj8W327huTgukx2EwVDot8R_uyT1NalrCk/edit?usp=sharing

Find our Writeup:

https://docs.google.com/document/d/1QdZJ69VUHDfMI10TSBvHSLooqhBmprb_j06EL91rpAw/edit?usp=sharing

Find our Code:

https://github.com/isaacnathoo/dime-rhymes

Video Presentation Link:

https://youtu.be/pao5fhUxKgQ

PANTOMIME RHYMES

Our Initial Proposal:

Introduction

We want to create a Poetry Generator. This project will be heavily centered around Natural Language Processing. We ultimately decided on this project because we are more interested in natural language processing than in structured image prediction, and because of our interest in poetry and writing in general.

The poems generated should be cohesive in both subject matter AND rhyme. There has been previous work addressing each of these issues individually, but there has not been a successful deep learning model that solves both and generates meaningful, rhyming poetry. While there have been some hard-coded attempts at this (see https://www.aclweb.org/anthology/W17-3901.pdf for example), they are not very flexible - they can only recombine words from a fixed corpus (generally much smaller than GPT's training corpus) into stanzas of a fixed form, and they are usually only vaguely cohesive in theme. GPT, on the other hand, can produce narratives with a clear structure.

Of course, many poems have no clear structure, and their content may be obscure. This project focuses on the subcategory of rhyming, structured, meaningful poems.

Generating poetry with language models is a problem as old as deep learning, and it only grows richer year after year. After learning about the progress on OpenAI's generative pre-trained transformer (GPT), as avid poets we wanted to explore how the model would perform given large corpora of poems. By training on realistic sequences of rhythmic poetry, we hope the model can improve both the content and the form of the poems it generates.

Related Work

https://openreview.net/pdf?id=Y5TgO3J_Glc: This paper describes an approach to generating poems with good high-level structure (e.g. rhyme schemes and meter) using relational constraints. However, the authors note that the content of their poems is unreliable. They also explicitly note the potential for using GPT-2 or -3, in combination with ideas from their model, to produce poetry that is both meaningful and rhyming.

https://research.fb.com/wp-content/uploads/2017/06/automatically-generating-rhythmic-6-2.pdf: This paper describes two methodologies for the automatic generation of rhythmic poetry in a variety of forms. The first approach uses an LSTM trained on a phonetic encoding to learn an implicit representation of both the form and content of English poetry. The second approach treats poetry generation as a constraint satisfaction problem in which a generative neural language model learns a representation of content. These models work fairly well but struggle with rhyme and with generalizing to novel content.

Subject extraction articles:
https://medium.com/@acrosson/extract-subject-matter-of-documents-using-nlp-e284c1c61824
https://towardsdatascience.com/deep-learning-for-specific-information-extraction-from-unstructured-texts-12c5b9dceada

Other resources:
https://www.gwern.net/GPT-2
https://pypi.org/project/pronouncing/ ← cool for finding rhyming words (https://github.com/jameswenzel/Phyme is an alternative)

Data

https://www.kaggle.com/johnhallman/complete-poetryfoundationorg-dataset
https://www.kaggle.com/terminate9298/gutenberg-poetry-dataset

The data that we are planning to use is the Gutenberg Poetry Dataset, which contains about 2.7 million rows of sentences extracted from hundreds of books from Project Gutenberg. Each line of poetry is also associated with the ID of the Project Gutenberg book that the line comes from, which can be used to find the title and author of a given line of poetry. We will need to extract each line and perform standard natural language processing to stem, UNK, and tokenize the text. Depending on model performance, we may also use data from poetryfoundation.org, which could help improve poem generation. We will also use phonetic data from CMU (http://www.speech.cs.cmu.edu/cgi-bin/cmudict), which includes a pronunciation dictionary with letter-to-sound rules. This dataset will not require much preprocessing, as the phonemes can be extracted directly from the dictionary; however, there may be additional work required for words not covered in this set. Using this data to produce rhyming schemes is done in the Facebook paper from the related work section.
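
To make the phoneme step concrete, here is a rough sketch (not our final pipeline) of converting a line of poetry to CMU phonemes using the pronouncing package linked above; the out-of-vocabulary handling is a placeholder:

```python
import pronouncing

def line_to_phonemes(line):
    """Convert a line of poetry to a flat list of CMU phonemes."""
    phonemes = []
    for word in line.lower().split():
        word = word.strip(".,;:!?\"'()")
        prons = pronouncing.phones_for_word(word)
        if prons:
            # Take the first listed pronunciation; heteronyms like
            # "read" have several, which we ignore for now.
            phonemes.extend(prons[0].split())
        else:
            phonemes.append(word)  # out-of-vocabulary fallback
    return phonemes

print(line_to_phonemes("Shall I compare thee to a summer's day"))
```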

Methodology

Our model will combine several sub-models, each based on a different paper, under an overarching model. Each sub-model will be trained independently and then fixed while the top-level model is trained. The sub-models include the GPT model for generating meaningful text samples, a subject extraction model for enforcing content similarity in the generated poem, a phonetic model for generating rhymes, and a grammatical model for enforcing grammar.

The top-level architecture of our poetry generation model will be a Long Short-Term Memory network (LSTM). The exact structure and hyperparameters will be determined by testing model performance. For this model, we will also use a baseline of the pretrained weights from GPT-2, which is open source and publicly accessible. We will train the model using the Adam optimizer to minimize the loss function, most likely sparse categorical cross-entropy. This will allow us to fine-tune the GPT-2 weights for poetry applications, specifically emphasizing rhyming patterns and lyrical form while maintaining contextual meaning. As for the data, we will convert each line of poetry to its corresponding phonemes using the CMU pronunciation dictionary and train the model on these. We may also consider combining this with the rhyming dictionary to learn true rhymes, and using morphemes. After training, we will also need a function to project phoneme combinations back to the words they most likely represent.
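
To illustrate, a minimal Keras sketch of what this top-level LSTM could look like, assuming a phoneme-token vocabulary; VOCAB_SIZE, SEQ_LEN, and the layer sizes are placeholders rather than tuned choices:

```python
import tensorflow as tf

VOCAB_SIZE = 100  # placeholder: number of distinct phoneme tokens
SEQ_LEN = 64      # placeholder: phonemes per training window

# Predict the next phoneme token from the preceding window.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
)
model.summary()
```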

For our reach goals, we may try implementing contextual rhyming - rhymes based not only on the character content of a word, but also on how that word is used. For instance, "I like to read" and "I just read a book" contain two different pronunciations of the word "read," which don't rhyme with each other.
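
The CMU dictionary already exposes this ambiguity, since it lists every pronunciation of a heteronym; choosing the right one is what would require context. For example, with the pronouncing package:

```python
import pronouncing

# "read" is a heteronym: the present tense rhymes with "reed",
# the past tense with "red". CMUdict lists both pronunciations.
print(pronouncing.phones_for_word("read"))
# e.g. ['R EH1 D', 'R IY1 D']
```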

Metrics

As a general strategy, we plan to run experiments using different sets of data (e.g. a dataset containing poetry mainly from one author) and see how the model performs. We can use a held-out set of poetry data and measure how well our model predicts the next word in a sequence. This includes words in the middle of a line, to check for logical content generation, and words at the end of a line, to check for rhyming capability. We don't believe the traditional notion of "accuracy" applies to this project. Instead, we would like to look at how similar our generated poems sound to actual poems written by humans (perhaps by running tests that ask people whether they can tell which poem is machine-generated and how similar it feels to a natural poem). Interestingly, poems don't follow traditional patterns of grammar or sentence structure, which works to our advantage when crafting them. We would also consider an ad-lib style of evaluation: checking whether our model can generate correct or similar words in poems it hasn't seen before. Our base goal is to convince someone that at least one of the generated poems was not written by a machine, or for our model to reach some baseline perplexity when filling words into poems. Our target goal is for at least 1/8 of the generated poems to trick a human, and our stretch goal is for 1/5 of them to do so.
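
As a sketch of the perplexity measure on held-out data, assuming a model that outputs next-token probabilities (the names here are placeholders, not our final evaluation code):

```python
import numpy as np

def perplexity(model, inputs, targets):
    """Perplexity of a next-token model on held-out data.

    inputs: (batch, seq_len) token ids; targets: (batch,) next-token ids.
    """
    probs = model.predict(inputs)  # (batch, vocab_size)
    target_probs = probs[np.arange(len(targets)), targets]
    avg_neg_log_likelihood = -np.mean(np.log(target_probs + 1e-12))
    return float(np.exp(avg_neg_log_likelihood))
```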

Concretely, in determining our loss function, we will want to combine several different measures of poetry "goodness." Firstly, we want to penalize the poem if it lacks rhymes - there are many ways to do this using a pre-trained phonetic model, and we will explore the options presented in the papers from the related work section.
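
As one illustration (using the pronouncing package as a stand-in for a trained phonetic model), we can compare the rhyming parts - the phonemes from the last stressed vowel onward - of line-ending words:

```python
import pronouncing

def words_rhyme(word_a, word_b):
    """True if any pronunciations of the two words share a rhyming part
    (the phonemes from the last stressed vowel to the end of the word)."""
    parts_a = {pronouncing.rhyming_part(p)
               for p in pronouncing.phones_for_word(word_a)}
    parts_b = {pronouncing.rhyming_part(p)
               for p in pronouncing.phones_for_word(word_b)}
    return bool(parts_a & parts_b)

def rhyme_violations(line_endings, scheme):
    """Count pairs that should rhyme under the scheme but don't."""
    violations = 0
    for i in range(len(scheme)):
        for j in range(i + 1, len(scheme)):
            if scheme[i] == scheme[j] and not words_rhyme(
                    line_endings[i], line_endings[j]):
                violations += 1
    return violations

print(rhyme_violations(["day", "night", "may", "light"], "ABAB"))  # 0
```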

Secondly, we want the poem to have a pleasant metered structure. The OpenReview paper in the related work section describes a way to do this.
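
As a crude proxy for meter (not a reimplementation of that paper's method), stress patterns from the CMU dictionary can be concatenated and compared against a target pattern such as iambic "0101...":

```python
import pronouncing

def line_stress_pattern(line):
    """Concatenate per-word vowel stress digits (0/1/2) for a line."""
    pattern = ""
    for word in line.lower().split():
        prons = pronouncing.phones_for_word(word.strip(".,;:!?\"'"))
        if prons:
            pattern += pronouncing.stresses(prons[0])
    return pattern

print(line_stress_pattern("shall i compare thee to a summer's day"))
```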

Thirdly, we want the poem to be mostly grammatical.

Finally, we penalize the poem if its content differs significantly from the content of the text sample generated by GPT. This is the most important part - it ensures the poem actually makes sense at some level. However, we may want to do more than just compare the final, complete texts of the poem and the GPT sample. For instance, we may want to enforce content similarity line-wise, with each line of the poem having similar content to the corresponding line of GPT text. We plan to implement hyperparameters to control this.
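
A simple stand-in for this penalty, using TF-IDF cosine similarity in place of a learned content model; the line_weight hyperparameter is a placeholder of our own, not a fixed design choice:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_penalty(poem_lines, gpt_lines, line_weight=0.5):
    """Penalty >= 0 mixing whole-text and line-wise dissimilarity."""
    vec = TfidfVectorizer().fit(poem_lines + gpt_lines)

    # Whole-text similarity between the poem and the GPT sample.
    whole = cosine_similarity(
        vec.transform([" ".join(poem_lines)]),
        vec.transform([" ".join(gpt_lines)]))[0, 0]

    # Line-wise similarity between corresponding lines.
    line_sims = [cosine_similarity(vec.transform([p]),
                                   vec.transform([g]))[0, 0]
                 for p, g in zip(poem_lines, gpt_lines)]
    line_avg = sum(line_sims) / len(line_sims)

    return (1 - whole) + line_weight * (1 - line_avg)
```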

Ethics

One of the broader societal issues relevant to our chosen problem space is the concept of intellectual property. For example, if our project works (or, more likely, a similar but much more polished project does), it could be trained in the style of an author to generate pieces that mimic that style. This calls into question who owns these pieces of work. Is it the poet whose style is being mimicked? Or the developers who set the system up?

Our quantification of success carries a major implication. This project touches on the problem of generating text that is indistinguishable from human text, and the relationship between that and fake news. If we use indistinguishability as our metric, we run into many of the problems that GPT-3 could bring about: a model built to prioritize this risks pushing the idea that hyper-realistic models are the models that perform best.

Division of labor

Bashar - Work on preprocessing and run grid searches to help with optimization.
Luke - Calculate loss and improve the training process, integrate varying models.
Isaac - Implement a Long Short Term Memory network (LSTM) architecture, training.
Bilal - Identify large poetry corpora, clean and manage data before feeding it into the model.

Updates

Check-in #2:
Introduction
We will be creating a Poetry Generator. This project is heavily centered around Natural Language Processing and developing a Seq2Seq model. As described in our proposal above, the generated poems should be cohesive in both subject matter and rhyme - a combination that, to our knowledge, no previous model has successfully achieved.

Challenges
What has been the hardest part of the project you’ve encountered so far?
Preprocessing two datasets of varying styles proved more challenging than originally expected, because content from the Poetry Foundation dataset needed additional cleaning to match the consistency of the Gutenberg Poetry dataset. Unifying the styles of the two datasets was frustrating enough that we considered using only a single dataset. Ultimately, we decided to leverage both datasets for a more effective model!
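
The cleaning itself is mostly mundane normalization; here is a sketch of the kind of per-line function involved (illustrative, not our exact code):

```python
import re

def normalize_line(line):
    """Put a line of poetry from either dataset into one common format."""
    line = line.strip()
    line = re.sub(r"\s+", " ", line)             # collapse whitespace
    line = re.sub(r"[\u2018\u2019]", "'", line)  # smart -> straight quotes
    line = re.sub(r"[\u201C\u201D]", '"', line)
    return line
```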

We also took some time to pin down what we saw as the core of the project - the most difficult and interesting part of our idea. We decided that this is the process of converting a sample of text into a structured, rhyming poem with minimal loss of content. If we can do that, we can be confident that the entire process of rhyming poem generation can be automated from scratch, since we know that GPT-3 can generate meaningful text samples from scratch; we would then be able to feed those samples to our model.

Insights
Are there any concrete results you can show at this point? How is your model performing compared with expectations?

While we have no concrete results to show at this point, we are optimistic that our model will start to generate rhythmic poetry once we have finished preprocessing our poetry data and loaded in the phoneme dictionary. We have started to implement a basic LSTM model; however, we will not be able to test or optimize it until later.

Plan
Are you on track with your project?
We may be slightly behind our initial goals for this project, but we are looking forward to making a good amount of progress this week, and more once preprocessing is complete. It is difficult to build a good model when we cannot train and test it to measure its performance; however, we hope to have a basic model implemented by the end of this week. We are excited to start gathering preliminary results, which we can then use to expand and improve our model.

What do you need to dedicate more time to?
We should probably spend more time meeting and discussing the specifics of our model implementation; we have likely been stuck at high-level discussions for too long.

What are you thinking of changing, if anything?
As discussed in the challenges section, we are dropping the GPT component of our project to focus on converting text samples into poems. However, we may use GPT-2 in our presentation to demo complete poem generation from scratch.
