Poetry Generator with Text Image
Team members: Sophia Song (xsong2), Yuming Fu (yfu49), Yongxuan Fu (yfu13)
Team Name: The Powerpuff Girls
Report
Poster & Slides
Code
1 Introduction
The beauty of poetry as a form of expression has been increasingly neglected as we enter the era of technology. Our enormous screen time has crowded out the time we once spent reading poems, let alone writing one of our own. People's loss of interest in poetry, as well as in other classical forms of literature, points to a gap in our STEM-oriented education system, and action needs to be taken to preserve the essence of our literary traditions.
The purpose of our project is to explore ways of restoring poetry's popularity by making it more accessible, especially its creation process. While the idea of 'writing a poem' can sound intimidating, we have all experienced having a certain image or sensation in mind without being able to extend it into graceful, well-formed sentences. Our group therefore aims to make poetry creation as easy as uploading a text image: our model extracts its features to generate a few topics, upon which a beautiful poem is generated.
While our project is just the start of intelligent machine-created poetry, we hope to keep track of continuous development in the field including poems in different languages that can raise people’s interest in poetry and make it a shared treasure among all humankind.
The problem we plan to solve is a structured prediction problem: given input photos, we want to form logical sentences that fit basic English grammar on a given topic. Since we train an RNN with LSTM cells on unlabeled poems, our project is a form of unsupervised learning. Moreover, for a given input text image, many different output poems are acceptable.
2 Methodology
A brief overview of the process:
- Analyzing and recognizing the text from the text image: We used the deep-text-recognition-benchmark model provided by Jeonghun Baek et al. to recognize the text in the image [1]. Running this model gives us a predicted text string extracted from the text image.
- Preprocessing the data: We used two poem datasets from Kaggle [3]. One contains over 36,000 modern poems, each assigned to a topic; the other contains 15,700 sonnets retrieved from famous poets. We combined the two datasets to improve the generated poems' readability. Due to memory limitations, we broke our dataset down into 142 topics, such as love and death. By analyzing the text from the input text image, we find the most relevant topic and train the model only on poems under that topic. Lastly, we tokenized the poems and generated sequences: we split each poem into words and pad every line so that all sequences have the same length.
- Word embedding: Word embedding is a necessary technique in text generation, as it represents text numerically. While there are various approaches to creating pre-trained word embeddings, our model adopts Stanford's GloVe ('Global Vectors for Word Representation'). GloVe learns word vectors such that their dot product equals the logarithm of the words' probability of co-occurrence. Since the logarithm of a ratio is the difference of logarithms, this method connects ratios of co-occurrence probabilities with vector differences in the word-vector space.
- Implementing the seq2seq model: Our model is built on the seq2seq structure, with two LSTM-based recurrent networks serving as encoder and decoder. Each encoder unit takes a single word as input, processes it, and outputs a hidden-state vector. The decoder takes the encoder's final hidden state together with a 'start of sentence' token and feeds them through a dense layer with softmax activation. We select the word with the highest probability, and that output word is fed to the next cell to repeat the process. We chose the seq2seq model because it improves on an ordinary RNN in two ways: it can generate outputs of different lengths, which makes our poems more flexible, and it strengthens the connection between sentences in a poem because a summary of the entire previous sentence is passed as context to the decoder.
- Training the model and generating a poem: Based on the text recognized from the text image in step 1, we generate the poem line by line using our model.
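To make the tokenize-and-pad step above concrete, here is a minimal pure-Python sketch. The toy corpus, function names, and the n-gram prefix scheme are our illustration, not the project's exact code; in practice a library tokenizer (e.g. Keras's) would play this role.

```python
# Sketch of the preprocessing step: split poems into words, build n-gram
# prefix sequences, and left-pad every sequence to a common length.

def tokenize(corpus):
    """Map each distinct word to an integer id (1-based; 0 is reserved for padding)."""
    vocab = {}
    for line in corpus:
        for word in line.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def build_sequences(corpus, vocab):
    """For every poem line, emit all n-gram prefixes of length >= 2."""
    sequences = []
    for line in corpus:
        ids = [vocab[w] for w in line.lower().split()]
        for i in range(2, len(ids) + 1):
            sequences.append(ids[:i])
    return sequences

def pad_sequences(sequences, pad_value=0):
    """Left-pad all sequences to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [[pad_value] * (max_len - len(s)) + s for s in sequences]

corpus = [
    "shall i compare thee to a summer's day",
    "thou art more lovely and more temperate",
]
vocab = tokenize(corpus)
padded = pad_sequences(build_sequences(corpus, vocab))
```

Each padded row then serves as one training example, with the last word as the prediction target and everything before it as context.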
3 Metrics
We hope to generate, from a provided image, poetry that is 'intelligent' in the sense of being grammatically correct, cohesive, and conveying a basic concept. However, there are few quantitative ways to rate a poem, so the evaluation is mostly subjective.
Since there is no accuracy metric for poetry generators, we adopt an existing evaluation framework [2] in which human annotators rate poems on a five-point scale with respect to four characteristics:
(1) fluency: is the poem grammatical and syntactically well-formed?
(2) coherence: is the poem thematically structured?
(3) meaningfulness: does the poem convey a meaningful message to the reader?
(4) poeticness: does the text display the features of a poem?
Additionally, we ask annotators to judge whether each poem was written by a human or a computer. We selected the three most representative poems generated by our model and collected responses via a Google survey.
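As a toy sketch of how such survey responses could be aggregated, the snippet below averages each criterion over annotators. The function name and the ratings are made up for illustration; they are not our actual survey data.

```python
# Average a list of per-annotator ratings (1-5) over the four criteria above.
CRITERIA = ["fluency", "coherence", "meaningfulness", "poeticness"]

def mean_scores(ratings):
    """ratings: list of dicts mapping criterion -> score in 1..5."""
    return {c: sum(r[c] for r in ratings) / len(ratings) for c in CRITERIA}

ratings = [  # illustrative values only
    {"fluency": 3, "coherence": 4, "meaningfulness": 4, "poeticness": 5},
    {"fluency": 2, "coherence": 3, "meaningfulness": 4, "poeticness": 4},
]
scores = mean_scores(ratings)
```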
4 Results
Please see our final report link above for the images and the resulting statistics.
A sample poem generated from the text image "Love":
Cover both moon and die, to melt.
prophecies, thunder mouth as wept.
drouth: glowing dross; pain: reserve days,
tempests malcontent, some woman prison'd in more rage.
sheathed farewell wouldst his plenty press’d, i drown enemies;
more, bravery heart’s scythe to enchant fatal entrance of prove.
We have a response sample size of 36 from the Google survey, and of the three poems selected for the survey, poem No. 3 received the highest overall rating for grammar, syntax, coherence, and topical meaning. The most frequent rating was "satisfactory", which shows that our poems perform decently with room for improvement. Meanwhile, our model performs better on topical meaning than on grammar and syntax. We attribute this partly to our training dataset, which is partially formed from categorized topical poems; the weaker syntax and grammar may stem from the seq2seq model's lack of training on everyday language. Surprisingly, nearly half of the respondents thought the poem was written by a human. We take these results to indicate that we achieved our target goal of enhancing the poetry's readability and making the sentences look like they were written by humans.
5 Challenges
Finding the ideal dataset for training our model was relatively difficult. Given the structure of our model, a dataset that is too large (over 1 million lines of poetry) easily runs out of memory on our machine as well as on available cloud resources. If we shrink the dataset too far, there is not enough training data, and the generated poems lose readability because the model cannot learn grammatical structure. In addition to sizing the dataset, we classified the poems into different topics, such as happy and hate, and then trained our model on poems with a specific topic. In the beginning we tried selecting sonnets by William Shakespeare, but the model-generated poems became too arcane.
Finding papers on generating hidden poems, or poems from given topics or text images, was another challenge. When we tried to solve our problem by building on those authors' implementations, we found their code complicated and hard to follow. We did find a paper on automatically generating hidden poems and ancient poems with LSTM cells in RNNs in Python, but its implementation only supports TensorFlow 1.x and relies on abstractions that exist only in TensorFlow 1.x, which are hard to understand.
Once we settled on the appropriate model methodology, understanding what happens inside the "black box" of the seq2seq model during sentence generation became our primary challenge. We had to do a fair amount of research to understand the importance of selecting an appropriate query word for poem generation and the influence of the training data on our outcome.
6 Reflection
We finished our base goal, which was to generate new English poetry from elements in the image, and we did our best to reach our target goal, which was to improve the poetry's readability and make the sentences look like they were written by humans.
Overall, our model did a decent job of generating poems, and its training speed was faster than we expected (training completed in under 40 minutes). However, we noticed two main problems in the generated poems. First, the model produces one line after another, and each generated line is appended to the sequence that the generation function takes as context. Even so, the model tends to generate repeated sentences given prior context, and we have yet to find a solution for this bias. Second, since our dataset is thin outside of topical poems, the model still has room for improvement in grammatical coherence, which may require a larger pool of literary training data.
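To make the line-by-line generation loop concrete, here is a simplified sketch with a stub standing in for the trained model's softmax output (the stub, function names, and tiny vocabulary are ours for illustration, not the project's code). It shows why greedy decoding reproduces identical lines from identical context, and sketches random sampling as one common mitigation.

```python
import random

def next_word_probs(context):
    # Stub standing in for the trained model's next-word distribution.
    vocab = ["moon", "love", "night", "heart"]
    random.seed(len(context))  # deterministic stand-in for a fixed model
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return {w: p / total for w, p in zip(vocab, weights)}

def generate_line(seed, length, greedy=True):
    """Extend a seed context word by word, as in the generation loop above."""
    words = list(seed)
    for _ in range(length):
        probs = next_word_probs(words)
        if greedy:
            # Always the argmax: the same context always yields the same line,
            # which is one source of the repetition we observed.
            word = max(probs, key=probs.get)
        else:
            # Sampling from the distribution introduces variety between runs.
            word = random.choices(list(probs), weights=list(probs.values()))[0]
        words.append(word)
    return " ".join(words)
```

Calling `generate_line(["love"], 5)` twice returns the identical line under greedy decoding, which is exactly the repetition behavior described above.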
The biggest change we made over time is how we introduce the element of an image into poem generation. Our initial plan was to generate poems by scene, where descriptive words or sentences would be generated for elements of the image. During our research, however, we found that detecting those features is quite complicated, requiring a visual-poetic embedding model, a generator, and discriminators. Given our limited time, we switched from describing the image to detecting the text in the image and generating a corresponding poem. If we could redo our project, we agree that we would pick ML topics with enough available datasets.
Another improvement concerned our training dataset of established poems. We realized that the fluency, meaningfulness, and readability of the generated poems depend heavily on the size of the dataset: readability increases as we enlarge the training source. For example, poems trained only on the topic "loneliness" were not as readable as those generated after combining other negative-sentiment topics such as "darkness". We therefore restructured the dataset and added 20,000 additional categorized poems to our original dataset. After increasing the number of training poems by 120%, the syntax and meaningfulness of our poems improved significantly.
There are also stretch goals we wish we could have achieved beyond our base and target goals. For example, we think we could do more to improve the beauty of the poetry: we could implement a neural language model trained on the phonetics of words to learn an implicit representation of both the form and the content of English poetry.
We are grateful for this project experience and take away some wonderful lessons. Throughout this journey we realized that human language arts are far more subtle and complicated than we imagined: it is hard to fully eliminate the differences between human-written and computer-generated poems with language models. We also found that collecting data and preprocessing it to suit a model, while seemingly mundane, makes a true difference and is worth researchers' attention. There is an important tradeoff between desirable results and chosen dataset size, and we believe many researchers face the same tension between efficiency and accuracy. At the beginning of the project, our model easily ran out of memory due to excessive data, yet the outcomes were clearly undesirable when we shrank the dataset. We put considerable effort into restructuring the training data to find a balance that improved the generated poems' readability while reducing training time.
7 Division of labor:
Sophia Song: Model training/testing for text image
Yuming Fu: Model training/testing for text image
Yongxuan Fu: Analyzing and Recognizing the text from text image, Dataset collection (including labeling) for text/poems
8 Citation
[1] Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., Oh, S. J., & Lee, H. (2019, December 18). What is wrong with scene text recognition model comparisons? dataset and model analysis. arXiv.org. Retrieved December 9, 2021, from https://arxiv.org/abs/1904.01906.
[2] Van de Cruys, T. (2020). Automatic poetry generation from prosaic text. ACL Anthology. Retrieved December 9, 2021, from https://aclanthology.org/2020.acl-main.223/.
[3] Poems Dataset (NLP), https://www.kaggle.com/michaelarman/poemsdataset
Built With
- lstm
- python
- rnn