UALT

Final reflection submission:


Title: Unified Artificial Language Translators

(The name combines UA for Ukraine and LT for Lithuania.)

Who:

Bohdan Karavan (bkaravan), Julia Stepanenko (ystepane), Ignas Karvelis (ikarveli)

What problem are you trying to solve and why?

The problem we are addressing is the generation of poetry using machine learning, specifically focusing on the works of William Shakespeare. The goal is to create a model that can generate text that mimics the style, meter, and rhyme of Shakespeare's poetry. This is an interesting challenge because it combines the need for linguistic precision with creative expression, pushing the boundaries of what artificial intelligence can achieve in terms of understanding and generating human-like text. This project is not only technical but also creative, as the model must capture the uniqueness of Shakespeare's language.

Paper’s objectives

We are implementing an existing paper, “Syllable Neural Language Models for English Poem Generation.” The paper addresses the intriguing challenge of poem generation using machine learning, and all of us are curious to see how well technology can replicate such a distinctly human skill.

The primary objective is to create a neural language model that captures the unique linguistic style and semantic essence of a poet's work. The paper focuses on the English poet William Wordsworth; our project will apply the same approach to the works of William Shakespeare. The authors aim to overcome the limitations of data scarcity by employing a multi-stage training process that leverages both poetic and non-poetic works, as well as large corpora, to learn the syntax and grammar of the language.

Our reasons for selecting this paper include cultural significance: one of the sources that inspired the paper is a similar model based on the Italian writer Dante Alighieri, a central figure in Italian literature. As international students, we are interested in trying this approach on literature familiar to us. Additionally, this paper takes a new, syllable-based approach to poem generation, which could later let us build similar models for our native tongues.

Research Implications: The findings suggest potential future directions for integrating more structured models that can incorporate additional author-specific information, which could enhance the emotional depth of the generated poems.

We chose it because of our desire to explore the intersection of artificial intelligence, literature, and cultural heritage, highlighting the potential of modern technology to engage with and preserve the richness of human artistic expression.

What kind of problem is this?

The problem described in the paper we have chosen is related to Natural Language Generation, which is a subfield of Natural Language Processing or NLP. Specifically, our task of poem generation can be categorized as a type of structured prediction problem.

The model needs to produce sequences of text that follow a specific poetic structure and style, which requires capturing dependencies between syllables, words, and lines to maintain coherence, rhyme, etc.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?

Our source paper also mentions another work, “Neural Poetry: Learning to Generate Poems Using Syllables,” which will be helpful to consult when implementing our project. One additional relevant article is "A language generation system that can compose creative poetry" by researchers at the University of Colorado and Drury University.

This study presents a language generation system that produces creative poetry verses, utilizing a fine-tuned adaptation of GPT-2, a pre-trained language model developed by OpenAI. The system's ability to generate poetry showcases the potential of machine learning models to engage in creative tasks that were traditionally considered exclusive to human intelligence. The research highlights advancements in natural language processing and the application of deep learning techniques to artistic expression. https://techxplore.com/news/2020-02-language-creative-poetry.html

Public implementations

Actual paper: https://computationalcreativity.net/iccc21/wp-content/uploads/2021/09/ICCC_2021_paper_31.pdf

Syllable Poetry Generation model code: https://gitlab.com/danielle.evalewis1/poetrygeneration_william_syl-worth

Data: What data are you using (if any)?

We will be using the complete works of William Shakespeare, as hosted by MIT.

How big is it? Will you need to do significant preprocessing?

The dataset contains the complete works in a single, substantial plain-text file, and it is often used for training language models because of its rich linguistic content. It is large enough to provide a good training ground for natural language processing tasks, but not so large as to be unmanageable on most computational setups.

Preprocessing will be the bulk of our work. We will need to do standard text cleaning: for instance, removing special characters and segmenting the text into more manageable pieces for processing. We might also convert the text to lowercase or remove stop words. To treat poems and plays equally, we will need to standardize the plays and strip information irrelevant to poem generation, such as speaker tags and stage directions. Given the nature of Shakespeare's language, which often includes archaic phrases and spellings, careful preprocessing will be essential.
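A minimal cleaning pass might look like the sketch below. The speaker-tag and stage-direction heuristics are our assumptions about the corpus format, not a tested pipeline:

```python
import re

def clean_shakespeare(raw_text):
    """Minimal cleaning sketch for the raw corpus (assumed plain text):
    drop all-caps speaker tags and bracketed stage directions, lowercase,
    and keep only letters, apostrophes, and line structure."""
    lines = []
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip all-caps speaker tags like "HAMLET." (heuristic, not exhaustive).
        if re.fullmatch(r"[A-Z .,']+\.?", line) and line == line.upper():
            continue
        # Remove bracketed stage directions like "[Exit]".
        line = re.sub(r"\[.*?\]", "", line)
        line = line.lower()
        line = re.sub(r"[^a-z'\s]", " ", line)   # strip punctuation/digits
        line = re.sub(r"\s+", " ", line).strip()  # collapse whitespace
        if line:
            lines.append(line)
    return lines
```

Keeping the line structure intact matters here, since verse lines are the unit the model must eventually reproduce.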

Methodology: What is the architecture of your model?

The architecture of the described model consists of a few main components. Hyphenation module: this module splits the words of the corpus into sequences of syllables, so that the model learns over syllables rather than whole words, which makes sense in the context of poem generation, where meter is counted in syllables.
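The hyphenation step could be sketched as below. Real systems use dictionary-based hyphenation (for example the `pyphen` library), so the vowel-group heuristic here is only a crude stand-in for illustration:

```python
import re

VOWELS = "aeiouy"

def syllabify(word):
    """Crude heuristic syllabifier: split after each vowel group.
    A real hyphenation module follows dictionary rules; this stand-in
    only illustrates the word -> syllable tokenization."""
    word = word.lower()
    groups = re.findall(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?", word)
    return groups if groups else [word]  # vowel-less tokens stay whole

def to_syllable_tokens(line):
    """Tokenize a verse line into syllables, marking word boundaries."""
    tokens = []
    for w in re.findall(r"[a-zA-Z']+", line):
        tokens.extend(syllabify(w))
        tokens.append("<w>")  # word-boundary marker for the model
    return tokens
```

The `<w>` boundary marker is one common trick so the model can recover word boundaries when stitching generated syllables back into text.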

The language model: an LSTM recurrent neural network that predicts the next token in the sequence. The input to the LSTM is the embeddings of the syllables produced by the hyphenation module, and the general structure of the LSTM is similar to what we did in class and in Homework 4.
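In PyTorch terms, the model might look like the following sketch (embedding sizes and layer counts are placeholder assumptions, not the paper's hyperparameters):

```python
import torch
import torch.nn as nn

class SyllableLSTM(nn.Module):
    """Sketch of a syllable-level language model: embedding -> LSTM ->
    linear projection over the syllable vocabulary."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, state=None):
        x = self.embed(token_ids)         # (batch, seq, embed_dim)
        out, state = self.lstm(x, state)  # (batch, seq, hidden_dim)
        logits = self.proj(out)           # (batch, seq, vocab_size)
        return logits, state
```

Training would minimize cross-entropy between the logits at each position and the next syllable in the sequence, as in a standard language-modeling setup.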

After training, the model can be prompted to generate verses that imitate the chosen author (in the paper, William Wordsworth). For best results, poems are generated using multinomial sampling with temperature and top-p (nucleus) sampling. The generated verses are then measured with metrics like perplexity and custom scoring functions that assess the grammar, style, and poetic qualities of the text. The best poems according to these metrics are then evaluated by humans.
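The decoding step can be sketched as follows. This is our own minimal implementation of combined temperature and top-p sampling over raw logits, not the paper's code:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_p=0.9, rng=None):
    """Draw the next syllable index via multinomial sampling with
    temperature scaling and top-p (nucleus) filtering."""
    rng = rng or random.Random()
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative mass >= top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the nucleus and sample one index from it.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Top-p filtering removes the long tail of unlikely syllables, which tends to cut down on incoherent output while still allowing variety.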

How are you training the model?

The central idea of the paper is multi-stage transfer learning. Since the paper tries to imitate the work of a particular author, the corpus of available inputs is rather small, and a small initial dataset leads to poor generalization in poem generation. In a nutshell, rather than using a single training dataset, the authors first train on prose text to learn the rules of the language, then on a set of poems by the author, and finally calibrate and fine-tune the model on a large work by the author called The Prelude.
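The staged schedule might look like the sketch below, where `train_fn` is a hypothetical stand-in for a real training loop; the learning rates and epoch counts are our own illustrative choices, not the paper's:

```python
def multi_stage_train(model, train_fn, prose, poems, prelude):
    """Sketch of a multi-stage transfer-learning schedule: each stage
    fine-tunes the same model on a new corpus, with a smaller learning
    rate for the later, author-specific stages."""
    schedule = [
        ("prose: learn general syntax and grammar", prose,   1e-3, 10),
        ("author's poems: learn poetic structure",  poems,   5e-4, 10),
        ("The Prelude: final fine-tuning",          prelude, 1e-4, 5),
    ]
    for stage_name, corpus, lr, epochs in schedule:
        train_fn(model, corpus, lr=lr, epochs=epochs)
    return model
```

The decreasing learning rate is a common transfer-learning choice: later stages nudge the model toward the author's style without overwriting the general language knowledge from earlier stages.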

Hardest part of the implementation.

So far, the hardest part seems to be combining the pieces. On their own, the parts don't look too bad: some preprocessing with hyphenation, a mostly standard LSTM, and training on a few datasets. However, since we have not yet practiced transfer learning, it may pose challenges, as will making sure everything flows together cleanly. Lastly, the actual poem generation is still a little confusing, since it is hard to build intuition for how well the model learns poetic patterns given a vocabulary of syllables.

Metrics: What constitutes “success?”

Perplexity: a lower perplexity score indicates that the model is better at predicting the next word or syllable in a sequence, which suggests a better grasp of the language structure.

Human evaluation: the believability of the generated poems as judged by humans, especially experts in literature or poetry, is a strong indicator of success.

Stylistic accuracy: the extent to which the generated poems match the style of the target poet, in terms of vocabulary, rhyme, meter, and other stylistic elements.

Semantic coherence: the generated poems should make sense and convey coherent ideas or emotions.

Novelty and creativity: the model should generate original content rather than simply replicating the training data.

What experiments do you plan to run?

The main experiment would be giving people/experts in literature samples of Shakespeare's actual work and the samples of what our model generated and seeing whether they can spot the difference. If they can’t or have trouble doing so, our model is doing well!

Accuracy metrics

Since this is a Natural Language Generation task, metrics like perplexity make sense and will be applied to our results. In addition, there will be a few scoring metrics capturing different aspects of the generated verses: meter, style, and grammar. In essence, we want to make sure that our output actually has a poetic component, follows the style of our selected author, and makes grammatical sense. Lastly, the best-scoring verses will be evaluated by humans, perhaps even with a test where people must guess which verse is by William Shakespeare (the poet we are targeting as a group) and which is LM-generated.
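As a concrete example of the main quantitative metric: perplexity is the exponential of the average per-token negative log-likelihood, so a model that assigns every syllable probability 0.25 has perplexity 4.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token.
    Lower is better: it is the effective branching factor the model
    faces when predicting the next syllable."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

Here `token_log_probs` would be the log-probabilities the trained model assigns to each held-out syllable.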

Quantified results of the paper

The authors of our source paper were exploring the potential of machine learning models to capture the linguistic features and semantics that characterize a poet's style. Their goal was to create a syllable-based neural language model capable of generating poetry that could be perceived as authentic and stylistically similar to the works of a target author, in this case the English poet William Wordsworth.

To quantify the results of their model, the authors employed both quantitative and qualitative measures:

Quantitative Analysis: They used metrics such as perplexity to evaluate how well the model predicted the next syllable in a sequence, which is indicative of the model's understanding of the language structure.

Qualitative Analysis: They conducted human evaluations involving both general judges and expert judges (literature graduate students with a background in humanistic studies). The judges were asked to determine whether the generated samples were authored by the poet or generated by the language model. The success of the model was measured by the frequency with which the generated tercets were considered real by the judges, as well as the extent to which expert judges recognized the poet's style and rhymes in the generated text.

The authors reported that the generated tercets were frequently considered to be real by a generic population of judges, with a relative difference of 7.4% compared to the ones actually authored by William Wordsworth. This indicated that the model was successful in generating poetry that could often pass for the work of the target author, at least to non-expert judges. The expert judges' perception of William Wordsworth’s style and rhymes in the generated text further validated the model's effectiveness in capturing the poet's unique style. The combination of these methods provided a comprehensive assessment of the model's performance in generating poetry.

What are your base, target, and stretch goals?

Base goal: implement a basic character-level RNN that can generate text resembling Shakespeare's writing style.

Target goal: enhance the model to capture the style and rhyme scheme characteristic of Shakespeare's poems.

Stretch goal: refine the model further to generate full sonnets that are indistinguishable from Shakespeare's work to a layperson, and potentially extend the algorithm to other authors, languages, and styles.

Ethics:

The broader societal issues relevant to the problem space of poem generation using machine learning include the preservation of cultural heritage and the potential displacement of human creativity. The use of deep learning to generate poetry can help preserve the linguistic styles and cultural significance of historical poets like Dante Alighieri. However, it also raises questions about the value of human creativity when machines can produce similar works. Luckily, the works of our chosen poet are now in the public domain, so we can use the source without licensing issues, though this does not eliminate the potential problems of unintended plagiarism and similarity. There is a delicate balance between using technology to celebrate and extend human artistic achievements and the risk of undermining the unique human touch in creative writing.

Regarding the dataset, which in this case is the complete works of William Shakespeare, concerns about historical or societal biases are pertinent. Shakespeare's works reflect the language, values, and societal norms of the Elizabethan era, some of which may be considered outdated or offensive by today's standards. It's important to acknowledge these biases when using the dataset and to consider the implications of perpetuating certain themes or language in generated content. The major stakeholders in this problem include literary scholars, educators, students, and the general public who might interact with the generated poetry. Mistakes made by the algorithm, such as generating content that inadvertently perpetuates negative stereotypes or biases, could have consequences for these groups, affecting educational outcomes or reinforcing harmful societal narratives.

Quantifying success in poem generation can be done through metrics like perplexity for language models, human evaluation for stylistic accuracy, and semantic coherence. The implications of these quantifications are significant because they influence how we understand and value machine-generated content compared to human-created works. They also impact decisions about the deployment and use of such technologies in educational and cultural contexts. It's crucial to approach these metrics critically and consider their limitations in capturing the full scope of poetic quality and creativity.

Division of labor

We are a pretty small and young group, so none of us have any particular specializations yet. As for now, we will be working collectively and split up the work as we go along the project.

