
Project Check-in 2

Introduction: Upon formation, the group expressed an interest in using Natural Language Processing to write a chapter, or more, of some form of literature. After discussion, we decided science fiction would be the most interesting genre to apply a deep learning model to. Having learned about OpenAI's impressive GPT-3 model in class, we decided to implement GPT-1. Thus, the objective of this project is to implement the GPT-1 architecture and train it with the specific intention of later attempting to write science fiction.

The original paper is interesting because of the “semi-supervised” approach the OpenAI team uses to create a strong task-agnostic NLP model. It builds on the Transformer architecture, stacking 12 Transformer-decoder layers for its Generative Pre-Training (GPT). We are interested in re-implementing the GPT architecture and training it on a specific corpus, rather than the large general-purpose corpus used in the paper, to see if this better aligns with our goal. That goal is to train GPT on a large corpus of sci-fi works and see if we can use it, via Natural Language Generation, to write a coherent sci-fi short story.
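As a reference point while implementing, the core operation in each of those decoder layers is masked (causal) self-attention. Below is a minimal single-head NumPy sketch of that operation, not our actual implementation; the random projection matrices and dimensions are purely illustrative:

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a causal mask,
    the core operation inside each Transformer-decoder layer of GPT."""
    seq_len, d_model = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d_model)
    # Causal mask: position i may only attend to positions <= i,
    # so the model can be trained to predict the next token.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 tokens, model width 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]  # toy Q/K/V projections
out = causal_self_attention(x, *w)
```

Because of the mask, the first token's output depends only on itself, which is what lets the stack be trained autoregressively on next-token prediction.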

Challenges: So far, preprocessing has been the most challenging aspect of the programming portion of the project. We have downloaded the corpus we intend to use to train the GPT model, but preprocessing this text has proven nontrivial. We are basing the preprocessing on that of homework 4, but unlike that project's data, this corpus is not labeled or standardized at all. Sentence lengths vary more than in homework 4, and the corpus comes with no standardization or tokens, so we have been brainstorming how to accomplish this effectively by hand.
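To make the standardization concrete, here is a rough sketch of the kind of manual pipeline we have been considering for raw, unlabeled text; the sentence-split rule, special tokens, and vocabulary cap are all hypothetical choices, not final ones:

```python
import re
from collections import Counter

PAD, START, STOP, UNK = "<pad>", "<start>", "<stop>", "<unk>"

def preprocess(raw_text, vocab_size=10000):
    """Turn raw prose into padded sequences of token ids."""
    text = raw_text.lower()
    # Naive sentence split; real prose needs more care (abbreviations, dialogue).
    sentences = [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Cap the vocabulary at the most frequent words, reserving the specials.
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = [PAD, START, STOP, UNK] + [w for w, _ in counts.most_common(vocab_size - 4)]
    idx = {w: i for i, w in enumerate(vocab)}
    max_len = max(len(s) for s in sentences) + 2   # room for <start>/<stop>
    ids = []
    for sent in sentences:
        toks = [START] + sent + [STOP]
        row = [idx.get(t, idx[UNK]) for t in toks]
        ids.append(row + [idx[PAD]] * (max_len - len(row)))
    return ids, idx

ids, idx = preprocess("The ship drifted. Stars wheeled past the viewport!")
```

Padding every sentence to a shared length is what homework 4's labeled data gave us for free; here we have to impose it ourselves, which is why the variable sentence lengths are a real design question.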

Additionally, the larger overall challenge has been allocating time to work on the project. With classes ongoing, we have all been rather busy, which has led to some procrastination. However, we anticipate this will resolve now that classes are ending and we are entering Thanksgiving Break. As such, we will strive to make as much progress as reasonably possible in the next week. Depending on what remains after that, we will set incremental deadlines to ensure we are ready for Deep Learning Day.

Insights: Currently, there are no concrete results to show. While we have implemented aspects of each major component of the project, no component is yet finalized, so we cannot yet run the model.

Plan: By our own plan, we are relatively on track, since we intend for the brunt of the implementation to be done over the coming week. However, we have not yet matched the timeline provided in the final project handout. As we had concluded even before reviewing the handout's plan, we need to focus more of our time on preprocessing our corpus. Preprocessing is likely the most important and challenging component of our pipeline, so we hope to prioritize it and finalize its implementation soon. Beyond that, we are satisfied with where we stand on the model's initial implementation.

Perhaps of note, we have been considering the size of the dataset and how successfully the model may train on it. OpenAI's model used both pre-training and fine-tuning. While our model differs from OpenAI's due to its sole science-fiction application, we wonder whether fine-tuning might still be applicable: further refining the model by training on a more specific corpus. Before deciding whether to add this, though, we will finish implementing our model.
