Introduction
From what we have learned in class, we know that GPT-2 can be used to generate conversations. However, there is a significant problem: GPT-2 does not generate good responses unless we provide extensive context. When there is little or no context in the user's input, GPT-2 tends to give a vague, generic answer. Our solution is a "pre-processing" step that transforms the input sentences given by human users into more structured and detailed paragraphs, which should improve the quality of conversation generation, provided we perform the fine-tuning correctly. Our project falls under transfer learning in the NLP area.
Challenges
What has been the hardest part of the project you’ve encountered so far? So far we have run into difficulties with model architecture design, fine-tuning GPT-2, and metrics selection. After researching relevant knowledge-grounded conversation generation approaches, we proposed two ways to incorporate world knowledge into conversational response generation. The two are hard to compare, and each requires a different loss function design; we are analyzing both and plan to finalize the model architecture this week.

In the first approach, the input to the self-attention blocks of GPT-2 is the sum of three types of embeddings: word, dialog-state, and positional. The word-embedding input is constructed by concatenating the knowledge sentences with the history of the dialog's previous utterances. The word and positional embeddings are learned during the pre-training phase. The dialog-state embedding, learned through fine-tuning, indicates whether the current token is part of (i) a knowledge sentence, (ii) an utterance from PERSON1, or (iii) an utterance from PERSON2 (see the first sketch below).

In the second approach, we first compute TF-IDF vectors for the ground-truth response of the last turn and for the knowledge candidate sentences, then select the knowledge sentence with the highest TF-IDF similarity score (see the second sketch below). The selected sentence and the dialog-history sentences are each passed through an encoder (e.g., a Transformer encoder), and the outputs are concatenated and fed to GPT-2.

Apart from that, we have not found a way to make GPT-2 act as a chatbot directly. We will likely have to fine-tune the model and handle the interaction loop manually ourselves.

Another challenging part is metrics selection. There are various automated metrics to choose from: perplexity, F1, n-gram diversity, Hits@1, and so on (the third sketch below shows the n-gram diversity metric we are considering). We are also unsure whether human evaluation is practical for this course project, given the large size of the test data and the subjectivity involved.
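To make the dialog-state embedding concrete, here is a minimal sketch of how one training example could be assembled, assuming the HuggingFace transformers implementation of GPT-2, which accepts segment ids through the token_type_ids argument. The special-token names <knowledge>, <person1>, and <person2>, as well as the example text, are our own illustrative placeholders, not a finalized design.

```python
# Sketch: assembling one GPT-2 training example with dialog-state segments,
# assuming the HuggingFace `transformers` library. The segment ids index
# three new tokens whose embeddings are learned during fine-tuning.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# One marker token per dialog state: knowledge, speaker 1, speaker 2.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<knowledge>", "<person1>", "<person2>"]}
)
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))  # make room for the new tokens

knowledge = "The Eiffel Tower is 330 metres tall."
history = [("<person1>", "Have you ever been to Paris?"),
           ("<person2>", "No, but I would love to see the Eiffel Tower.")]

know_id, p1_id, p2_id = tokenizer.convert_tokens_to_ids(
    ["<knowledge>", "<person1>", "<person2>"])

# Word-embedding input: knowledge sentence concatenated with dialog history.
input_ids, token_type_ids = [], []
ids = tokenizer.encode(knowledge)
input_ids += ids
token_type_ids += [know_id] * len(ids)           # segment (i): knowledge

for speaker, utterance in history:
    ids = tokenizer.encode(utterance)
    seg = p1_id if speaker == "<person1>" else p2_id
    input_ids += ids
    token_type_ids += [seg] * len(ids)           # segments (ii)/(iii): speakers

out = model(input_ids=torch.tensor([input_ids]),
            token_type_ids=torch.tensor([token_type_ids]))
```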
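And here is a minimal sketch of the TF-IDF knowledge selection for the second approach, using scikit-learn. We assume cosine similarity between TF-IDF vectors as the scoring function, which is one reasonable reading of "the sentence with the highest TF-IDF score"; the example texts are made up.

```python
# Sketch: TF-IDF knowledge selection with scikit-learn. The candidate whose
# TF-IDF vector is closest (by cosine similarity) to the ground-truth
# response of the last turn is selected.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_knowledge(ground_truth_response, knowledge_candidates):
    """Return the knowledge candidate closest to the ground-truth response."""
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([ground_truth_response] + knowledge_candidates)
    scores = cosine_similarity(vectors[0], vectors[1:])[0]
    return knowledge_candidates[scores.argmax()]

best = select_knowledge(
    "I heard the Eiffel Tower was meant to be temporary.",
    ["The Eiffel Tower was intended to stand for only 20 years.",
     "Paris is the capital of France."],
)
```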
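Among the candidate metrics, n-gram diversity (distinct-n) is simple enough to compute ourselves. A minimal sketch, using the common definition of unique n-grams divided by total n-grams across all generated responses:

```python
# Sketch: distinct-n, an n-gram diversity metric for generated responses
# (number of unique n-grams divided by total n-grams).
from collections import Counter

def distinct_n(responses, n=2):
    ngrams = Counter()
    for response in responses:
        tokens = response.split()
        ngrams.update(zip(*(tokens[i:] for i in range(n))))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

print(distinct_n(["the tower is tall", "the tower is old"], n=2))  # 4/6 ~ 0.67
```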
Insights
Are there any concrete results you can show at this point? We have preprocessed the data and generated five splits: train, validate_freq, validate_rare, test_freq, and test_rare. Each split consists of three files: the dialog histories of varying lengths, the corresponding ground-truth response for each dialog history, and the most relevant knowledge sentence for each ground-truth response (a loading sketch follows below). We are also working on implementing a basic chatbot using GPT-2 that does not use knowledge and will serve as our baseline.
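As an illustration of how the three files of a split line up, here is a minimal loading sketch. The file names are hypothetical placeholders, and the three files are assumed to be line-aligned (line i of each file belongs to the same example):

```python
# Sketch: loading one data split stored as three parallel, line-aligned
# text files. File names are illustrative, not our actual naming scheme.
def load_split(prefix):
    with open(f"{prefix}.history.txt") as h, \
         open(f"{prefix}.response.txt") as r, \
         open(f"{prefix}.knowledge.txt") as k:
        return [
            {"history": hist.strip(),
             "response": resp.strip(),
             "knowledge": know.strip()}
            for hist, resp, know in zip(h, r, k)
        ]

train = load_split("train")
```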
Plan
Are you on track with your project? Yes. We have extracted the conversation messages and knowledge data from the original JSON-format TCS DATASET into txt files (a sketch follows at the end of this section), discussed different model architectures, and worked on the implementation of the GPT-2 chatbot. What do you need to dedicate more time to? We will likely need more time to settle how we process our data and to finalize the design of the input representation. We also need more time to gain GPT-2 experience: the model is not as easy to use as we originally thought. As mentioned in the Challenges section, we will dedicate more time to implementing a basic GPT-2 chatbot as the basis for further improvements. What are you thinking of changing, if anything? Depending on the final model architecture design, we might change how we handle pre-processing and the input representation. For example, we may need to provide extra information based not only on the keywords of the input but also on personality information and the like, especially for very short inputs.
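For reference, the extraction step amounts to flattening the JSON into one utterance per line. A minimal sketch; the field names "content" and "message" are assumptions about the dataset layout, not a confirmed schema:

```python
# Sketch: extracting utterances from the original JSON into a plain txt
# file, one message per line. Field names are assumed, not confirmed.
import json

with open("train.json") as f:
    conversations = json.load(f)

with open("train_messages.txt", "w") as out:
    for conv in conversations.values():
        for turn in conv["content"]:       # assumed: list of message dicts
            out.write(turn["message"] + "\n")
```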
