Bavarder

Who

Quinn Coleman and Blake Horne

Introduction

We aim to build a chatbot that can hold a simple conversation in French with the user while correcting the user's French errors. The chatbot is intended as an educational tool for learning and practicing written French and French reading. We arrived at this idea because we both have a background in French and believe this is a good real-world use of natural language processing.

Related Work

For our language model, we will use FlauBERT, a pretrained model based on the BERT architecture. FlauBERT was trained on a large, heterogeneous French corpus and was evaluated on text classification, paraphrasing, natural language inference, parsing, and word sense disambiguation. link

Data

To fine-tune our FlauBERT model, we chose two different datasets. The first is a corpus of 616 conversations between adults and children up to the age of 7. We will use each adult response as the label for the child utterance it answers, so that the model learns how to respond to the beginner-level utterances a user might submit to the chatbot. The second is a corpus of conversations between French language learners, tagged with their errors and the type of each error. It will be used to train the error detection and identification component of the model.
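The pairing of child utterances with adult responses could be sketched as follows. This is a minimal illustration, not our actual preprocessing: the transcript representation (a list of `(speaker, utterance)` tuples) and the speaker labels `CHI` and `ADU` are assumptions made for the example, and the real corpus format may differ.

```python
from typing import List, Tuple

# Hypothetical representation: one turn is (speaker label, utterance text).
Turn = Tuple[str, str]


def build_pairs(turns: List[Turn], child: str = "CHI", adult: str = "ADU") -> List[Tuple[str, str]]:
    """Return (child utterance, adult response) pairs.

    A pair is created only when an adult turn directly follows a child turn,
    so the adult text can serve as the label for the child text.
    """
    pairs = []
    for previous, following in zip(turns, turns[1:]):
        if previous[0] == child and following[0] == adult:
            pairs.append((previous[1], following[1]))
    return pairs


# Toy conversation, invented for illustration only.
conversation = [
    ("CHI", "je veux le ballon"),
    ("ADU", "tu veux le ballon rouge ?"),
    ("ADU", "tiens"),
    ("CHI", "merci"),
    ("ADU", "de rien"),
]
pairs = build_pairs(conversation)
```

Here the second adult turn ("tiens") is skipped because it does not directly answer a child utterance. Whether consecutive adult turns should instead be merged into one response is a design choice left open.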

Methodology

For the chatbot, we will train two encoder-decoder models using FlauBERT as the underlying transformer. The first model will be trained on the conversational dataset as a sequence-to-sequence task and will be responsible for generating responses to the user's input. The second model will be trained on the error-tagged dataset as a classification task. Its goal is to recognize any errors in the user's input and tell the user what the error was. If no error is found, the user receives a normal response generated by the first model. Finally, we plan to use Rasa to connect the two models and create the chatbot.
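The routing between the two models described above can be sketched as follows. This is only an outline under stated assumptions: `detect_errors` and `generate_reply` are hypothetical stand-ins for the two fine-tuned models, `DetectedError` is an invented container, and the feedback message format is illustrative. The Rasa integration is not shown.

```python
from typing import Callable, List, NamedTuple


class DetectedError(NamedTuple):
    """Hypothetical container for one error found in the user's input."""
    span: str  # the erroneous text
    kind: str  # the error type, e.g. "agreement"


def respond(
    user_text: str,
    detect_errors: Callable[[str], List[DetectedError]],
    generate_reply: Callable[[str], str],
) -> str:
    """Report errors if any are found; otherwise return a conversational reply."""
    errors = detect_errors(user_text)
    if errors:
        notes = "; ".join(f"{e.span} ({e.kind})" for e in errors)
        return f"Erreur: {notes}"
    return generate_reply(user_text)


# Stub models, for illustration only; the real ones would wrap the fine-tuned networks.
def stub_detector(text: str) -> List[DetectedError]:
    return [DetectedError("le maison", "gender")] if "le maison" in text else []


def stub_generator(text: str) -> str:
    return "D'accord !"


feedback = respond("j'aime le maison", stub_detector, stub_generator)
reply = respond("j'aime la maison", stub_detector, stub_generator)
```

Passing the two models in as callables keeps the routing logic independent of how each model is loaded or served.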

Metrics

We will assess our model using perplexity on the conversational data, accuracy on the error correction data, and human evaluation. Perplexity measures how fluent the generated text is, and accuracy measures the model's ability to recognize errors. Human evaluation will assess how appropriate each response is. Our base target is a perplexity of 20 with simple error recognition and mostly coherent responses that may not always be appropriate. Our strong target is a perplexity of 15 with more appropriate responses. Our stretch target is a perplexity of 10 with responses that are nearly always coherent and appropriate.
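For reference, the two automatic metrics can be computed as in the sketch below. It uses the standard definition of perplexity as the exponential of the mean negative log-likelihood per token. The function names and input formats (a list of natural-log token probabilities, and parallel lists of predicted and gold labels) are assumptions made for the example.

```python
import math
from typing import List, Sequence


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of positions where the predicted label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# A model that assigns probability 0.1 to every token has perplexity 10.
uniform_ppl = perplexity([math.log(0.1)] * 4)

# Three of four error labels predicted correctly.
error_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Read this way, a perplexity of 10 means the model is, on average, as uncertain as if it were choosing uniformly among 10 tokens at each step.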

Ethics

What broader societal issues are relevant to your chosen problem space? One broader societal issue is that a program like ours could reduce the number of teaching positions. This is especially concerning because the program could fail to pick up on conversational nuance that a person would catch.

Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? The major stakeholders are schools and educational companies that may want to use the chatbot in their lessons. The consequences of mistakes in our algorithm are improper responses, which would reduce its usefulness as an educational tool.

Division of Labor

Since the project is essentially divided into fine-tuning two different models, each of us will be in charge of one model and its accompanying data. We will both be responsible for tying the models into the larger architecture.

Check in 3 link

Final Reflection link

Poster link