Poster with higher resolution here

Final Write-Up

final write-up

Check in #2

check-in #2 write-up

Title

Emotionally-Aware Chatbot:

  • A conversational chatbot that generates an appropriate, emotionally consistent response given a message (prompt) and an emotion category for the response to be generated

Group members

  • Aranav Baid (abaid2)

  • Nada Benabla (nbenabla)

Introduction

The paper we are implementing, titled Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory, aims to build a chatting machine that is consistent not only in grammar and context but also in emotion. While classifiers exist that can process emotions, up to this point very few approaches have been able to model emotion in large-scale conversation generation with satisfactory accuracy and emotion scores. The paper proposes a model, called ECM (Emotional Chatting Machine), that is arguably better than a plain seq2seq model (the architecture most widely used for chatting machines).

We chose this paper because it seemed relatively challenging, and we’re both interested in NLP and deep learning. In addition, it’s simply interesting to model emotional conversations with a model that doesn’t truly understand them, and we think this can have implications for how we understand and process emotions.

The paper solves a problem that falls under both Classification and NLP.

Related Work

Another paper related to the one we are reimplementing (and in fact cited by it) is Annotating and Modeling Empathy in Spoken Conversations by Alam et al. Unlike previous approaches to empathic understanding, they segment empathic events from sentences, which outperforms a random baseline and forms the basis of the model design that is then used in the paper we study.

Existing implementations:

Data

We plan to use ParlAI’s EmpatheticDialogues dataset (https://github.com/facebookresearch/EmpatheticDialogues). This dataset consists of ~30k conversations, where a conversation is a prompt of a few sentences followed by a response. Each conversation is labeled with an emotion (its context). An example of two conversations with the same prompt but different responses:

context: ‘angry’, prompt: ‘I once lost my job and got mad.’, response: ‘I lost my job last year and got really angry.’
context: ‘angry’, prompt: ‘I once lost my job and got mad.’, response: ‘I am sorry to hear that. Did it happen out of the blue?’

We plan to preprocess this data so that each conversation follows this format (similar to the format outlined in the paper): [[[prompt, emotion tag], [[response1, emotion tag], [response2, emotion tag], ...]], ...]

For simplicity, we might drop sentences whose word count is above or below a certain threshold (to be determined later).
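The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the final pipeline: it assumes each raw example arrives as a (context, prompt, response) tuple, reuses the dataset's context label as the emotion tag, and uses placeholder MIN_LEN/MAX_LEN thresholds that we would tune later.

```python
MIN_LEN, MAX_LEN = 3, 30  # placeholder word-count thresholds, to be determined later

def within_length(sentence):
    """Keep only sentences whose word count falls inside the thresholds."""
    n = len(sentence.split())
    return MIN_LEN <= n <= MAX_LEN

def build_conversations(raw_examples):
    """Group (context, prompt, response) rows into the nested format
    [[prompt, emotion tag], [[response1, emotion tag], ...]]."""
    grouped = {}
    for context, prompt, response in raw_examples:
        if not (within_length(prompt) and within_length(response)):
            continue  # drop sentences outside the word-count thresholds
        grouped.setdefault((prompt, context), []).append([response, context])
    return [[[prompt, ctx], responses]
            for (prompt, ctx), responses in grouped.items()]

rows = [
    ("angry", "I once lost my job and got mad.",
     "I lost my job last year and got really angry."),
    ("angry", "I once lost my job and got mad.",
     "I am sorry to hear that. Did it happen out of the blue?"),
]
conversations = build_conversations(rows)
# One conversation entry: the shared prompt paired with both tagged responses.
```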

Methodology

The paper argues that a traditional seq2seq model performs poorly on content and emotion scores; hence it proposes an end-to-end framework (ECM) that enhances the general seq2seq model. Following the paper’s proposed model, we will base our implementation on the encoder-decoder framework of a seq2seq model, and our model will be implemented with gated recurrent units (GRUs). Since we are more familiar with LSTMs than with GRUs, we expect this part of the implementation to be relatively challenging.
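For intuition on the GRU building block, here is a minimal NumPy sketch of a single GRU cell step. It follows the original formulation by Cho et al. (update convention h_t = (1 - z) * h_prev + z * h_tilde; note that some frameworks swap the roles of z and 1 - z), with randomly initialized weights purely for illustration, not a trainable implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step: gates decide how much of the old state to keep."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # interpolated new state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3  # toy input and hidden sizes
params = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # run five time steps over a toy sequence
    h = gru_step(x, h, params)
```

In the actual model we would rely on a deep learning framework's GRU layer rather than hand-rolled cells; the sketch just makes the gating equations concrete.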

Metrics

Given a particular prompt and an emotion category, our goal is to generate an appropriate and coherent response that shows emotional consistency. We plan to define additional metrics as we go, but as of now, an initial step would be to sample text input, observe how the chatbot responds, and check whether it picks up on emotions. Additionally, we will follow the paper’s guidelines and use perplexity scores to evaluate whether the model produces sentences that are relevant and grammatically correct (i.e. “evaluate the model at the content level”). We will also use emotion accuracy to evaluate the model’s emotional coherence (the emotion accuracy score will be “the agreement between the expected emotion category (as input to the model) and the predicted emotion category of a generated response by the emotion classifier.”)
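Both automatic metrics are straightforward to compute. The sketch below assumes we have per-token log-probabilities from the model (for perplexity) and a separate emotion classifier's predictions on the generated responses (for emotion accuracy); the function names are our own, not from the paper.

```python
import math

def perplexity(token_log_probs):
    """Exponential of the average negative log-probability per token."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

def emotion_accuracy(expected, predicted):
    """Fraction of responses whose classifier-predicted emotion matches
    the emotion category that was given to the model as input."""
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)

# Sanity check: a uniform model over a 1000-word vocabulary has perplexity 1000.
uniform = [math.log(1.0 / 1000)] * 50
ppl = perplexity(uniform)

acc = emotion_accuracy(["angry", "happy", "sad", "angry"],
                       ["angry", "happy", "angry", "angry"])  # 3 of 4 match
```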

Ethics

What broader societal issues are relevant to your chosen problem space?

Many of our ease-of-life innovations involve interacting with people and understanding their emotions. These include conversational agents such as Siri and Alexa, as well as customer-service bots and even healthcare-service replacements. As chatbots become more widely used to automate repetitive tasks and answer FAQs, understanding emotion helps convey an extremely important aspect of our language, which can help in all of these fields and more, including translation, education, and mental health.

Why is Deep Learning a good approach to this problem?

Deep Learning is a good approach to this problem because there isn’t a cut-and-dried way to determine what emotion someone is feeling at a given time. We use context and other non-verbal cues to understand which emotions are present and appropriate. Using Deep Learning to build a classifier based on previous conversations seems like a straightforward and efficient way to approach this problem.

Division of Labor

While we do not have a specific outline of the division of labor yet, we plan to equally divide the project’s tasks along the way.
Tentative task division (likely to change along the way): data preprocessing: Nada; model implementation: both; poster: Sahdiah; metrics implementation: Nada; results interpretation: Sahdiah; final write-up: both.
