Sarcasm Detection: Best Deep Learning Project Ever Created

Final writeup: https://docs.google.com/document/d/1XA9SFcKcbifXjdQ5Rzm7WGLfEEsPSySZxbpSTALUdzc/edit

Title: Summarizes the main idea of your project.

Sarcasm Detection: The Best Deep Learning Project Ever Created

Who: Names and logins of all your group members.

Mark Miller - mmille79 Rahel Selemon - rselemon

Introduction: What problem are you trying to solve and why?

We are implementing the paper “A Dual-Channel Framework for Sarcasm Recognition by Detecting Sentiment Conflict,” which outlines a classification model that assesses whether statements are sarcastic or genuine. We chose this paper because recognizing sarcasm seems like a complex human skill, so the idea of a machine being able to identify uses of sarcasm is intriguing and clearly demonstrates the abilities of NLP deep learning models.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?

Please read and briefly summarize (no more than one paragraph) at least one paper/article/blog relevant to your topic beyond the paper you are re-implementing/novel idea you are researching. In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”–if you stumble across a new implementation later down the line, add it to this list.

Paper’s implementation: https://github.com/yiyi-ict/dual-channel-for-sarcasm

https://aibusiness.com/nlp/sarcasm-is-really-really-really-easy-for-ai-to-handle

This article is highly skeptical of AI's ability to detect sarcasm, citing some failed attempts to build sarcasm detectors in the past. It also describes sarcasm as an "inherently human thing" that cannot be picked up on by analyzing text alone. Our project seeks to challenge this claim.

Data: What data are you using (if any)?

Our dataset can be found at https://paperswithcode.com/dataset/isarcasmeval. It contains ~6000 tweets with labels indicating whether each contains the use of sarcasm. Preprocessing will involve removing overt markers of sarcasm (e.g. /s, #sarcastic), and building the model’s vocabulary.

Methodology: What is the architecture of your model? How are you training the model?

The model consists of four components. First, input text passes through the decomposer, which locates and separates sentiment words from the rest of the sample. These sentiment words are sent to the literal channel, while the remaining text is processed by the implied channel. The literal and implied channels are identical in architecture, both featuring an encoder, a linear layer, and a softmax to identify whether the given text is positive or negative. Finally, the analyzer applies linear layers to the outputs of both channels along with an encoded version of the original input to evaluate whether the text is sarcastic or not.

Metrics: What constitutes “success?” What experiments do you plan to run? For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model. If you are doing something new, explain how you will assess your model’s performance. What are your base, target, and stretch goals?

Accuracy is easily measurable based on success at identifying whether a tweet contains sarcasm. The paper’s authors compared their model’s accuracy on various datasets to that of other models, hoping to see improved performance, which they mostly achieved. Our base goal would be to successfully split the words of a tweet based on sentiment and have the model recognize the positivity of each portion. The target would be a successful full implementation of the model. Stretch goals could involve applying the model to other datasets or tasks.

Ethics:

Why is Deep Learning a good approach to this problem?

Deep learning is an effective approach for this problem because it requires making judgments about novel text based on general patterns in past data. There is a relationship between the words used and sarcastic intent that humans learn over time with more and more exposure to sarcasm, and a machine should be able to pick up on similar generalizations.

How are you planning to quantify or measure error or success? What implications does your quantification have?

We measure success just by whether the model correctly identifies a sample as containing sarcasm. This metric is somewhat limited, as a perfect model would only be able to state that sarcasm is present, not where the sarcasm is located or how the meaning of the sentence shifts from the literal interpretation. A more useful tool would provide greater information about how exactly sarcasm is being employed, but that problem is far more complex.

Division of labor: Briefly outline who will be responsible for which part(s) of the project.

One person will work on loading in data, preprocessing, the decomposer, training and test loops and the custom loss function. The other person will work on the rest of the model: the implied and literal channels and the analyzer.

Reflection Link: https://docs.google.com/document/d/1Q4QLrQhvIWZPxH7QQbWwivMjswpqvjWPcTCBLIROegA/edit?usp=sharing

Built With

python
tensorflow

Updates

Mark Miller started this project — Apr 14, 2023 12:11 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.