Aspect-Based Sentiment Analysis using Gated Convolutional Neural Network on Challenge Dataset

Team Members: Eric Tang (etang14), Shreyas Mishra (smishr22), Sanyu Rajakumar (srajakum), Alex Bao (aboa5)

Introduction: The paper we’re re-implementing (Xue and Li, 2018) offers a novel methodology for Aspect-Based Sentiment Analysis (ABSA), which differs from typical sentiment analysis in its granularity. To be precise, ABSA is composed of two subtasks: Aspect-Term Sentiment Analysis (ATSA), which aims to classify the sentiment polarity with regard to labeled target entities in the text (usually words or multi-word phrases), and Aspect-Category Sentiment Analysis (ACSA), where sentiment polarities with respect to predefined categories (which don’t necessarily appear in the text) are identified.

This paper is particularly appealing since it provides a robust alternative to the LSTM and Attention frameworks that have been commonly used to tackle this problem – LSTM-based models suffer from long compute times since they process data sequentially, and both LSTM and Attention models don’t have enough flexibility to capture the aspect and sentiment information simultaneously. Additionally, both can end up relying on a lot of parameters and become computationally intensive. Gated Convolutional networks with Aspect Embedding (GCAE), however, require far fewer computational resources, are easily parallelizable, and perform comparably to other models.

To us, this paper is alluring in its seemingly unintuitive reworking of ideas presented in class – in particular, Attention seemed to be the gold standard for most NLP tasks, but by cleverly integrating aspects of an “older” approach and applying it to a totally different task than what it’s commonly used for, one produces a more effective model. Furthermore, this paper frames “improved performance” not only in terms of its (meager) increase in accuracy but also in its relative simplicity and reduced training time. Since this is a task that is growing in relevance to private industry, meaning deep learning models for ABSA will be used more and more, creating solutions that increase efficiency could produce substantial benefits.

Related Work: We were particularly interested in Attention-based LSTM networks for sentiment analysis. Wang et al. (2016) proposed an Attention-based LSTM in which the model learns aspect embeddings that are then used to compute Attention weights. These weights let the model focus on different parts of a sentence depending on the aspect under consideration, and the attended LSTM representation is then used to classify the sentiment. However, as we saw in class, LSTMs process tokens sequentially, so they cannot be parallelized across time steps and are computationally expensive to train. Looking for alternatives to this architecture led us to the paper we are implementing.

Data: We are using the Multi-Aspect Multi-Sentiment (MAMS) dataset from Jiang et al. (2019), in which every sentence contains at least two aspects with different sentiment polarities. The dataset is built from restaurant reviews in the Citysearch New York dataset. MAMS comes in two versions, one for each task: aspect-term sentiment analysis (ATSA) and aspect-category sentiment analysis (ACSA). In ATSA, we are given the aspect terms and their respective polarities, along with the start and end index of each aspect term. Sentences with only one aspect, or with multiple aspects that all share the same polarity, are excluded. In ACSA, there are 8 aspect categories (such as food, service, etc.), each labeled with its sentiment polarity. Compared to the SemEval-2014 Restaurant Review dataset used in the paper, MAMS has 3 more aspect categories for ACSA (8 in MAMS vs. 5 in SemEval). According to the dataset analysis section of Jiang et al. (2019), “MAMS consists of 13,854 instances for ATSA and 8,879 instances for ACSA, which is 2.87 and 1.87 times of SemEval-2014 Restaurant Review dataset respectively.”

We do not expect to need significant pre-processing, as the MAMS dataset has already been prepared for training deep learning models.

E.g.
“The ambiance was terrible, but the food was amazing.” “We loved the Thai food, service was slow.”
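To make the ATSA annotation described above concrete, here is a minimal sketch of how one example could be held in memory. The field names (`text`, `aspects`, `term`, `from`, `to`, `polarity`) and the exclusive end offset are assumptions made for illustration and may not match the actual MAMS files. The polarity labels are our own reading of the first example sentence.

```python
# Hypothetical in-memory layout for one ATSA example; field names are
# illustrative and are not taken from the MAMS release.
def make_atsa_record(text, labeled_terms):
    """Build a record with character offsets for each labeled aspect term."""
    aspects = []
    for term, polarity in labeled_terms:
        start = text.find(term)  # first occurrence; assumes the term appears once
        if start < 0:
            raise ValueError(f"aspect term {term!r} not found in sentence")
        aspects.append({
            "term": term,
            "from": start,             # inclusive start offset
            "to": start + len(term),   # exclusive end offset (an assumption)
            "polarity": polarity,
        })
    return {"text": text, "aspects": aspects}


def has_mixed_polarity(record):
    """MAMS-style filter: keep sentences whose aspects differ in polarity."""
    return len({a["polarity"] for a in record["aspects"]}) >= 2


record = make_atsa_record(
    "The ambiance was terrible, but the food was amazing",
    [("ambiance", "negative"), ("food", "positive")],
)
```

A sentence whose aspects all carried the same polarity would fail `has_mixed_polarity` and would be filtered out under the selection rule described above.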

Methodology:

The feed-forward architecture is as follows:

  • The embedding layer is initialized with pre-trained embeddings (GloVe)
  • The sentence is represented by a matrix of embeddings
  • Sentences are passed into two separate convolutional layers
    • Each kernel corresponds to a linguistic feature detector that extracts specific n-gram patterns at various granularities
  • One convolutional layer goes through tanh activation, the other through ReLU
    • ReLU gate receives additional aspect information (embedding of given aspect category for ACSA or output of CNN over aspect term embeddings for ATSA)
    • Tanh gate is intended to generate sentiment features, ReLU generates aspect features
  • Outputs of ReLU and tanh gates are pointwise multiplied
  • For each convolutional filter, the max-over-time pooling layer takes the maximal value among the generated convolutional features, resulting in a fixed-size vector whose size is equal to the number of filters
  • Fully connected layer with softmax uses the resultant vector to predict the sentiment polarity of the input sentence
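As a rough illustration of the forward pass listed above, the following is a minimal sketch of the ACSA variant in plain Python, written without a framework so that each step is visible. It is not the authors' implementation. The function names, the parameter dictionary keys, the tiny dimensions, and the random weights are all assumptions made for the example. It also uses a single kernel width and random vectors in place of GloVe embeddings, and it assumes the aspect embedding has the same dimension as the word embeddings.

```python
import math
import random


def conv_features(sentence, kernels, biases, extra=None):
    """Valid 1-D convolution over token positions.

    sentence: list of L token embeddings (each a list of D floats)
    kernels:  list of F filters, each a list of k rows of D weights
    extra:    optional per-filter offset (the projected aspect embedding)
    Returns L-k+1 positions, each a list of F pre-activation values.
    """
    k = len(kernels[0])
    out = []
    for i in range(len(sentence) - k + 1):
        row = []
        for f, kernel in enumerate(kernels):
            total = biases[f]
            for j in range(k):
                total += sum(w * x for w, x in zip(kernel[j], sentence[i + j]))
            if extra is not None:
                total += extra[f]
            row.append(total)
        out.append(row)
    return out


def gcae_forward(sentence, aspect, params):
    """ACSA forward pass: gated conv -> max-over-time pooling -> softmax."""
    # Project the aspect embedding once; it is added at every position.
    aspect_term = [sum(w * a for w, a in zip(row, aspect)) for row in params["V_a"]]
    s = conv_features(sentence, params["W_s"], params["b_s"])
    a = conv_features(sentence, params["W_a"], params["b_a"], extra=aspect_term)
    # Pointwise product of the tanh (sentiment) and ReLU (aspect) outputs.
    gated = [[math.tanh(si) * max(0.0, ai) for si, ai in zip(s_row, a_row)]
             for s_row, a_row in zip(s, a)]
    # Max-over-time pooling: one value per filter.
    pooled = [max(column) for column in zip(*gated)]
    logits = [sum(w * e for w, e in zip(row, pooled)) + b
              for row, b in zip(params["W_out"], params["b_out"])]
    # Numerically stable softmax over the three polarity classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return pooled, [e / total for e in exps]


def random_params(rng, dim, n_filters, kernel_size, n_classes=3):
    """Random weights standing in for trained parameters (illustration only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_s": [mat(kernel_size, dim) for _ in range(n_filters)],
        "W_a": [mat(kernel_size, dim) for _ in range(n_filters)],
        "b_s": [0.0] * n_filters,
        "b_a": [0.0] * n_filters,
        "V_a": mat(n_filters, dim),
        "W_out": mat(n_classes, n_filters),
        "b_out": [0.0] * n_classes,
    }


rng = random.Random(0)
dim, n_filters, kernel_size = 4, 5, 3
params = random_params(rng, dim, n_filters, kernel_size)
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(7)]  # 7 tokens
aspect = [rng.uniform(-1, 1) for _ in range(dim)]
pooled, probs = gcae_forward(sentence, aspect, params)
```

Here `pooled` has one entry per filter regardless of sentence length, which is what lets the final fully connected layer take a fixed-size input. For ATSA, the `aspect` vector would instead come from a small CNN with max pooling over the aspect term embeddings.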

The predictions fall in a set of three sentiment polarities: positive, negative, and neutral. Each aspect term/category, of which there are multiple in each sentence, is labeled with one of these sentiments. The model will be trained by minimizing the cross-entropy loss with respect to the labels.
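The training objective can be sketched in a few lines. This is a minimal illustration of cross-entropy over the three polarity classes, assuming the model has already produced a probability distribution for each aspect. The label order and the example probabilities are made up for the sketch, and any regularization term is omitted.

```python
import math

# Assumed label order for this sketch; the real index mapping is a design choice.
LABELS = ("positive", "negative", "neutral")


def cross_entropy(probs, gold_label):
    """Negative log-likelihood of the gold label under the predicted distribution."""
    return -math.log(probs[LABELS.index(gold_label)])


def mean_loss(batch):
    """Average the loss over (probs, gold_label) pairs, one pair per aspect."""
    return sum(cross_entropy(p, y) for p, y in batch) / len(batch)


# Two aspects from the same sentence, each with its own prediction and label.
batch = [
    ([0.7, 0.2, 0.1], "positive"),
    ([0.1, 0.6, 0.3], "negative"),
]
loss = mean_loss(batch)
```

Because each aspect in a sentence is its own training instance, a sentence with two aspects contributes two terms to the loss.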

The most difficult part of implementing this model may end up being hyperparameter tuning. Convolution itself entails several hyperparameters, and the model uses up to three convolutions (two over the sentence, plus one over the aspect term embeddings for ATSA). Together with the embeddings, max-pooling, and the fully connected layers in the gating unit and for prediction, this creates many possible architectures.

Metrics: The original paper evaluated its model on the SemEval datasets. However, as discussed in the MAMS paper, many of the SemEval restaurant reviews contain only one aspect or several aspects that share a single sentiment polarity, whereas MAMS sentences contain aspects with different polarities. We will run experiments to check whether our model’s accuracy remains high on both the MAMS and SemEval datasets. Additionally, a major advantage of this architecture is that it uses convolution and gating instead of LSTMs, so it should train faster than existing LSTM-based models. To test this, we will train our model and models with different architectures on the MAMS dataset and compare both the elapsed training times and the resulting accuracies.
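One way the speed comparison could be organized is sketched below. The harness times each model under the same epoch budget using the standard library. The model names and the `placeholder_epoch` callable are hypothetical stand-ins, not part of our actual code, and the returned accuracy is a dummy value rather than a result.

```python
import time


def time_training(train_one_epoch, n_epochs):
    """Return (elapsed_seconds, final_accuracy) for one training run."""
    start = time.perf_counter()
    accuracy = None
    for epoch in range(n_epochs):
        accuracy = train_one_epoch(epoch)
    return time.perf_counter() - start, accuracy


def compare_models(models, n_epochs):
    """Time every model under the same epoch budget."""
    report = {}
    for name, train_one_epoch in models.items():
        seconds, accuracy = time_training(train_one_epoch, n_epochs)
        report[name] = {"seconds": seconds, "accuracy": accuracy}
    return report


def placeholder_epoch(epoch):
    # Placeholder: a real callable would train for one epoch on MAMS and
    # return validation accuracy. The constant below is not a result.
    return 0.0


report = compare_models(
    {"gcae": placeholder_epoch, "lstm_baseline": placeholder_epoch},
    n_epochs=3,
)
```

For a fair comparison, all models would need to run on the same hardware with the same batch size. Timing until convergence rather than for a fixed number of epochs is an alternative design.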

Accuracy is the natural metric for our project, since the goal is to correctly predict the sentiment polarity for each aspect of a given sentence. Training speed is our second metric, for the reasons outlined above.

The authors were trying to create a model that trained much faster than existing LSTM models for ACSA and ATSA tasks, while also maintaining comparable accuracy. The authors compared their model to 6 existing models on 4 different datasets from the SemEval workshops. They conducted 5 trials for each of those datasets and found the mean and standard deviation for accuracies and training time. In short, their model performed the best for 3 out of the 4 datasets and was close in the last one. Additionally, the GCAE model converged by far the fastest of the models.
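If we follow the same protocol, the per-dataset summary over repeated trials is a small computation. The sketch below assumes the sample standard deviation is wanted (the source does not say which variant the authors used), and the accuracy values are made up purely to exercise the function.

```python
import statistics


def summarize_trials(values):
    """Mean and sample standard deviation over repeated trials."""
    return statistics.mean(values), statistics.stdev(values)


# Made-up accuracies from 5 hypothetical trials, for illustration only.
trial_accuracies = [0.70, 0.72, 0.71, 0.69, 0.73]
mean_acc, std_acc = summarize_trials(trial_accuracies)
print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

The same function applies unchanged to a list of per-trial training times.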

Base goal: we hope to create a working model on the MAMS dataset with > 70% accuracy on the easy restaurant test set from SemEval.

Target: within 5% accuracy of the paper’s accuracy on all SemEval datasets and on the MAMS dataset.

Stretch: improved performance on SemEval (as our model trains on a different and potentially more informative dataset) and near-state-of-the-art performance on MAMS.

Ethics: The field of sentiment analysis goes beyond just our datasets which are based on restaurant reviews. Overall, applications in society can involve assessing consumer sentiment in different industries or even weeding out political sentiments. The latter is a clear point of concern as governments could use this technology to identify political dissidents. This could lead to massive online censorship either through bots looking for and deleting posts that exhibit negative sentiments towards the current regime or through bots flagging users and exposing them to punishment under the law. As a result, it is important to consider the stakeholders of the technology we contribute to and the datasets used to train our model.

First and foremost, the societal effects and underlying issues related to sentiment analysis and, by extension, our project include nearly everyone as a possible stakeholder. Anyone with an active online presence who expresses opinions could be analyzed by a sentiment analysis deep learning model, so the effects of a highly accurate model would be particularly far-reaching. These stakeholders could stand to gain or lose, depending on who is obtaining the data and how the data is used. Aside from stakeholders being used as data points, other stakeholders, especially those who deploy the model, could stand to gain enormous economic, political, or social power. For instance, through sentiment analysis, a company can target their marketing or tailor their product more effectively and gain market share. This makes it all the more important that there is some system of checks and balances in place on how data is used or not used.

In regards to our training data, for this project, we are using both the SemEval and MAMS datasets. As we pointed out above, most of the SemEval restaurant reviews express only one sentiment. The MAMS dataset will hopefully challenge our model to account for more nuanced reviews expressing multiple sentiments. However, it should be noted that sentiment analysis models often transfer poorly across domains, so our model will likely not be useful for tasks other than classifying the sentiments of restaurant or laptop reviews (depending on the dataset selected for training). Additionally, sentiment analysis is known to have a difficult time identifying sarcasm, so the model may not always be reliably accurate. As such, before deploying a sentiment analysis model like the one we are attempting to replicate, it is important to consider whether it is suitable for the task at hand, and to regard its results with a degree of skepticism.

Division of Labor:

  • Data collection + preprocessing: Sanyu
  • Model architecture: Shreyas
  • Accuracy experiment, Train + Test: Alex
  • Speed experiment: Eric
