Final Write Up

Title: Deep Yelping

Contributors:

Sara Syed, Abby Dichter, Cameron Wenzel, Micah Bruning

Introduction:

We implemented a new idea using common sentiment-analysis methodology. Our goal was to create a model that predicts whether a Yelp rating of a particular business is positive or negative, given a corpus of text for each review. We sampled thousands of reviews across hundreds of restaurants and investigated how accurately a positive or negative rating can be predicted from the sentiment of the review text. In other words, we first predict the sentiment expressed through word patterns, and then evaluate whether that sentiment correctly predicts the numerical rating associated with the review. This is a binary classification problem: we are classifying restaurant reviews as positive or negative based on an overall sentiment from the text.

Methodology:

We implemented a Recurrent Neural Network as our binary classifier.

To format our data, we read through a JSON file of Yelp reviews and cleaned each review by removing punctuation and “stop words” (neutral, high-frequency words such as “the” and “for”). We then split each review into words and mapped each word to a corresponding integer.
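A minimal sketch of this preprocessing step (the stop-word list and function names here are illustrative, not the project’s actual code):

```python
import string

# A tiny stop-word list for illustration; a real run would use a fuller
# list (e.g. NLTK's English stop words).
STOP_WORDS = {"the", "for", "a", "an", "and", "of", "to", "is", "was"}

def clean_review(text):
    """Lowercase, strip punctuation, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]

def build_vocab(reviews):
    """Map each distinct word to an integer id (0 reserved for padding)."""
    vocab = {}
    for review in reviews:
        for word in review:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def tokenize(review, vocab):
    """Convert a cleaned review to its list of integer ids."""
    return [vocab[w] for w in review if w in vocab]
```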

In the main model architecture, we use three linear layers, an LSTM layer, and an embedding matrix. The first two linear layers are followed by ReLU activations and the last by a softmax activation. The embedding matrix was initialized from a pre-made corpus of word embeddings. We converted every rating of 3 or greater to “positive,” represented by a 1, and every rating under 3 to “negative,” represented by a 0. We train the model in the typical fashion, using a sparse categorical cross-entropy loss function for backpropagation.
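A sketch of this architecture in TensorFlow/Keras, with the rating-to-label conversion included. The vocabulary size, embedding dimension, and layer widths below are assumptions (the write-up does not specify them), and in practice the embedding weights would be loaded from the pre-made word-embedding corpus rather than trained from scratch:

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # hypothetical; depends on the review corpus
EMBED_DIM = 100     # e.g. 100-dimensional pre-trained vectors

model = tf.keras.Sequential([
    # In the real model these weights would come from pre-trained embeddings.
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    # Two output units: class 0 = negative, class 1 = positive.
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)

def to_label(stars):
    """Ratings of 3 or more stars are 'positive' (1); below 3, 'negative' (0)."""
    return 1 if stars >= 3 else 0
```

With two output units and integer labels, sparse categorical cross-entropy applies directly, without one-hot encoding the labels.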

Results:

After tuning our hyperparameters, we obtained an accuracy of 71.66%, meaning we correctly predicted whether a review was positive or negative nearly ¾ of the time. We regarded this as quite successful: such accuracy gives us a good degree of confidence that our model properly identifies words commonly associated with positive reviews.

Challenges:

Many of the challenges we faced revolved around hyperparameter optimization and effectively representing our data. The dataset we use is extremely large, so when we began preprocessing we had to decide how much of it we could reasonably load into our model without sacrificing too many training examples. As with most model implementations, we took a trial-and-error approach until we found a number of datapoints our computers could process efficiently. Our initial plan was to represent each review’s text with a tree structure discussed in class and run a Recursive Neural Network on it. However, there were no good libraries or packages for running Recursive Neural Networks on trees, so we kept the reviews in a more standard list format. In processing each review, we repeatedly tinkered with the number of words to keep from each review and found that 100 words gave the best results.
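The 100-word cutoff can be handled with truncation and padding in one step; a sketch using `tf.keras.utils.pad_sequences` (the wrapper function here is our own):

```python
import tensorflow as tf

SEQ_LEN = 100  # the per-review word count that gave the best results

def to_fixed_length(token_ids_batch):
    """Truncate long reviews and zero-pad short ones to exactly SEQ_LEN tokens."""
    return tf.keras.utils.pad_sequences(
        token_ids_batch, maxlen=SEQ_LEN, padding="post", truncating="post"
    )
```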
Our model architecture was extremely finicky at first when it came to adjusting hyperparameters. Simply changing the number of examples by 100, or slightly modifying the hidden size of one of our linear layers, had a drastic effect on the resulting accuracy. On the flip side, some improvements to the model barely moved the results: removing “stop words,” for example, led to only a minor increase in accuracy.
As in previous homework assignments, we also ran into syntax-based problems that didn’t seem to make much sense. For example, we used many numpy functions for casting data types and formatting arrays, but we were getting unusually low accuracies until we switched to relying primarily on TensorFlow functions, even for trivial operations.
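For instance, an accuracy computation can stay entirely inside TensorFlow rather than round-tripping through numpy (the tensors below are hypothetical):

```python
import numpy as np
import tensorflow as tf

# Hypothetical predictions and labels; in practice these came from the model
# and the dataset. Labels often arrive as a float64 numpy array.
preds = tf.constant([1, 0, 1, 1])
labels_np = np.array([1.0, 0.0, 0.0, 1.0])

# Cast and compare with TensorFlow ops instead of their numpy equivalents.
labels = tf.cast(tf.convert_to_tensor(labels_np), preds.dtype)
accuracy = tf.reduce_mean(tf.cast(preds == labels, tf.float32))
```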

Reflection:

Overall, we feel satisfied with how our project turned out. Although we weren’t able to implement a tree structure or create a multi-class classifier, we believe the accuracy benchmark we reached, given the uniqueness of the data, is something to be proud of. The model mostly worked the way we expected: it accurately predicts whether a review is positive or negative, but it is incredibly sensitive to adjustments to the data, which we were not expecting. As mentioned in the Challenges section, we had to adjust our plans for structuring the data, pivoting away from the tree-based structure to a plainer representation of the text. If we could do the project over, we would start earlier and find a way to implement a Recursive Neural Network; while we were proactive about the project, we did not realize the sheer amount of time an RvNN would take because of the lack of Python libraries for running one on a tree. Our biggest takeaways were the difficulty of formatting raw data and the sensitivity of some model architectures. Moreover, even though our project wasn’t incredibly complex, we gained a great deal of confidence from fully implementing our own project from scratch.

Initial Write-up

Introduction:

We are implementing a new idea using common sentiment-analysis methodology. Our goal is to create a model that predicts each person’s numerical (1-5) rating of a particular restaurant on Yelp, given a corpus of text for each review. We will sample thousands of reviews for hundreds of restaurants. We aim to investigate the consistency of predicting numerical ratings from qualitative reviews. In other words, we aim to first predict sentiments expressed through word patterns and second to evaluate whether certain sentiments can correctly predict the numerical rating associated with that review. This is a five-class classification problem, as we are trying to classify restaurant ratings based on an overall sentiment from the text.

Challenges:

So far, preprocessing has been challenging, since we have not previously tagged words in any way. We are currently looking into tagging different parts of speech to make our model more similar to those we researched in our introduction to sentiment tracking, but this is proving quite difficult. We are considering simply running sentiment tracking on all the words in our Yelp reviews; however, we are concerned that this could introduce a lot of noise into the data. This has been our biggest challenge to date and is the reason why preprocessing is taking longer than we would normally expect in a typical DL project.

Insights:

Since we are still working through preprocessing, it is difficult to report concrete results at this moment. However, as explained in the “Challenges” section of this document, our data may be noisier than in other sentiment-analysis settings, so we may have to scale back the high goals we set in our initial proposal. This change will give us more attainable results, since we did not foresee the preprocessing difficulties we are currently working through.

Plan:

We need to dedicate more time to successfully preprocessing the data for our sentiment analysis. We are thinking of using fewer categories of tagged words to make the sentiment analysis run more smoothly. We also need to learn more about the syntax for transforming our preprocessed data into a tree structure, and we would like to research which type of tree structure is most conducive to efficient sentiment analysis. We are still considering how we want to measure success, including ways to credit predictions that are close, not just exactly correct.

Links:

Poster
Final Write-up
Video presentation
Github
Initial Write-up
