Fake Reviews Classification Outline

Members:

Peter Van Katwyk – pvankatw
Alice Marbach – amarbach
Ricky Zhong – rzhong99

Introduction: What problem are you trying to solve and why?

If you are implementing an existing paper, describe the paper’s objectives and why you chose this paper.

https://doi.org/10.1016/j.jretconser.2021.102771
We are implementing a paper that seeks to identify fake reviews of online products. We chose this paper because we have all had a similar experience: you buy a well-reviewed Amazon product, only to find that its quality is not as promised. This is a common issue for online shoppers, since product review sections are plagued with fake, typically overly positive reviews.

What kind of problem is this? Classification? Regression? Structured prediction? Reinforcement Learning? Unsupervised Learning? Etc.

The task is binary classification: our model will separate fake reviews, generated by deep learning models, from real reviews written by people.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?

Please read and briefly summarize (no more than one paragraph) at least one paper/article/blog relevant to your topic beyond the paper you are re-implementing/novel idea you are researching.

Fake review detection is an important task for companies seeking to curb purchased fake reviews. One example of using deep learning for this purpose is the article “Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining” (Hajek et al., 2020). In this study, the authors use traditional methods to identify fakes, such as n-gram and word embedding approaches, but add a model that identifies emotion in the review. They employ a Skip-Gram Word2Vec model to produce word embeddings from a corpus of consumer reviews and combine these embeddings with bag-of-words features and several lexicon-based emotion indicators. Their model outperformed existing baseline approaches and state-of-the-art fake review detection methods in terms of accuracy, AUC, and F-score. (https://doi.org/10.1007/s00521-020-04757-2)

In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”--if you stumble across a new implementation later down the line, add it to this list.

Data: https://osf.io/tyue9/
Repo: https://github.com/joolsa/FakeReviews (PyTorch)

Data: What data are you using (if any)?

If you’re using a standard dataset (e.g. MNIST), you can just mention that briefly. Otherwise, say something more about where your data come from (especially if there’s anything interesting about how you will gather it).

The dataset we will use can be found here: https://osf.io/tyue9/. The data is semi-synthetic: half of the observations are real reviews left on Amazon products (Amazon Review Dataset), and the other half were generated by fine-tuning OpenAI’s GPT-2 model and sampling new reviews from it. The generated half stands in for the fake reviews that are sold on the internet. The task is to distinguish the real reviews from the fake ones.

How big is it? Will you need to do significant preprocessing?

The dataset is 15 MB and contains 20,000 real reviews and 20,000 fake reviews. The columns are “category”, which indicates what kind of item is being reviewed; “rating”, which is the number of stars; “label”, which indicates whether the review is OR (original) or CG (computer generated); and “text_”, which holds the review text. Beyond the standard preprocessing that NLP normally requires, we do not expect significant preprocessing.
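As a minimal sketch of how we might read these four columns and check the label balance, the snippet below parses a few rows and maps the label to a binary target. The sample rows are made up for illustration and are not taken from the dataset, the function name load_reviews is hypothetical, and lowercasing the text is only an assumed preprocessing step; with the real data we would open the downloaded CSV file instead of the inline string.

```python
import csv
import io
from collections import Counter

# Made-up sample rows that mimic the four columns described above.
# They are illustrative only and are not taken from the real dataset.
SAMPLE_CSV = """category,rating,label,text_
Home_and_Kitchen,5.0,CG,"Love this! Well made, sturdy, and very comfortable."
Home_and_Kitchen,1.0,OR,"Broke after two weeks, would not buy again."
Books,4.0,OR,"A solid read, though the ending felt rushed."
Books,5.0,CG,"Great book, I really enjoyed the story and the characters."
"""


def load_reviews(handle):
    """Parse review rows and map the label to 1 (CG, fake) or 0 (OR, real)."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "category": row["category"],
            "rating": float(row["rating"]),
            "is_fake": 1 if row["label"] == "CG" else 0,
            # Assumed preprocessing: trim whitespace and lowercase the text.
            "text": row["text_"].strip().lower(),
        })
    return rows


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
label_counts = Counter(r["is_fake"] for r in reviews)
print(label_counts)
```

On the full dataset, the two counts should each come out to 20,000 if the file matches the description above.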

Methodology: What is the architecture of your model?

How are you training the model?

We will train our model using TensorFlow and the dataset described above. Specifically, we will train a modified version of the OpenAI model described in the paper.

If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here.

The hardest part of implementing the model will be repeatedly retraining and tuning it until we reach our accuracy goal.

Metrics: What constitutes “success?”

What experiments do you plan to run?

We plan to run the model with multiple sets of weights and on multiple batches of data.

For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate?

Yes, accuracy applies to our project. Because the dataset is balanced between real and fake reviews, chance performance is 50%, so at a minimum we want accuracy above 50%. Ideally we should reach much higher numbers, since the paper does.

If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model.

They were hoping to determine whether machines could detect fake reviews better than people could, and whether that could be used to make the detection process more efficient.

What are your base, target, and stretch goals?

Our base goal is to get better than 50% accuracy, our target goal is 70% accuracy, and our stretch goal is close to or above 95% accuracy.
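To make these goals concrete, here is a minimal sketch of how we might compute accuracy and check which goal a run has met. The function names and the toy labels are hypothetical, and treating the stretch goal as a hard 0.95 cutoff is an assumption, since we only said "close to or above 95%".

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def goal_reached(acc, base=0.50, target=0.70, stretch=0.95):
    """Return the highest goal met by an accuracy value, or 'none'."""
    if acc >= stretch:  # assumed hard cutoff for the stretch goal
        return "stretch"
    if acc >= target:
        return "target"
    if acc > base:  # the base goal is strictly better than chance
        return "base"
    return "none"


# Toy labels (1 = fake, 0 = real), made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
acc = accuracy(y_true, y_pred)
print(acc, goal_reached(acc))  # prints: 0.8 target
```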

Ethics

What broader societal issues are relevant to your chosen problem space?

Fake reviews, and text that appears to have been written by humans, are a growing and serious problem in today’s world. In the limited space of fake reviews, consumers may be convinced to buy or avoid products in order to further the profits of one of the firms in the market. More broadly, anyone who can leverage the power of mass-generating text that mimics humans has the ability to sway public opinion and discourse. This problem, deeply tied to fake news, can change the political and social landscape of our world.

Why is Deep Learning a good approach to this problem?

Deep Learning is well suited to the problem of fake reviews for several reasons. First, with the growing quality of models like GPT-2, machine-generated text can more and more easily fool humans, so a person browsing Amazon today could have great difficulty identifying fake reviews. Second, scale is an important factor. Amazon identifies millions of fake reviews each month, and humans cannot possibly keep up with the number being created. Third, because fake reviews are so similar to human-generated text, the task of identifying them is complex and requires models with many thousands of parameters.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?

Our dataset comes from the paper: it combines real Amazon reviews with fake reviews generated by a GPT-2 model. We have no concerns about how it was labeled, since each label follows from whether the review was collected or generated. It is representative of the problem we are trying to model, with 20,000 real and 20,000 fake reviews. The biases it may contain include those present in the real reviews, which are then carried over into the GPT-2 model that created the fakes.

Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?

The major stakeholders are users trying to detect fake reviews and companies trying to remove them.

How are you planning to quantify or measure error or success? What implications does your quantification have?

Our success will be based on the proportion of reviews that we correctly identify as real or fake. This capability could help Amazon and other platforms curb the market for purchased fake reviews.
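One implication of this measure is that the proportion correct counts both kinds of mistake equally: flagging a real review as fake and letting a fake review through. As a rough sketch of how we might look at the two error types separately, the snippet below counts them and derives precision and recall for the fake class. The function names and toy labels are hypothetical and not part of the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, with 'fake' as positive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # fake review correctly flagged
        elif p == positive:
            fp += 1  # real review wrongly flagged as fake
        elif t == positive:
            fn += 1  # fake review that slipped through
        else:
            tn += 1  # real review correctly kept
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """Precision and recall for the positive class (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (1 = fake, 0 = real), made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, tn, fn, precision, recall)
```

In this toy example the accuracy is 5 out of 8, but half of the fake reviews are missed, which the single accuracy number does not show.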

Add your own: if there is an issue about your algorithm you would like to discuss or explain further, feel free to do so.