Title
Hotel Ratings Predictor
Team Members
Ahmed (Chotoo) Amin (aamin4), Justen Joffe (jjoffe1), Mahir Arora (marora6)
Introduction
We are implementing a ratings-predictor paper on a hotel reviews dataset. We hope to make headway on the problem of standardizing reviews: what one guest considers a positive review, another might consider negative, and NLP and text analysis can help bridge these discrepancies. At heart, this is a binary classification problem: deciding whether a review is positive (above 3 stars) or negative (below 3 stars), with neutral 3-star reviews removed during preprocessing.
Related Work
After much discussion with our mentor TA, Logan, we realized that the lack of literature on hotel review rating prediction qualifies this as a new idea built on an open-source project. We looked at an open-source project on Amazon ratings (https://github.com/imdeepmind/RatePrediction) and at this DL Spring 2022 project, and we lifted code from both; the DL Spring 2022 project was completed in … We also looked at this white paper, which used a methodology similar to ours. However, that paper framed the task as regression, predicting a hotel's rating out of 5. We want a more general target, so we instead predict whether a review is positive or negative.
Data
We are using a dataset of ratings and reviews for 1,000 hotels across the United States. Each entry includes fields such as the hotel's state and address and, most importantly, a rating out of 5 and the text of the corresponding review. These last two fields will let us build a model that predicts, from a review's text, whether its rating is positive or negative.
Our dataset contains just under 36,000 entries across those 1,000 hotels. We plan to preprocess the data with the pandas library. Since we are focused on whether clearly polarized reviews are positive or negative, we will remove middle ratings (3 out of 5). To reduce location bias, we will randomly shuffle the reviews, and we will then split the data into a training set and a testing set.
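The preprocessing steps above could look roughly like the following pandas sketch. It assumes hypothetical column names "rating" and "text" (the real dataset's columns may differ), an 80/20 train/test split, and a fixed random seed; none of these choices come from our actual pipeline.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Drop neutral reviews, add binary labels, shuffle, and split.

    Assumes hypothetical columns "rating" (1-5 stars) and "text" (review body).
    """
    # Remove middle (3 out of 5) ratings so only clearly polarized reviews remain.
    df = df[df["rating"] != 3].copy()
    # 1 = positive (4-5 stars), 0 = negative (1-2 stars).
    df["label"] = (df["rating"] > 3).astype(int)
    # Shuffle every row so reviews from the same hotel or location are not grouped together.
    df = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    # Hold out the first test_frac of the shuffled rows for testing.
    n_test = int(len(df) * test_frac)
    return df.iloc[n_test:], df.iloc[:n_test]


# Toy example with made-up reviews (not from the real dataset).
reviews = pd.DataFrame({
    "rating": [5, 1, 3, 4, 2, 5, 3, 1, 4, 5],
    "text": ["Great stay", "Dirty room", "It was fine", "Friendly staff",
             "Noisy at night", "Loved it", "Average", "Rude front desk",
             "Clean and quiet", "Would return"],
})
train_df, test_df = preprocess(reviews)
print(len(train_df), len(test_df))  # 7 1
```

Dropping the 3-star rows before shuffling means the split proportions apply only to the reviews the model will actually see.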
This is the link to our dataset: Yelp hotel data
Methodology
Our model's architecture is based on the model in the paper, which applies convolutional layers before the LSTM layer. We may change this approach if we find a method that improves the architecture. For example, we might follow something similar to HW4: an embedding layer, then an LSTM layer, then two dense layers using leaky ReLU and softmax activations.
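To make the HW4-style variant concrete, here is a minimal NumPy sketch of its forward pass only: embedding lookup, an LSTM run over the tokens, a dense layer with leaky ReLU, and a dense layer with softmax over the two classes. It omits the paper's convolutional layers, and all layer sizes and initializations are hypothetical. In Keras, the same stack would roughly correspond to `tf.keras.layers.Embedding`, `LSTM`, and two `Dense` layers.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)


def softmax(x):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(vocab_size, embed_dim, hidden, dense_dim, n_classes=2, seed=0):
    """Randomly initialize hypothetical layer sizes; no values come from the paper."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "E": rng.normal(0, scale, (vocab_size, embed_dim)),   # embedding table
        # LSTM input/recurrent weights for the 4 gates (input, forget, candidate, output).
        "W": rng.normal(0, scale, (embed_dim, 4 * hidden)),
        "U": rng.normal(0, scale, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "W1": rng.normal(0, scale, (hidden, dense_dim)),      # dense layer 1 (leaky ReLU)
        "b1": np.zeros(dense_dim),
        "W2": rng.normal(0, scale, (dense_dim, n_classes)),   # dense layer 2 (softmax)
        "b2": np.zeros(n_classes),
    }


def forward(params, token_ids):
    """Map int token ids of shape (batch, seq_len) to class probabilities (batch, n_classes)."""
    x = params["E"][token_ids]                # embedding lookup: (batch, seq_len, embed_dim)
    batch, seq_len, _ = x.shape
    hidden = params["U"].shape[0]
    h = np.zeros((batch, hidden))
    c = np.zeros((batch, hidden))
    for t in range(seq_len):
        gates = x[:, t] @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell state
        h = sigmoid(o) * np.tanh(c)                     # new hidden state
    # Use the final hidden state as the summary of the review.
    d = leaky_relu(h @ params["W1"] + params["b1"])
    return softmax(d @ params["W2"] + params["b2"])


params = init_params(vocab_size=50, embed_dim=8, hidden=16, dense_dim=8)
token_ids = np.array([[3, 14, 15, 9], [2, 6, 5, 35]])   # two toy "reviews" of 4 tokens
probs = forward(params, token_ids)
print(probs.shape)  # (2, 2)
```

Each row of `probs` is a probability distribution over (negative, positive); a real model would learn these weights rather than use random ones.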
We will train the model by passing batches of review data through its call function and computing the loss against the corresponding labels. We will then use our optimizer to update the trainable parameters based on the loss for each batch.
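The training procedure described above (call the model on a batch, compute the loss against the labels, update the trainable parameters) is sketched below. To keep it short and dependency-light, it swaps in a hypothetical stand-in model, logistic regression on bag-of-words counts, trained with plain mini-batch SGD; it is not our LSTM. In TensorFlow, the same steps would typically use `tf.GradientTape` to compute gradients and `optimizer.apply_gradients` over `model.trainable_variables`.

```python
import numpy as np


class BagOfWordsClassifier:
    """Hypothetical stand-in model: logistic regression on bag-of-words counts."""

    def __init__(self, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0, 0.01, vocab_size)   # trainable weights
        self.b = 0.0                                # trainable bias

    def call(self, x):
        # x: (batch, vocab_size) word counts -> probability each review is positive
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))


def bce_loss(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def train(model, x, y, epochs=50, batch_size=4, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(x))            # reshuffle every epoch
        total = 0.0
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            p = model.call(xb)                     # forward pass
            total += bce_loss(p, yb) * len(idx)
            # For sigmoid + mean BCE, d(loss)/d(logits) = (p - y) / batch_size.
            grad = (p - yb) / len(idx)
            # Plain SGD step on the trainable parameters.
            model.w -= lr * (xb.T @ grad)
            model.b -= lr * grad.sum()
        history.append(total / len(x))             # mean loss over the epoch
    return history


# Toy counts over the vocabulary ["great", "clean", "dirty", "rude"]; labels 1 = positive.
x = np.array([[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 2, 1],
              [1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0, 1, 0], dtype=float)
model = BagOfWordsClassifier(vocab_size=4)
history = train(model, x, y)
```

The loop structure (shuffle, batch, forward pass, loss, gradient, parameter update) carries over unchanged when the model is replaced by an LSTM and the hand-written gradient by automatic differentiation.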
We anticipate that the most challenging part of our implementation will be tuning hyperparameters to account for the differences between our training setup and the paper's implementation. Tuning can be tedious, since each adjustment requires retraining the model, but as long as we budget time for it, we should be able to optimize performance.
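One systematic way to organize that tuning is a grid search, sketched below. The search space and `toy_score` function are hypothetical placeholders; in practice `train_and_evaluate` would retrain the model and return validation accuracy. The sketch also makes the cost explicit: every combination is a full retraining, so a 2 × 2 × 2 grid already means 8 training runs.

```python
import itertools


def grid_search(train_and_evaluate, grid):
    """Retrain once per hyperparameter combination and keep the best score.

    train_and_evaluate is a hypothetical callable that trains a fresh model with the
    given keyword arguments and returns validation accuracy.
    """
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, train_and_evaluate(**params)))   # one full retraining
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


# Hypothetical search space; 2 * 2 * 2 = 8 retrainings.
grid = {"learning_rate": [1e-3, 1e-2], "lstm_units": [64, 128], "batch_size": [32, 64]}


def toy_score(learning_rate, lstm_units, batch_size):
    # Placeholder for illustration only; a real version would train and evaluate the model.
    return 0.5 + (0.1 if learning_rate == 1e-3 else 0.0) + lstm_units / 1000 - batch_size / 10000


best_params, best_score, results = grid_search(toy_score, grid)
print(best_params)
```

Because the number of runs grows multiplicatively with each added hyperparameter, it is usually worth starting with a coarse grid and refining around the best values.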
Metrics
We plan to experiment with adding more layers and, if time permits, replacing the LSTM with a GRU.
Accuracy is an appropriate metric for our project, since each review is classified as either positive or negative.
The author of the Amazon ratings project was trying to predict Amazon ratings accurately. The author's model was only 50% accurate, and they hoped to increase its accuracy to 75%.
Base goal: 50% accuracy
Target goal: 60% accuracy
Stretch goal: 75% accuracy
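When measuring progress against these accuracy goals, it can help to report a majority-class baseline alongside the model's accuracy: on a balanced binary task, random guessing already scores about 50%, and if positive reviews dominate the data (we have not checked this here), always predicting "positive" could score higher still. The sketch below uses made-up toy labels, not results from our model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))


# Toy labels (1 = positive, 0 = negative); not results from our model.
y_train = [1, 1, 1, 0, 1, 1, 0, 1]
y_test = [1, 0, 1, 1, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_test, y_pred), majority_baseline(y_train, y_test))  # 0.8 0.8
```

In this toy case, a model with 80% accuracy is no better than always predicting "positive", which is why the baseline is worth reporting next to each goal.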
Ethics
Unfair reviews and ratings can tank a small business, such as a boutique hotel. Thus, we hope to explore standardizing a rating system across guests.
Deep learning is a good approach to this problem because it involves natural language processing and text analysis. Recurrent architectures such as LSTMs and GRUs, which process a review word by word, are therefore appropriate.
Division of Labor
Chotoo: Pre-processing and Poster
Justen: Model and Poster
Mahir: Model and Poster