Inspiration

We've purchased products off Amazon based on their good reviews only to find that they don't perform to the level portrayed by the comments. We wanted to create a solution to inaccurate ratings created by overly positive and overly negative reviewers. 

What it does

Our program collects all the reviews for a product on Amazon and puts each review's comment and star rating into a data frame. It then performs a sentiment analysis on the data frame and returns a mean rating based on the wording of the reviews, along with three graphs that give a deeper look into the data.

How we built it

We built the program in Python using Visual Studio Code. To obtain the data, we used web scraping to pull the star ratings and review comments from the HTML of an Amazon product page. Once we had gathered all the information, we used the nltk package to gauge the true sentiment of what was written.

We cleaned the data by removing punctuation and stop words so the data could better represent the customers. I then used VADER to analyze the remaining words and create a compound sentiment score for each review. I normalized these scores to the range 1 to 5 so that they would be easier to interpret alongside the star ratings.
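The cleaning and rescaling steps can be sketched roughly as follows. This is a minimal illustration and not our exact code: the tiny `STOP_WORDS` set and both function names are hypothetical, and the linear mapping is only one plausible way to move VADER's compound score (which lies between -1 and 1) onto a 1-to-5 scale.

```python
import string

# Hypothetical, deliberately tiny stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of", "i"}


def clean_review(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOP_WORDS]


def compound_to_stars(compound):
    """Linearly map a compound score in [-1, 1] onto the 1-to-5 scale (assumed mapping)."""
    return 1 + (compound + 1) * 2
```

With this mapping, a fully negative review lands on 1 star, a neutral one on 3, and a fully positive one on 5.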

To visualize the data, I first created a histogram to compare our sentiment scores with the actual rating scores, as well as their averages. Then I created a violin plot to see how the reviews were distributed. Finally, I plotted a frequency distribution of the top 30 words to see how often they occurred in the reviews.
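The word counts behind the frequency plot can be approximated with the standard library alone. This is a sketch, assuming each review has already been cleaned into a list of words; `top_words` is a hypothetical helper name, and the plotting itself is left out.

```python
from collections import Counter


def top_words(token_lists, n=30):
    """Count every token across all reviews and return the n most common."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)
```

The result is a list of `(word, count)` pairs, most frequent first, which can be handed to any bar or line plot.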

Challenges we ran into

One challenge with web scraping was Amazon's anti-robot features, which stopped us from gathering information. To get around this, we had to send new headers that would allow us to obtain the data. Navigating the HTML after obtaining it was also difficult, as we had to learn the structure of Amazon's pages.
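Attaching headers to a request looks roughly like the sketch below. The header values and the `build_request` name are placeholders for illustration; the headers we actually sent are not reproduced here. The example only builds the request object and does not contact any site.

```python
from urllib.request import Request

# Hypothetical header values, chosen only to illustrate the idea.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Create a request object that carries browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)
```

The returned object can then be passed to `urllib.request.urlopen` to fetch the page.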

Another challenge we faced was learning how to use the NLTK and VADER packages to analyze the results. The reviews we received were full of whitespace and misaligned text since they came straight from the HTML. Fixing this took multiple rounds of testing, but eventually we were able to get clean data.
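One simple way to tidy text pulled straight from HTML is to collapse every run of whitespace into a single space. This is a sketch of that idea and not necessarily the fix we ended up with; `normalize_whitespace` is a hypothetical name.

```python
def normalize_whitespace(raw_text):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(raw_text.split())
```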

Accomplishments that we're proud of

We're proud of getting web scraping to work even though we had never attempted it before. Getting to use industry-standard tools in Python for data analysis was a really interesting process as it gave us exposure to problems that wouldn't be talked about in a textbook. Overall, the challenges we faced were due to our inexperience, but looking back we know that this was a great learning experience for us and the skills we learned will help us in our futures.

What we learned

We learned how to successfully scrape a large website like Amazon, and which tools are needed to do proper sentiment analysis of customer reviews.

What's next for Amazon Review Judger

We hope to improve the algorithms in the Amazon Review Judger. When scraping products that have a large number of reviews, the program takes extremely long because our algorithms are inefficient: we go through every review twice for the web scraping alone. We hope to reduce this to only one visit to each review.
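A single-pass version might look something like the sketch below, which reads the rating and the comment from each review in the same loop. It assumes each scraped review has already been reduced to a dict with hypothetical `rating` and `comment` keys; the real HTML handling is not shown.

```python
def collect_reviews(review_blocks):
    """Build one row per review while visiting each block exactly once."""
    rows = []
    for block in review_blocks:
        # Both fields are read in the same iteration, so no second pass is needed.
        rows.append({"rating": block["rating"], "comment": block["comment"].strip()})
    return rows
```

Because the input is only iterated once, it also works with a generator that yields reviews as pages are downloaded.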
