Truer Ratings: Normalizing Google Reviews Data

Histogram of Category Averages Before we Normalized Individual User Averages
Histogram After Normalizing
Linear Model Created from Normalizing Data

Inspiration

We noticed that ratings in most Google reviews follow a polarized distribution, clustering at the extremes of the scale, and we wanted a way to correct this bias and produce more accurate ratings.

What it does

We created an algorithm that accounts for individual user biases in reviewing. From there, we built a linear model that corrects each user's distribution of ratings on the 5-star scale so that the full scale is used as intended.
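To illustrate what accounting for individual user bias can look like, here is a minimal sketch that shifts each user's ratings so that their personal average matches the overall average. It is one common way to do this and not necessarily the exact algorithm we used. The function name `normalize_by_user` and the `(user_id, category, rating)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_user(ratings):
    """Shift each user's ratings so their personal mean equals the global mean.

    `ratings` is a list of (user_id, category, rating) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    # Collect every rating given by each user.
    by_user = defaultdict(list)
    for user, _category, rating in ratings:
        by_user[user].append(rating)

    # Mean over all ratings, and mean per user.
    global_mean = mean(rating for _user, _category, rating in ratings)
    user_mean = {user: mean(values) for user, values in by_user.items()}

    # Remove each user's personal offset, then re-center on the global mean.
    return [
        (user, category, rating - user_mean[user] + global_mean)
        for user, category, rating in ratings
    ]


sample = [
    ("a", "parks", 5.0),
    ("a", "museums", 4.0),
    ("b", "parks", 2.0),
    ("b", "museums", 1.0),
]
normalized = normalize_by_user(sample)
```

In this sample, user "a" rates everything high and user "b" rates everything low, yet both prefer parks over museums by one star. After the shift, their ratings for the same category agree.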

How we built it

We mostly used Python and Python libraries to create the models.

Challenges we ran into

Importing the data was a struggle. Some of the columns contained stray strings hidden among the numbers, which made casting the values to floats tricky. The project also required a lot of mathematical processing, which was a new challenge for us to overcome.
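The casting problem can be handled along the following lines. This is a minimal standard-library sketch, not our original import code: the helper names `to_float` and `clean_column` and the sample cells are hypothetical. Cells that cannot be parsed become NaN so they can be found and dealt with afterwards. With pandas, `pd.to_numeric(column, errors="coerce")` does a similar job on a whole column.

```python
import math


def to_float(value):
    """Convert a raw cell to float, returning NaN when it cannot be parsed."""
    if value is None:
        return math.nan
    try:
        # str() lets the same helper accept numbers as well as text cells.
        return float(str(value).strip())
    except ValueError:
        return math.nan


def clean_column(cells):
    """Apply to_float to every cell of a column (hypothetical helper)."""
    return [to_float(cell) for cell in cells]


# Made-up cells, including a malformed entry of the kind that breaks a plain float() cast.
raw = ["3.5", " 4 ", "2\t2.", "", None]
cleaned = clean_column(raw)
bad_rows = [i for i, value in enumerate(cleaned) if math.isnan(value)]
```

Keeping the indices of the unparseable cells in `bad_rows` makes it easy to inspect them before deciding whether to drop or repair those rows.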

Accomplishments that we're proud of

We gained a lot of experience with Python libraries and used advanced mathematical techniques to find our constant (k):

k = 0.3873730897663091

What we learned

We found a consistent value by which users appeared to underrate things across all categories of locations, given a certain number of data points. We used this k constant to correct ratings and create a more accurate rating system.
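As a rough illustration of such a correction, the sketch below treats k as an additive offset and caps the result at the top of the scale. This is an assumption made for the example: the writeup does not spell out how k enters the linear model, and the function name `apply_k` is hypothetical.

```python
K = 0.3873730897663091  # the constant reported above
MAX_RATING = 5.0        # top of the five-star scale


def apply_k(rating, k=K, max_rating=MAX_RATING):
    """Raise a rating by k, capped at the top of the scale.

    Assumption: k acts as an additive offset. The actual model may use it differently.
    """
    return min(max_rating, rating + k)


corrected_mid = apply_k(3.0)
corrected_high = apply_k(4.9)
```

The cap keeps corrected values on the five-star scale, so a rating that is already near the top cannot exceed 5.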

What's Next for Truer Ratings: Normalizing Google Reviews Data

We believe that Google could incorporate our research to account for individual biases in reviewing. We posit that if each person's ratings were scaled relative to other reviewers' ratings, rather than read only against the fixed five-star scale, the resulting ratings would be more accurate.