Motivations
As consumers, we’ve all wondered whether a restaurant is truly "worth the hype" or whether a shop is worth the trip. Naturally, we turn to online reviews for guidance, but they are often unhelpful: some feel heavily biased, others don't contain the information we need, and it can be hard to know which opinions to trust.
This inspired us to build a system that uses machine learning and natural language processing to assess the quality and relevancy of Google reviews, highlighting those that are trustworthy and meaningful.
Problem Statement
Design an ML-based system to evaluate the quality and relevancy of Google location reviews. The system should gauge review quality, assess relevancy, and enforce policies by filtering out reviews that violate rules, such as advertisements, unjustified rants, or irrelevant content.
Our Solution
We started by cleaning the text data. For the Kaggle reviews, this meant lowercasing, removing stopwords and punctuation, and lemmatizing words so the text was standardised. The Google Local Reviews data comes as very large JSON files, so we wrote code to stream and sample them efficiently instead of loading everything into memory at once.
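Both preprocessing steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's code: the stopword set is a tiny illustrative stand-in (NLTK, listed under Built With, ships a much fuller English list), lemmatization is passed in as a callable so any lemmatizer can be plugged in, and `iter_json_lines` assumes the Google Local files hold one JSON object per line, optionally gzip-compressed. That file layout is our assumption. Reservoir sampling keeps a uniform random sample of `k` records in constant memory while the file is streamed.

```python
import gzip
import json
import random
import string
from typing import Callable, Iterable, Iterator

# Tiny illustrative stopword set (assumption); NLTK's English list is far longer.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "were", "it", "of",
             "to", "in", "for", "on", "this", "that", "we", "i"}
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> str:
    """Lowercase, strip punctuation, drop stopwords, then lemmatize each token."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    return " ".join(lemmatize(t) for t in tokens if t not in STOPWORDS)


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one record per line without loading the whole file (gzip or plain).

    Assumes one JSON object per line; this layout is a hypothesis.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def reservoir_sample(items: Iterable, k: int, seed: int = 0) -> list:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k / (i + 1)
    return sample
```

With the WordNet data downloaded via `nltk.download("wordnet")`, passing `WordNetLemmatizer().lemmatize` as the `lemmatize` argument would add real lemmatization. Sampling a file could then look like `reservoir_sample(iter_json_lines("reviews.json.gz"), k=10_000)`, where the filename is hypothetical.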
Next, we used a Large Language Model (LLM) to detect spam, duplicate reviews, irrelevant rants, and advertisements. This classification step helped us filter out noise and keep only genuine reviews related to the business.
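The classification step might look roughly like the sketch below. The write-up doesn't say which model or API was used, so the LLM call is abstracted as a hypothetical `generate(prompt) -> str` callable. The label set and prompt wording are also our assumptions. Because this sketch classifies one review at a time, catching duplicates would additionally need a comparison across reviews, which it does not attempt.

```python
import json
from typing import Callable

# Assumed label set, mirroring the policy categories in the problem statement.
LABELS = ("genuine", "advertisement", "spam", "irrelevant", "rant")

PROMPT_TEMPLATE = """You are moderating Google location reviews.
Business: {business}
Review: {review}

Classify the review as exactly one of: {labels}.
Reply only with JSON of the form {{"label": "<label>", "reason": "<short reason>"}}."""


def build_prompt(review: str, business: str) -> str:
    """Fill the prompt template; doubled braces in the template become literal braces."""
    return PROMPT_TEMPLATE.format(business=business, review=review,
                                  labels=", ".join(LABELS))


def parse_label(raw: str) -> str:
    """Pull the label out of the model's reply; return 'unknown' if it is malformed."""
    try:
        start, end = raw.index("{"), raw.rindex("}") + 1
        label = json.loads(raw[start:end]).get("label", "").strip().lower()
    except (ValueError, AttributeError):  # no braces, bad JSON, or wrong types
        return "unknown"
    return label if label in LABELS else "unknown"


def classify(review: str, business: str, generate: Callable[[str], str]) -> str:
    """generate is a hypothetical stand-in for whatever LLM client is used."""
    return parse_label(generate(build_prompt(review, business)))
```

Reviews labelled `genuine` would be kept. Routing `unknown` replies to manual review, rather than silently dropping them, is one reasonable design choice.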
After filtering, we applied a scoring and ranking system based on factors such as review length, sentiment polarity, and originality. Reviews that were longer, relevant, and balanced in tone received higher scores, while suspicious or very short reviews were flagged.
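One way such a score could be computed is sketched below. The weights, the 50-word saturation point, the 5-word minimum and the 0.2 originality cut-off are all hypothetical. Polarity is taken as an input in [-1, 1] from whatever sentiment model is used, and "balanced in tone" is modelled as `1 - |polarity|`. Originality is approximated as one minus the highest Jaccard word overlap with any other review; that comparison is quadratic in the number of reviews, so it only suits modest batches.

```python
import re
from dataclasses import dataclass


@dataclass
class ScoredReview:
    text: str
    score: float
    flagged: bool


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def _jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def score_reviews(texts, polarities, min_words=5, weights=(0.4, 0.3, 0.3),
                  full_length=50):
    """Score reviews on length, tonal balance and originality; rank best-first.

    All thresholds and weights are illustrative assumptions.
    """
    token_sets = [_tokens(t) for t in texts]
    w_len, w_bal, w_orig = weights
    results = []
    for i, (text, polarity) in enumerate(zip(texts, polarities)):
        n_words = len(text.split())
        length = min(n_words / full_length, 1.0)   # longer is better, capped at 1
        balance = 1.0 - min(abs(polarity), 1.0)    # extreme tone scores lower
        # Highest word overlap with any other review (quadratic overall).
        overlap = max((_jaccard(token_sets[i], s)
                       for j, s in enumerate(token_sets) if j != i), default=0.0)
        originality = 1.0 - overlap
        score = w_len * length + w_bal * balance + w_orig * originality
        flagged = n_words < min_words or originality < 0.2
        results.append(ScoredReview(text, score, flagged))
    return sorted(results, key=lambda r: r.score, reverse=True)
```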
Finally, we used the Google Places API to add business metadata (e.g. category, location). This let us check whether a review actually matched the place it claimed to review, improving relevancy.
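Once business metadata has been fetched, a basic relevancy check can ask whether a review mentions vocabulary expected for the place's category. This sketch assumes the place's `types` list (as returned by the Places API, e.g. `"cafe"` or `"restaurant"`) is already available. The keyword lexicon and the `min_hits` threshold are hypothetical, and a production system might use embeddings or the LLM instead of keyword matching.

```python
import re

# Hypothetical keyword lexicon keyed by Places API place types; a real
# system would need a much larger (or learned) mapping.
TYPE_KEYWORDS = {
    "restaurant": {"food", "dish", "menu", "waiter", "meal", "taste", "portion"},
    "cafe": {"coffee", "latte", "pastry", "barista", "cake", "espresso"},
    "gym": {"equipment", "trainer", "workout", "class", "locker", "weights"},
}


def matches_place(review_text, place_types, min_hits=1):
    """Return (is_relevant, matched_keywords), or (None, set()) if no type is covered."""
    keywords = set().union(*(TYPE_KEYWORDS.get(t, set()) for t in place_types))
    if not keywords:
        return None, set()          # no lexicon for these types: cannot judge
    tokens = set(re.findall(r"[a-z]+", review_text.lower()))
    hits = tokens & keywords
    return len(hits) >= min_hits, hits
```

Returning `None` for uncovered place types keeps "we couldn't check" distinct from "the review doesn't match".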
In short, our pipeline runs in four stages, in this order: Preprocess, Classify, Score, Contextualise. Combining model-based classification with these practical checks means users see high-quality, trustworthy reviews first.
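The four-stage order can be expressed as a small driver function. This is an illustrative skeleton rather than the project's code: each stage is a hypothetical callable and reviews are plain dicts. We also assume that the classifier returns `"genuine"` for reviews worth keeping and that `contextualise` returns `None` for reviews that don't fit their place.

```python
from typing import Callable, Iterable, Optional

Review = dict  # e.g. {"text": ..., "place_id": ...}; field names are hypothetical


def run_pipeline(
    reviews: Iterable[Review],
    preprocess: Callable[[Review], Review],
    classify: Callable[[Review], str],
    score: Callable[[Review], float],
    contextualise: Callable[[Review], Optional[Review]],
) -> list[Review]:
    """Preprocess, Classify, Score, Contextualise, then rank best-first."""
    ranked = []
    for review in reviews:
        review = preprocess(review)
        if classify(review) != "genuine":     # drop ads, spam, rants, off-topic text
            continue
        review = {**review, "score": score(review)}
        review = contextualise(review)        # None if it doesn't fit the place
        if review is not None:
            ranked.append(review)
    return sorted(ranked, key=lambda r: r["score"], reverse=True)
```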
Datasets
We used the following data to develop our solution:
- Google Review Data on Kaggle: https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews
- Google Local Review data: https://mcauleylab.ucsd.edu/public_datasets/gdrive/googlelocal/
Built With
- colab
- google-places
- huggingfacehub
- json
- kagglehub
- matplotlib
- nltk
- numpy
- pandas
- python
- scikit-learn