🌍 Google Location Review Detector

📝 Introduction

Maintaining the quality and trustworthiness of location-based reviews is crucial for both businesses and consumers. Fake or misleading reviews can distort perceptions, harm businesses, and erode consumer confidence.

The Google Location Review Detector is an automated system designed to efficiently identify and flag suspicious reviews that violate common platform policies, such as:

🚫 Spam or advertisements
❌ Irrelevant or off-topic content
😡 Rants without evidence of a real visit

⚙️ What It Does

The system takes a CSV file of Google location reviews as input and runs each review through a hybrid ML pipeline:

Preprocessing & Feature Engineering
- Clean review text
- Extract numerical features (review length, sentiment polarity, subjectivity, suspicion score, etc.)
Classification Model
- A custom-trained DistilBERT model classifies reviews into:
  - ✅ NONE – No violation
  - 🚫 SPAM – Spam/Advertisement
  - 😡 RANT_WITHOUT_VISIT – Rant without proof of visit
  - ❌ IRRELEVANT_CONTENT – Off-topic/irrelevant content
Output
- Two separate CSV files:
  - Clean reviews
  - Flagged violations
- A summary report detailing the findings
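
A minimal sketch of the output stage described above, assuming the classifier's prediction for each review has already been written to a column. The column name `predicted_label`, the function name `split_reviews`, and the summary fields are illustrative assumptions, not the project's actual code; only the label `NONE` comes from the list above.

```python
import pandas as pd

# "NONE" is the no-violation label listed above.
CLEAN_LABEL = "NONE"


def split_reviews(df: pd.DataFrame, label_col: str = "predicted_label"):
    """Split classified reviews into clean and flagged frames plus a summary.

    `label_col` is a hypothetical column holding the model's predicted class.
    """
    is_clean = df[label_col] == CLEAN_LABEL
    clean = df[is_clean].reset_index(drop=True)
    flagged = df[~is_clean].reset_index(drop=True)

    # Count flagged reviews per violation type for the summary report.
    by_type = {
        label: int(count)
        for label, count in flagged[label_col].value_counts().items()
    }
    summary = {
        "total_reviews": int(len(df)),
        "clean_reviews": int(len(clean)),
        "flagged_reviews": int(len(flagged)),
        "violations_by_type": by_type,
    }
    return clean, flagged, summary
```

Writing the two frames out with `clean.to_csv("clean_reviews.csv", index=False)` and `flagged.to_csv("flagged_violations.csv", index=False)` would give the two CSV files; both file names are placeholders.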

🛠️ How I Built It

I focused on a machine learning approach using a custom DistilBERT-based model. The workflow included:

Data Preparation – Cleaning review text and extracting numerical features.
Feature Engineering – Metrics such as sentiment polarity, subjectivity, review length, and a rule-based suspicion score.
LLM as a Judge – Used GPT-OSS-20B via Groq Cloud to generate pseudo-labels, since the original dataset had no labels.
Model Development – Training DistilBERT for multi-class classification with both text and numerical features.
Deployment – Building a Gradio-powered web interface to let users upload CSV files and instantly receive analysis results.
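
A rough sketch of the data-preparation and feature-engineering steps above, using only the standard library. The regex patterns and the scoring rule are hypothetical stand-ins, since the write-up does not describe the actual rules behind the suspicion score. Sentiment polarity and subjectivity are left out because the library used to compute them is not named.

```python
import re

# Hypothetical rules; the project's real suspicion rules are not documented.
SUSPICIOUS_PATTERNS = [
    r"https?://\S+",                                      # links
    r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b",                 # phone-number-like strings
    r"\b(?:discount|promo|coupon|buy now|click here)\b",  # promotional wording
]


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def suspicion_score(text: str) -> float:
    """Fraction of the hypothetical patterns that match the text (0.0 to 1.0)."""
    hits = sum(
        bool(re.search(pattern, text, flags=re.IGNORECASE))
        for pattern in SUSPICIOUS_PATTERNS
    )
    return hits / len(SUSPICIOUS_PATTERNS)


def extract_features(text: str) -> dict:
    """Return the cleaned text plus two numerical features."""
    cleaned = clean_text(text)
    return {
        "text": cleaned,
        "review_length": len(cleaned.split()),  # length in whitespace-separated words
        "suspicion_score": suspicion_score(cleaned),
    }
```

Applied row by row to the review column of the input CSV, a function like this would produce the numerical columns that accompany the text during training.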

🚧 Challenges I Ran Into

🔎 Feature Selection – Choosing the right combination of text + numerical features for accurate classification.
🎯 Model Training – Balancing generalization vs. overfitting with limited review violation data.
⚖️ Class Imbalance – Most Google reviews are clean, so violations were underrepresented. Mitigated with class weights.
💻 Compute Limits – Google Colab GPU sessions often crashed during long training runs.
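
To illustrate the class-weight mitigation mentioned above, here is a small sketch of one common weighting scheme: inverse-frequency ("balanced") weights, where each class gets `n_samples / (n_classes * class_count)`. The write-up does not say which scheme was actually used, so treat the formula and the function name `balanced_class_weights` as assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count) per class.

    Rare classes get weights above 1, common classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

In PyTorch, weights like these are typically converted to a tensor in class-index order and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that mistakes on underrepresented violation classes cost more during training.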

🏆 Accomplishments

✅ Developed and deployed a full end-to-end review violation detection system.
🧠 Created a custom ML model tailored specifically for review policy violations.
🌐 Built a user-friendly interface with Gradio for easy adoption.

📚 What I Learned

Combining review text with numerical features can make classification more robust.
How to fine-tune and deploy DistilBERT for a domain-specific NLP task.
How to build interactive ML tools with Gradio for real-world usability.
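
One way the text-plus-numerical combination could be wired up in PyTorch is sketched below. This is not the project's actual architecture, which the write-up does not detail. It assumes a 768-dimensional text embedding (the hidden size of the base DistilBERT model, for example the first-token vector of its last hidden state) concatenated with four numerical features. The class name `FusionClassifierHead`, the hidden size, and the dropout rate are all illustrative choices.

```python
import torch
from torch import nn


class FusionClassifierHead(nn.Module):
    """Classify from a text embedding concatenated with numerical features."""

    def __init__(self, text_dim=768, num_numeric=4, hidden_dim=128,
                 num_classes=4, dropout=0.1):
        super().__init__()
        # Normalize numerical features so their scale is comparable
        # to that of the text embedding.
        self.numeric_norm = nn.LayerNorm(num_numeric)
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + num_numeric, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_embedding, numeric_features):
        numeric = self.numeric_norm(numeric_features)
        fused = torch.cat([text_embedding, numeric], dim=-1)
        return self.classifier(fused)  # raw logits, one per class
```

With a batch of two reviews, `FusionClassifierHead()(torch.randn(2, 768), torch.randn(2, 4))` returns a tensor of shape `(2, 4)`, one logit per violation class. In a full model the random embedding would be replaced by the encoder's output.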

🚀 What’s Next

Model Enhancement – Explore ensembles or larger LLMs (e.g., Qwen 3, Gemma 3) for deeper review understanding.
Granular Violation Details – Provide explanations, reasons, and confidence scores for flagged reviews.
Real-Time Analysis – Adapt the system to flag reviews as they are submitted.

🧰 Libraries & Frameworks Used

Core Language: Python
ML Framework: PyTorch
NLP / Data Processing: Hugging Face Transformers, pandas, NumPy, scikit-learn
Deployment/UI: Gradio

📂 Assets & Datasets

Dataset: UCSD Google Local Reviews Dataset
Model: Custom-trained DistilBERT classifier for review violation detection

Updates

Sidharth Vinod started this project — Aug 30, 2025 08:58 PM EDT
