Project: Location Review Quality & Relevancy Assessment

1. Overview

Our solution is an end-to-end ML pipeline that automatically evaluates the quality and relevancy of Google location-based reviews.
It detects and flags spam, advertisements, irrelevant content, and rants from users who likely never visited the location.
The system enforces content policies to ensure only genuine, useful reviews are surfaced.

By filtering out low-quality or irrelevant reviews, our system improves the reliability of location ratings and helps users make better decisions.
It is modular and scalable, and it combines LLMs with efficient classical ML so that it remains practical for real-world deployment.

2. Problem Statement Tackled

We address three challenges:

Gauging review quality: Detecting spam, promotional content, irrelevant reviews, and rants from non-visitors.
Assessing relevancy: Ensuring reviews are genuinely related to the location.
Policy enforcement: Automatically flagging reviews that violate platform guidelines (ads, off-topic content, rants from non-visitors).

3. Features & Functionality

Preprocessing: Cleans and standardizes review text, removes duplicates, and extracts keywords.
Sentiment Analysis: Uses LLMs to assign sentiment labels and scores.
Policy Classification: Applies rule-based, zero-shot, few-shot, and fine-tuned ML models to categorize reviews into:
- Advertisement
- Irrelevant
- Rant without visit
- Clean
Ensemble Inference: Combines multiple model predictions for robust policy enforcement.
Fast ML Classifier: Trains a TF-IDF + LinearSVC model on weak labels for scalable, classical ML inference.
Reporting: Outputs processed CSVs and confusion matrix visualizations for evaluation.
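
To make the preprocessing, rule-based classification, and ensemble steps above more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the function names, the keyword patterns, and the majority-vote tie-breaking rule are all hypothetical. The rules here only cover two of the four categories; "Irrelevant" is assumed to come from the other models in the ensemble.

```python
import re
from collections import Counter

# The four policy categories listed above.
LABELS = ("Advertisement", "Irrelevant", "Rant without visit", "Clean")

# Hypothetical keyword patterns; a real rule set would be tuned on data.
AD_PATTERN = re.compile(r"https?://|www\.|\bpromo code\b|\bdiscount\b|\bcall now\b")
NO_VISIT_PATTERN = re.compile(
    r"\b(never|haven't|have not|didn't|did not) (been|visited|visit|go|went)\b"
)


def clean_text(text):
    """Collapse runs of whitespace, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(reviews):
    """Return cleaned reviews in order, dropping empties and repeats."""
    seen = set()
    unique = []
    for review in reviews:
        cleaned = clean_text(review)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


def rule_based_label(cleaned):
    """Assign a policy label from keyword rules (expects cleaned text)."""
    if AD_PATTERN.search(cleaned):
        return "Advertisement"
    if NO_VISIT_PATTERN.search(cleaned):
        return "Rant without visit"
    return "Clean"


def ensemble_label(predictions):
    """Majority vote over per-model labels.

    Ties go to the tied label that appears earliest in `predictions`,
    so callers can list the most trusted model first.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    raw = ["Great food!", "great   FOOD!", "Call now for a discount"]
    for review in deduplicate(raw):
        print(review, "->", rule_based_label(review))
```

In this sketch, the output of `rule_based_label` would be one vote among several; the zero-shot, few-shot, and fine-tuned models mentioned above would each contribute a label to the list passed to `ensemble_label`.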