Project: Location Review Quality & Relevancy Assessment
1. Overview
Our solution is an end-to-end ML pipeline that automatically evaluates the quality and relevancy of Google location-based reviews.
It detects and flags spam, advertisements, irrelevant content, and rants from users who likely never visited the location.
The system enforces content policies to ensure only genuine, useful reviews are surfaced.
By filtering out low-quality or irrelevant reviews, our system improves the reliability of location ratings and helps users make better decisions.
The pipeline is modular and combines LLM-based classifiers with a lightweight TF-IDF + LinearSVC model, which keeps inference fast enough for real-world deployment.
2. Problem Statement Tackled
We address three challenges:
- Gauging review quality: Detecting spam, promotional content, irrelevant reviews, and rants from non-visitors.
- Assessing relevancy: Ensuring reviews are genuinely related to the location.
- Policy enforcement: Automatically flagging reviews that violate platform guidelines (ads, off-topic, non-visit rants).
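As a rough illustration of what automatic flagging could look like at its simplest, here is a minimal rule-based sketch. The regular expressions and the `flag_review` function are hypothetical and chosen only for illustration; they are not the project's actual rules. Off-topic (irrelevant) reviews are hard to catch with keywords alone, so this sketch leaves them to the model-based classifiers.

```python
import re

# Hypothetical patterns for illustration only; a real rule set would be
# larger and tuned against labeled reviews.
AD_PATTERN = re.compile(
    r"(https?://|www\.|\bpromo code\b|\bdiscount\b|\bcall now\b)",
    re.IGNORECASE,
)
NO_VISIT_PATTERN = re.compile(
    r"\b(never (been|visited|went)|haven't (been|visited)|have not (been|visited))\b",
    re.IGNORECASE,
)


def flag_review(text: str) -> str:
    """Return a policy label for a review using simple keyword rules."""
    # Links and promotional phrases suggest an advertisement.
    if AD_PATTERN.search(text):
        return "Advertisement"
    # An explicit admission of not visiting suggests a non-visit rant.
    if NO_VISIT_PATTERN.search(text):
        return "Rant without visit"
    # Everything else passes the rules and is left to later stages.
    return "Clean"


if __name__ == "__main__":
    for review in [
        "Visit www.cheap-deals.example for a discount!",
        "I have never been here but I heard it is terrible.",
        "Great coffee and friendly staff.",
    ]:
        print(flag_review(review), "|", review)
```

Rules like these are cheap and transparent, but they miss paraphrases, which is one reason to pair them with learned classifiers.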
3. Features & Functionality
- Preprocessing: Cleans and standardizes review text, removes duplicates, and extracts keywords.
- Sentiment Analysis: Uses LLMs to assign sentiment labels and scores.
- Policy Classification: Applies rule-based, zero-shot, few-shot, and fine-tuned ML models to categorize reviews into:
  - Advertisement
  - Irrelevant
  - Rant without visit
  - Clean
- Ensemble Inference: Combines multiple model predictions for robust policy enforcement.
- Fast ML Classifier: Trains a TF-IDF + LinearSVC model on weak labels for scalable, classical ML inference.
- Reporting: Outputs processed CSVs and confusion matrix visualizations for evaluation.
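The ensemble step above can be sketched as a simple vote over the labels produced by the individual classifiers. The write-up does not say how predictions are actually combined, so the plurality rule, the tie-breaking behavior, and the `ensemble_vote` name below are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def ensemble_vote(predictions: Sequence[str], fallback: str = "Clean") -> str:
    """Combine per-model labels by plurality vote.

    Assumption: when no label wins outright (a tie for first place, or no
    predictions at all), the review falls back to `fallback`.
    """
    if not predictions:
        return fallback
    # most_common() sorts labels by descending count.
    ranked = Counter(predictions).most_common()
    # A tie between the top two labels means there is no clear winner.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical outputs from rule-based, zero-shot, and few-shot models.
    print(ensemble_vote(["Advertisement", "Advertisement", "Clean"]))
    print(ensemble_vote(["Advertisement", "Irrelevant"]))
```

Falling back to "Clean" on ties favors not flagging genuine reviews; a stricter deployment might instead route tied reviews to manual inspection.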
4. Development Tools & Resources
- VSCode
- Jupyter Notebook
- Python
- GitHub
Libraries & Frameworks
- PyTorch (backend for Hugging Face Transformers)
- scikit-learn (TF-IDF, LinearSVC, metrics)
- pandas (data manipulation)
- numpy (numerical operations)
- matplotlib (confusion matrix visualization)
- langdetect (language detection)
- joblib (model persistence)
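To show how the scikit-learn and joblib pieces listed above might fit together for the fast TF-IDF + LinearSVC classifier, here is a minimal sketch. The toy reviews, their weak labels, the n-gram range, and the file name are invented for illustration and are not taken from the project's data or configuration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

LABELS = ["Advertisement", "Irrelevant", "Rant without visit", "Clean"]

# Toy weakly labeled reviews, invented for illustration only.
texts = [
    "Use promo code SAVE20 at our online store",
    "Call now for cheap loans and big discounts",
    "My phone battery dies too quickly these days",
    "The election results were very surprising",
    "Never been there but I am sure it is awful",
    "I have not visited, yet I already hate this place",
    "Friendly staff and the noodles were delicious",
    "Clean rooms, quick check-in, would stay again",
]
weak_labels = [
    "Advertisement", "Advertisement",
    "Irrelevant", "Irrelevant",
    "Rant without visit", "Rant without visit",
    "Clean", "Clean",
]

# TF-IDF features feed a linear SVM; unigrams plus bigrams is an assumption.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, weak_labels)

# Predictions on the training set only demonstrate the API; a real
# evaluation would use held-out, manually labeled reviews.
preds = model.predict(texts)
cm = confusion_matrix(weak_labels, preds, labels=LABELS)
print(cm)

# Persist and reload the fitted pipeline with joblib.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "review_classifier.joblib")
    joblib.dump(model, path)
    reloaded = joblib.load(path)
```

Wrapping the vectorizer and classifier in a single pipeline means one saved file carries both the vocabulary and the model weights, so inference code only needs `joblib.load` and `predict`.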
Assets & Datasets
- Google Local Reviews dataset (primary source)
- Manually labeled data (for ground-truth evaluation and fine-tuning)
- Synthetic examples (for few-shot prompts and rule-based heuristics)