Review Quality Detection System – Devpost Submission

Project Overview

Our Review Quality Detection System is an AI-powered solution that assesses the quality and relevancy of location-based reviews. Unlike traditional approaches that conflate user satisfaction ratings with review quality, our system provides a sophisticated, rating-independent assessment of review content quality.

Problem Statement & Solution

The Problem

Location-based review platforms face a fundamental challenge: distinguishing between review quality and user satisfaction. A 5-star review can contain spam, advertisements, or irrelevant content, while a 1-star review can be well-written, informative, and constructive. Traditional systems often use rating data to determine quality, which is fundamentally flawed.

Our Solution

We built a machine-learning system that assesses review quality based purely on text characteristics and policy compliance, completely independent of user ratings. The ensemble model reaches 98.6% accuracy without using ratings as an input, which supports treating review quality and user rating as separate concepts.


Key Features & Functionality

  • Text Quality Analysis: length, readability, vocabulary diversity, grammar assessment
  • Policy Compliance: detection of advertisements, spam, irrelevant content, excessive rants
  • Content Relevance: focus on restaurant experience and dining-related topics
  • Writing Sophistication: grammar, formatting, and style analysis

Advanced Policy Enforcement

  • Advertisement Detection: phrases like "buy now", "special offer", contact information, competitor promotion
  • Spam Detection: phone numbers, emails, URLs, suspicious patterns
  • Irrelevant Content: politics, sports, weather, entertainment topics
  • Rant Detection: excessive complaints, repetitive negative language
  • Quality Standards: minimum length, formatting requirements, vocabulary standards
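
As a rough illustration of how rules like these can be expressed, here is a minimal sketch using regular expressions and phrase lists. The patterns, the off-topic word list, the `min_words` threshold, and the function name `detect_violations` are all assumptions made for this example, not the project's actual rule set. Rant detection is left out for brevity.

```python
import re

# Illustrative rule set (assumption): not the project's real phrases or patterns.
AD_PHRASES = ("buy now", "special offer")
OFF_TOPIC_WORDS = ("politics", "election", "weather", "football")
PATTERNS = {
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_violations(text: str, min_words: int = 5) -> list[str]:
    """Return the names of the policy rules that the review text triggers."""
    lowered = text.lower()
    violations = []

    # Advertisement: promotional phrases anywhere in the text.
    if any(phrase in lowered for phrase in AD_PHRASES):
        violations.append("advertisement")

    # Spam: contact details and links.
    for name, pattern in PATTERNS.items():
        if pattern.search(text):
            violations.append(f"spam:{name}")

    # Irrelevant content: off-topic keywords.
    words = re.findall(r"[a-z']+", lowered)
    if any(word in OFF_TOPIC_WORDS for word in words):
        violations.append("irrelevant")

    # Quality standard: minimum length in words.
    if len(words) < min_words:
        violations.append("too_short")

    return violations


promo = detect_violations("Buy now at www.cheap-pizza.example or call 555-123-4567!")
clean_review = detect_violations("The ramen broth was rich and the staff were friendly.")
```

Keyword rules like these are easy to audit, but they miss paraphrases. In a system like ours they are better treated as signals that feed the classifier than as a final verdict.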

Real-World Performance

  • Tested on 1,100 authentic restaurant reviews from Google Maps
  • 833 reviews approved (75.7%) — no policy violations
  • 232 reviews approved with warning (21.1%) — minor violations
  • 34 reviews under review (3.1%) — medium-severity violations
  • 1 review rejected (0.1%) — critical violations
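
The four outcomes above can be read as a mapping from violation severity to a moderation decision. The sketch below assumes that the most severe violation in a review decides the outcome. The `Severity` enum and the `decide` function are hypothetical names introduced for illustration. The second half recomputes the reported percentages from the reported counts.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; higher values are more severe."""
    NONE = 0
    MINOR = 1
    MEDIUM = 2
    CRITICAL = 3


DECISIONS = {
    Severity.NONE: "approved",
    Severity.MINOR: "approved with warning",
    Severity.MEDIUM: "under review",
    Severity.CRITICAL: "rejected",
}


def decide(severities: list[Severity]) -> str:
    """Map the most severe violation found in a review to a moderation outcome."""
    worst = max(severities, default=Severity.NONE)  # Assumption: worst violation wins.
    return DECISIONS[worst]


# Reported counts per outcome for the tested reviews.
counts = {"approved": 833, "approved with warning": 232, "under review": 34, "rejected": 1}
total = sum(counts.values())
shares = {outcome: round(100 * n / total, 1) for outcome, n in counts.items()}
```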

Development Tools Used

  • VS Code – primary Python development environment
  • Jupyter Notebooks – data exploration and experimentation
  • Git – version control and collaboration
  • Terminal/CLI – script execution and environment management

APIs Used

  • Google Maps API – dataset collection and location-based review data
  • NLTK API – natural language processing and text analysis
  • TextBlob API – sentiment analysis and text processing
  • Scikit-learn API – machine-learning algorithms and training

Libraries & Frameworks

  • Hugging Face Transformers (for future enhancements)
  • PyTorch (for deep-learning experimentation)
  • Scikit-learn (Random Forest, voting classifier)
  • Pandas, NumPy
  • NLTK, TextBlob
  • XGBoost
  • Matplotlib, Seaborn, Plotly
  • Imbalanced-learn

Assets & Datasets

  • Google Local Reviews Dataset: 1,100 authentic restaurant reviews
  • Manually Labeled Data: quality assessment labels for training/validation
  • Image Dataset: 1,103 review images (taste, menu, atmosphere)
  • Policy Violation Annotations: curated examples
  • Quality Assessment Ground Truth: expert-validated scores

Technical Architecture

Data Preprocessing Pipeline

  • Text cleaning and normalization
  • NLP processing (stopword removal, lemmatization)
  • Feature extraction (TF-IDF, count vectors, topic modeling)
  • Policy-violation detection
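
The first steps of such a pipeline can be sketched with the standard library alone. This is a simplified illustration, not our exact code: the stopword list is a tiny stand-in (a real pipeline would more likely use NLTK's), lemmatization is omitted, and plain count vectors stand in for TF-IDF. The helper names `clean`, `tokenize`, and `count_vector` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); not the list used in the project.
STOPWORDS = {"a", "an", "and", "the", "was", "were", "is", "it", "to", "of"}


def clean(text: str) -> str:
    """Lowercase, drop URLs and non-letter characters, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Clean the text, split on whitespace, and remove stopwords."""
    return [tok for tok in clean(text).split() if tok not in STOPWORDS]


def count_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in the token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


tokens = tokenize("The PIZZA was great!!  Visit http://x.example now")
vector = count_vector(["pizza", "great", "pizza"], ["great", "pizza", "service"])
```

Policy-violation detection would normally run on the raw text before this cleaning step, since the cleaning removes the URLs and digits that spam rules look for.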

Machine-Learning Models

  • Random Forest: baseline, interpretable
  • XGBoost: gradient boosting, strong performance
  • Ensemble Model: voting classifier combining the above
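
A voting ensemble of this shape can be sketched with scikit-learn's `VotingClassifier`. Several things here are assumptions for illustration: the data is synthetic, the hyperparameters are defaults, soft voting is chosen arbitrarily (the voting mode is not specified above), and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example has no extra dependency. `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be swapped in for the `"gb"` entry.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered review features (assumption).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        # Stand-in for XGBoost; not the model used in the project.
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # Average predicted class probabilities across the two models.
)
ensemble.fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)    # Averaged class probabilities per sample.
accuracy = ensemble.score(X_test, y_test)  # Mean accuracy on the held-out split.
```

Soft voting lets a confident model outweigh an uncertain one. Hard voting with only two estimators would tie whenever they disagree.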

Feature Engineering

  • Textual features: TF-IDF, count vectors, topic modeling
  • Text quality: length, word count, readability scores
  • Policy violations: spam/ads/irrelevance signals
  • Writing sophistication: vocabulary diversity, grammar indicators
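
To make the text-quality and sophistication features concrete, here is a small sketch that computes a few of them directly from a review string. The feature names and exact definitions are assumptions for illustration. In particular, average sentence length is only a crude proxy for a readability score, and vocabulary diversity is computed as a simple type-token ratio.

```python
import re


def text_quality_features(text: str) -> dict[str, float]:
    """Compute simple length, diversity, and readability-proxy features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_count = len(words)
    return {
        "char_length": len(text),
        "word_count": word_count,
        # Type-token ratio: unique words divided by total words.
        "vocab_diversity": len(set(words)) / word_count if word_count else 0.0,
        # Crude readability proxy: average number of words per sentence.
        "avg_sentence_length": word_count / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
    }


features = text_quality_features("The food was good. The service was slow.")
```

None of these features looks at the star rating, which keeps the quality signal separate from user satisfaction.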

Model Performance Results

Model           Accuracy   Precision   Recall   F1 Score   ROC AUC
Ensemble        0.986      0.986       0.986    0.986      0.999
XGBoost         0.973      0.973       0.973    0.973      0.996
Random Forest   0.886      0.889       0.886    0.885      0.958
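
For readers who want to see how metrics of this kind are derived, the sketch below computes accuracy and support-weighted precision, recall, and F1 from a confusion matrix. The confusion matrix is made up for illustration and is not project data. Weighted averaging is an assumption about how the table was computed; it is consistent with the identical accuracy and recall values, because support-weighted recall always equals accuracy.

```python
def weighted_metrics(confusion: list[list[int]]) -> dict[str, float]:
    """Compute accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    precision = recall = f1 = 0.0
    for i in range(n_classes):
        tp = confusion[i][i]
        support = sum(confusion[i])                   # True samples of class i.
        predicted = sum(row[i] for row in confusion)  # Samples predicted as class i.
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total                      # Class share of all samples.
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {"accuracy": correct / total, "precision": precision, "recall": recall, "f1": f1}


# Illustrative two-class confusion matrix (not project data).
metrics = weighted_metrics([[90, 10], [5, 95]])
```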

Real-World Applications

  • Content Moderation: automatically filter low-quality reviews
  • Platform Integrity: maintain quality standards
  • User Experience: ensure relevant, informative content
  • Business Insights: focus on genuine customer feedback

Project Relevance

This project addresses the challenge of assessing review quality and relevancy in location-based platforms. By separating quality assessment from rating bias, it supports more accurate and fairer moderation and quality control. It is relevant for:

  • Review platforms (Google Maps, Yelp, TripAdvisor)
  • E-commerce sites with location-based reviews
  • Restaurant management systems
  • Content-moderation tools
  • Quality-assurance systems

Future Enhancements

  • Multi-language support for global deployment
  • Real-time processing capabilities
  • Advanced NLP models (BERT, GPT integration)
  • User-feedback integration for continuous improvement
  • API deployment for third-party integration

Project Status: Production-Ready
Best Model Performance: Ensemble (F1: 0.986, Accuracy: 0.986)
Key Achievement: Proper separation of rating and quality assessment
Team: ElixirHackers
