Review Quality Detection System – Devpost Submission
Project Overview
Our Review Quality Detection System is an AI-powered solution that assesses the quality and relevance of location-based reviews. Unlike traditional approaches that conflate user satisfaction ratings with review quality, our system scores the content of a review independently of its star rating.
Problem Statement & Solution
The Problem
Location-based review platforms face a fundamental challenge: distinguishing between review quality and user satisfaction. A 5-star review can contain spam, advertisements, or irrelevant content, while a 1-star review can be well-written, informative, and constructive. Traditional systems often use rating data as a proxy for quality, which is fundamentally flawed.
Our Solution
We built a machine-learning system that assesses review quality purely from text characteristics and policy compliance, without using the user's rating as an input. The ensemble model reaches 98.6% accuracy, which supports the premise that review quality can be assessed separately from the restaurant rating.
Key Features & Functionality
- Text Quality Analysis: length, readability, vocabulary diversity, grammar assessment
- Policy Compliance: detection of advertisements, spam, irrelevant content, excessive rants
- Content Relevance: focus on restaurant experience and dining-related topics
- Writing Sophistication: grammar, formatting, and style analysis
Advanced Policy Enforcement
- Advertisement Detection: phrases like "buy now", "special offer", contact information, competitor promotion
- Spam Detection: phone numbers, emails, URLs, suspicious patterns
- Irrelevant Content: politics, sports, weather, entertainment topics
- Rant Detection: excessive complaints, repetitive negative language
- Quality Standards: minimum length, formatting requirements, vocabulary standards
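The rule-based checks listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual rule set: the phrase lists, the regular expressions, the `MIN_WORDS` threshold, and the function name `detect_violations` are all assumptions made for the example, and rant detection is left out.

```python
import re

# Hypothetical patterns for illustration; the project's real rules are not shown here.
AD_PHRASES = ("buy now", "special offer")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
OFF_TOPIC = {"election", "politics", "football", "weather"}
MIN_WORDS = 5  # assumed minimum review length, in words


def detect_violations(text: str) -> list[str]:
    """Return the names of the policy checks that the review text triggers."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    violations = []
    if any(phrase in lowered for phrase in AD_PHRASES):
        violations.append("advertisement")
    if PHONE_RE.search(text) or EMAIL_RE.search(text) or URL_RE.search(text):
        violations.append("spam")
    if OFF_TOPIC.intersection(words):
        violations.append("irrelevant")
    if len(words) < MIN_WORDS:
        violations.append("too_short")
    return violations
```

Keyword and regex rules like these are easy to audit but brittle, so in practice their outputs would more likely serve as features for the classifier than as final verdicts.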
Real-World Performance
- 1,100 authentic restaurant reviews from Google Maps tested
- 833 reviews approved (75.7%) — no policy violations
- 232 reviews approved with warning (21.1%) — minor violations
- 34 reviews under review (3.1%) — medium-severity violations
- 1 review rejected (0.1%) — critical violations
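The four outcomes above suggest that each review is routed by its most severe violation. A minimal sketch of that routing, together with a check of the reported percentages, might look like this. The severity names, the `decide` function, and the "worst violation wins" rule are assumptions for illustration; only the counts come from the results above.

```python
# Assumed severity scale, ordered from least to most serious.
SEVERITY_RANK = {"none": 0, "minor": 1, "medium": 2, "critical": 3}
DECISIONS = ["approved", "approved_with_warning", "under_review", "rejected"]


def decide(severities: list[str]) -> str:
    """Route a review by the worst severity among its detected violations."""
    worst = max((SEVERITY_RANK[s] for s in severities), default=0)
    return DECISIONS[worst]


# Counts reported above; the shares are recomputed rather than hard-coded.
counts = {
    "approved": 833,
    "approved_with_warning": 232,
    "under_review": 34,
    "rejected": 1,
}
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```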
Development Tools Used
- VS Code – primary Python development environment
- Jupyter Notebooks – data exploration and experimentation
- Git – version control and collaboration
- Terminal/CLI – script execution and environment management
APIs Used
- Google Maps API – dataset collection and location-based review data
- NLTK (Python library) – natural language processing and text analysis
- TextBlob (Python library) – sentiment analysis and text processing
- Scikit-learn (Python library) – machine-learning algorithms and training
Libraries & Frameworks
- Hugging Face Transformers (for future enhancements)
- PyTorch (for deep-learning experimentation)
- Scikit-learn (Random Forest, voting classifier)
- Pandas, NumPy
- NLTK, TextBlob
- XGBoost
- Matplotlib, Seaborn, Plotly
- Imbalanced-learn
Assets & Datasets
- Google Local Reviews Dataset: 1,100 authentic restaurant reviews
- Manually Labeled Data: quality assessment labels for training/validation
- Image Dataset: 1,103 review images (taste, menu, atmosphere)
- Policy Violation Annotations: curated examples
- Quality Assessment Ground Truth: expert-validated scores
Technical Architecture
Data Preprocessing Pipeline
- Text cleaning and normalization
- NLP processing (stopword removal, lemmatization)
- Feature extraction (TF-IDF, count vectors, topic modeling)
- Policy-violation detection
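As a rough illustration of the cleaning and feature-extraction steps above, the following standard-library sketch normalizes text, removes stopwords, and computes textbook TF-IDF weights. It rests on several assumptions: the stopword list is a tiny placeholder (the project lists NLTK for this), lemmatization and topic modeling are omitted, and the TF-IDF formula is the unsmoothed textbook variant, which differs slightly from scikit-learn's default.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would likely use NLTK's.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "is", "it", "to", "of"}


def clean(text: str) -> list[str]:
    """Lowercase, strip URLs and non-letter characters, and drop stopwords."""
    text = re.sub(r"(?:https?://|www\.)\S+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: (term count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that appears in every document gets weight 0 under this formula, which is why very common words contribute little even when they are not on the stopword list.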
Machine-Learning Models
- Random Forest: baseline, interpretable
- XGBoost: gradient boosting, strong performance
- Ensemble Model: voting classifier combining the above
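A voting ensemble of this kind can be assembled with scikit-learn's `VotingClassifier`. The sketch below is illustrative only and makes several assumptions: the data is synthetic (generated by `make_classification`, not the project's reviews), `GradientBoostingClassifier` stands in for XGBoost so the example needs only scikit-learn, soft voting is assumed, and the hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: NOT the project's reviews or features.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        # Stand-in for XGBoost so the sketch depends only on scikit-learn.
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # assumed: average the predicted class probabilities
)
ensemble.fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)  # shape: (n_test_samples, n_classes)
pred = ensemble.predict(X_test)
```

With the `xgboost` package installed, its scikit-learn-compatible `XGBClassifier` could take the place of the gradient-boosting stand-in in the `estimators` list.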
Feature Engineering
- Textual features: TF-IDF, count vectors, topic modeling
- Text quality: length, word count, readability scores
- Policy violations: spam/ads/irrelevance signals
- Writing sophistication: vocabulary diversity, grammar indicators
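Several of the text-quality and sophistication signals above can be computed with nothing more than the standard library. The sketch below is one possible reading of those features: the feature names, the tokenization rules, and the use of the type-token ratio as the vocabulary-diversity measure are assumptions, and readability and grammar scores are omitted.

```python
import re


def text_quality_features(text: str) -> dict[str, float]:
    """Compute simple length and vocabulary-diversity features for one review."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    unique_words = {w.lower() for w in words}
    return {
        "char_length": len(text),
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(unique_words) / n_words if n_words else 0.0,
    }
```

None of these features look at the star rating, which is the property the project relies on to keep quality scoring separate from satisfaction.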
Model Performance Results
| Model | Accuracy | Precision | Recall | F1 Score | ROC AUC |
|---|---|---|---|---|---|
| Ensemble | 0.986 | 0.986 | 0.986 | 0.986 | 0.999 |
| XGBoost | 0.973 | 0.973 | 0.973 | 0.973 | 0.996 |
| Random Forest | 0.886 | 0.889 | 0.886 | 0.885 | 0.958 |
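The table does not state how precision, recall, and F1 were averaged across classes. In every row recall equals accuracy, which is what support-weighted averaging always produces, so the sketch below assumes weighted averaging. The confusion matrix in the usage lines is a toy example, not the project's results.

```python
def weighted_metrics(confusion: list[list[int]]) -> dict[str, float]:
    """Support-weighted metrics; confusion[i][j] = true class i predicted as j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])  # samples whose true class is c
        predicted = sum(confusion[r][c] for r in range(n_classes))
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy two-class confusion matrix, for illustration only.
metrics = weighted_metrics([[45, 5], [10, 40]])
```

Weighted recall reduces to the sum of true positives divided by the total sample count, which is exactly accuracy, so matching recall and accuracy columns say nothing extra about per-class behavior.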
Real-World Applications
- Content Moderation: automatically filter low-quality reviews
- Platform Integrity: maintain quality standards
- User Experience: ensure relevant, informative content
- Business Insights: focus on genuine customer feedback
Project Relevance
This project addresses the challenge of assessing review quality and relevancy in location-based platforms. By separating quality assessment from rating bias, it delivers a more accurate and fair moderation and quality-control system. Relevant for:
- Review platforms (Google Maps, Yelp, TripAdvisor)
- E-commerce sites with location-based reviews
- Restaurant management systems
- Content-moderation tools
- Quality-assurance systems
Future Enhancements
- Multi-language support for global deployment
- Real-time processing capabilities
- Advanced NLP models (BERT, GPT integration)
- User-feedback integration for continuous improvement
- API deployment for third-party integration
Project Status: ✅ Production-Ready
Best Model Performance: Ensemble (F1: 0.986, Accuracy: 0.986)
Key Achievement: Proper separation of rating and quality assessment
Team: ElixirHackers
