Inspiration

Online reviews have a huge impact on customer decisions, but not all reviews are reliable. Some contain promotional content, others are irrelevant, and sometimes people write reviews without even visiting the business. Manually moderating reviews is slow and inconsistent. We wanted to explore how AI and machine learning could automate this process, making moderation faster, more accurate, and scalable.

What it does

SmartReview automatically classifies reviews as good or bad based on whether they violate key policies:

  1. Promotional content – contains ads, links, or marketing phrases
  2. Irrelevant content – unrelated or unclear reviews
  3. Not visited – reviewer admits they haven’t used the product or visited the business

For each review, the system outputs:

  • Flags for policy violations (ispromotional, isirrelevant, notvisited)
  • A justification explaining why the review is flagged
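As an illustration, a flagged review might come back looking like the following. This is a hypothetical output shape (field names mirror the flags above; the exact structure in the notebooks may differ):

```python
# Hypothetical per-review output; flag names follow the ones listed above.
review_result = {
    "review": "Best pizza in town!! Visit www.cheap-deals.example for 50% off!",
    "ispromotional": True,
    "isirrelevant": False,
    "notvisited": False,
    "justification": "Contains a promotional link and discount offer.",
}

def is_violation(result: dict) -> bool:
    """A review is 'bad' if any policy flag is set."""
    return result["ispromotional"] or result["isirrelevant"] or result["notvisited"]

print(is_violation(review_result))  # → True
```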

How we built it

Approach 1: Hybrid LLM + Rule-Based (Techtokers.ipynb)

  • Model: DistilBERT for sentiment analysis + rule-based detection
  • Strategy: Multi-stage classification across three violation types
  • Performance: 91.9% promotional detection, 98.2% irrelevant content, 98.7% not visited
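The rule-based half of this hybrid stage can be sketched as a set of regex checks; the patterns below are illustrative assumptions, not the actual rules in Techtokers.ipynb:

```python
import re

# Illustrative rule-based promotional-content detector; the real
# patterns used in the notebook are assumptions here.
PROMO_PATTERNS = [
    r"https?://\S+",                       # explicit links
    r"\bwww\.\S+",                         # bare domains
    r"\b\d{1,2}% off\b",                   # discount offers
    r"\b(promo|coupon|discount code)\b",   # marketing phrases
]

def is_promotional(text: str) -> bool:
    """Flag a review if any promotional pattern matches."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in PROMO_PATTERNS)

print(is_promotional("Use coupon SAVE10 at www.example.com"))  # True
print(is_promotional("The pasta was overcooked."))             # False
```

In the full pipeline, reviews that pass these rules would still go through the DistilBERT sentiment stage.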

Approach 2: BERT Embeddings + Ensemble (classifier.ipynb)

  • Model: DistilBERT embeddings + Random Forest classifier
  • Strategy: Multi-class classification with 768-dimensional text representations
  • Performance: 95.0% overall accuracy (Clean: F1=0.976, Irrelevant: F1=0.919, Not Visited: F1=0.930, Promotional: F1=0.974)
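The embedding-plus-ensemble stage follows a standard pattern. In this minimal sketch, random vectors stand in for the 768-dimensional DistilBERT embeddings (which in the real pipeline come from a Hugging Face model); the labels and hyperparameters are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random vectors stand in for the 768-dim DistilBERT text embeddings.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 768))   # 200 reviews x 768 dims
y_train = rng.integers(0, 4, size=200)  # 0=clean, 1=irrelevant, 2=not_visited, 3=promotional

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

X_new = rng.normal(size=(3, 768))
print(clf.predict(X_new))  # three predicted class labels
```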

Development: Google Colab, Jupyter notebooks APIs: Hugging Face Transformers Libraries:

  • NLP: transformers, torch, nltk, spacy
  • ML: scikit-learn, pandas, numpy
  • Viz: matplotlib, seaborn, wordcloud

Datasets:

  • Approach 1: 1,995 balanced Yelp reviews + 675 manually labelled reviews for validation
  • Approach 2: 397-row balanced training dataset with 8 features (description, rating, violation flags, author, company_name, policy_type)

Challenges we ran into

  • Loading the LLMs correctly: setting up tokenization, handling model inputs, and connecting the outputs back to our DataFrame for classification
  • Handling ambiguous reviews that partially violate policies
  • Managing performance and cost when using LLMs at scale
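The DataFrame-wiring step mentioned above boils down to expanding per-review classifier output into columns. In this sketch, `classify` is a stub standing in for the real tokenizer + model call:

```python
import pandas as pd

def classify(text: str) -> dict:
    """Stub for the real tokenizer + model call; flags links as promotional."""
    return {"ispromotional": "http" in text, "isirrelevant": False, "notvisited": False}

df = pd.DataFrame({"description": ["Great food!", "Deals at http://spam.example"]})

# Expand each dict of flags into its own columns, then join back.
flags = df["description"].apply(classify).apply(pd.Series)
df = pd.concat([df, flags], axis=1)
print(df[["description", "ispromotional"]])
```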

Accomplishments that we're proud of

  • Building a fully functional system that automatically flags bad reviews with explanations
  • Combining rule-based, LLM, and ML techniques into a robust hybrid solution
  • Enabling multi-label detection so that each review can be flagged for multiple policy violations

What we learned

  • How to integrate LLMs and classical ML for text classification
  • Techniques for multi-label classification and evaluation with confusion matrices and per-class metrics
  • How to balance accuracy, transparency, and efficiency in a real-world NLP workflow
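The evaluation techniques above are straightforward with scikit-learn; the labels here are made up for illustration:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy predictions to illustrate per-class evaluation.
y_true = ["clean", "promotional", "irrelevant", "clean", "not_visited", "promotional"]
y_pred = ["clean", "promotional", "clean",      "clean", "not_visited", "promotional"]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, zero_division=0))
```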

What's next for SmartReview: Automated Moderation System

  • Expanding the system to handle multiple languages
  • Fine-tuning the LLM to improve detection of subtle policy violations
  • Exploring active learning, using flagged reviews to improve the classifier over time

Built With

  • transformers, torch, nltk, spacy
  • scikit-learn, pandas, numpy
  • matplotlib, seaborn, wordcloud
  • Google Colab, Jupyter