Trustworthy Location Reviews

Inspiration

We identified a real-world challenge faced by leading tech companies such as TikTok: ensuring that online reviews remain trustworthy and relevant. The problem's broad applicability, from e-commerce platforms to video-sharing apps, made it especially compelling. It also let us apply modern machine learning techniques in a meaningful way.


What it does

Our solution tackles four key problems in review moderation:

  1. Location Tagging – Automatically labels reviews with location categories such as restaurant, office, home, park, hotel, or shop, using rating category data. For instance, a review whose rating categories include taste, menu, indoor, outdoor, and atmosphere is labeled "restaurant".
  2. Content Filtering – Assigns each review binary (0/1) labels for spam, advertisement, and irrelevancy.
  3. Policy Enforcement – Applies the rules defined in the problem statement to filter out spam, advertisements, and rants.
  4. Review Quality Prediction – Uses ML to predict spam, advertisement, irrelevancy, and rant labels to assess overall review quality.
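The four labels above can be treated as a small per-review record. The sketch below assumes a hypothetical `ReviewLabels` schema and an assumed policy in which a spam, advertisement, or rant flag removes a review, while an irrelevancy flag only marks it as low quality. The names, verdict strings, and sample reviews are illustrative, not the project's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewLabels:
    """Hypothetical per-review record of binary (0/1) labels."""
    spam: int = 0
    advertisement: int = 0
    irrelevant: int = 0
    rant: int = 0


def violates_policy(labels):
    """Assumed rule: any spam, advertisement, or rant flag breaks policy."""
    return bool(labels.spam or labels.advertisement or labels.rant)


def quality_label(labels):
    """Collapse the four labels into one coarse verdict (assumed scheme)."""
    if violates_policy(labels):
        return "remove"
    if labels.irrelevant:
        return "low_quality"
    return "trustworthy"


def enforce_policy(reviews):
    """Keep only (text, labels) pairs that pass the policy rule."""
    return [(text, labels) for text, labels in reviews if not violates_policy(labels)]


# Hand-assigned labels for illustration; in the pipeline they would come from the detectors.
reviews = [
    ("Great pasta and friendly staff.", ReviewLabels()),
    ("Visit www.cheapdeals.example for 50% off!", ReviewLabels(spam=1, advertisement=1)),
    ("My phone battery died today.", ReviewLabels(irrelevant=1)),
]
kept = enforce_policy(reviews)
print([quality_label(labels) for _, labels in kept])  # ['trustworthy', 'low_quality']
```

Keeping irrelevant reviews but down-ranking them is one possible design choice; a stricter policy could simply add `labels.irrelevant` to `violates_policy`.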

How we built it

  • Location Tagging: Used few-shot prompting with the flan-t5-small model to classify locations from sparse category words.
  • Irrelevancy Detection: Used semantic similarity between location embeddings and review text embeddings via the all-MiniLM-L6-v2 model.
  • Spam/Advertisement/Rant Detection: Developed keyword-based text search patterns to flag low-quality content.
  • Integration: Combined these approaches into an ensemble pipeline that outputs structured quality labels for every review.
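A minimal sketch of how these pieces might fit together, using only the Python standard library. It builds the few-shot prompt text that would be sent to flan-t5-small (the model call itself is omitted), scores irrelevancy with cosine similarity between embedding vectors, and flags spam, advertisement, and rant content with regular expressions. The few-shot examples, keyword patterns, prompt wording, and the 0.3 similarity threshold are all hypothetical. In practice the vectors would be 384-dimensional all-MiniLM-L6-v2 embeddings (for example from the sentence-transformers `SentenceTransformer.encode` method), whereas the usage line passes toy 2-D vectors.

```python
import math
import re

# Hypothetical few-shot examples: sparse rating-category words -> location label.
FEW_SHOT_EXAMPLES = [
    (["taste", "menu", "atmosphere"], "restaurant"),
    (["room", "check-in", "breakfast"], "hotel"),
    (["trees", "playground", "trails"], "park"),
]
LOCATIONS = ["restaurant", "office", "home", "park", "hotel", "shop"]


def build_few_shot_prompt(category_words):
    """Return prompt text for a text-to-text model such as flan-t5-small (model call not shown)."""
    parts = [f"Classify the location as one of: {', '.join(LOCATIONS)}."]
    for words, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Words: {', '.join(words)}\nLocation: {label}")
    parts.append(f"Words: {', '.join(category_words)}\nLocation:")
    return "\n\n".join(parts)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_irrelevant(location_vec, review_vec, threshold=0.3):
    """Return 1 if the review embedding is far (below threshold) from the location embedding."""
    return int(cosine_similarity(location_vec, review_vec) < threshold)


# Hypothetical keyword patterns; real ones would need tuning on labeled reviews.
KEYWORD_PATTERNS = {
    "spam": re.compile(r"https?://|www\.|\bclick here\b|\bfree money\b", re.IGNORECASE),
    "advertisement": re.compile(r"\b(?:discount|promo code|buy now)\b|\d+% off", re.IGNORECASE),
    "rant": re.compile(r"\b(?:worst|never again|hate)\b|!{3,}", re.IGNORECASE),
}


def keyword_flags(text):
    """Return a 0/1 flag per category: 1 if its pattern matches anywhere in the text."""
    return {name: int(bool(pattern.search(text))) for name, pattern in KEYWORD_PATTERNS.items()}


def label_review(text, location_vec, review_vec):
    """Merge keyword flags and the embedding-based irrelevancy check into one label dict."""
    labels = keyword_flags(text)
    labels["irrelevant"] = is_irrelevant(location_vec, review_vec)
    return labels


# Toy 2-D vectors stand in for real 384-dimensional sentence embeddings.
print(label_review("Use promo code SAVE10 at www.example.com!", [1.0, 0.0], [0.9, 0.1]))
# {'spam': 1, 'advertisement': 1, 'rant': 0, 'irrelevant': 0}
```

Keeping the model call out of `build_few_shot_prompt` lets the prompt format be checked without downloading a model. The word boundaries (`\b`) in the patterns matter: without them, "hate" would also match inside "whatever".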

Challenges we ran into

  • Sparse Context for Location Classification
    Extracting meaningful location labels from a handful of category words was difficult for smaller open-source models such as flan-t5-small.

    • Example: Identifying “restaurant” from words like taste, menu, ambience was unreliable.
    • Larger models like GPT-4.5 performed significantly better, showing the impact of model size and training data.
  • Data Imbalance
    Our keyword-based tagging for spam, advertisement, and rant detection produced imbalanced datasets, making it hard to compute reliable precision, recall, and F1 scores. Balanced datasets or supervised classifiers would improve performance.

  • Limited Dataset Size
    With only 1,100 rows, our dataset was insufficient to capture the diversity of real-world reviews. Larger datasets would improve accuracy and generalization.
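To illustrate why imbalance undermines these metrics, consider a hypothetical split of 1,100 reviews containing only 20 spam examples (the label counts are invented for illustration and are not the project's data). The sketch computes precision, recall, and F1 for the positive class by hand and compares a detector that catches 5 of the 20 spam reviews against a baseline that never predicts spam.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 20 spam reviews followed by 1,080 legitimate ones.
y_true = [1] * 20 + [0] * 1080
# Hypothetical detector: catches 5 spam reviews, misses 15, raises 5 false alarms.
y_pred = [1] * 5 + [0] * 15 + [1] * 5 + [0] * 1075
always_negative = [0] * len(y_true)

for name, pred in [("keyword detector", y_pred), ("always 'not spam'", always_negative)]:
    p, r, f = precision_recall_f1(y_true, pred)
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# keyword detector: accuracy=0.982 precision=0.50 recall=0.25 f1=0.33
# always 'not spam': accuracy=0.982 precision=0.00 recall=0.00 f1=0.00
```

Both predictors reach the same accuracy, so accuracy alone hides the difference. With only 20 positives, each additional caught or missed spam review also moves recall by 5 percentage points, which is why such small, skewed label sets give unstable estimates.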


Accomplishments that we’re proud of

  • Built an end-to-end working pipeline that automates tagging, policy enforcement, and quality prediction.
  • Successfully integrated generative AI, semantic similarity models, and ensemble ML techniques into a unified workflow.
  • Generated structured outputs, demonstrating a practical proof-of-concept solution.

What we learned

  • How smaller open-source models struggle with sparse context compared with large LLMs.
  • How data quality and balance critically affect classification metrics.
  • The value of combining rule-based methods (keyword search) with ML-based approaches (embeddings, prompting, ensembles).
  • That integrating diverse AI techniques is as important as improving raw accuracy.

What’s next for Trustworthy Location Reviews

  1. Data Expansion – Collect larger and more balanced datasets.
  2. Model Improvement – Fine-tune domain-specific models for tagging and detection.
  3. Advanced Detection – Move from keyword-based detection to transformer-based classifiers for spam/ads/rant detection.
  4. Scalability – Optimize the pipeline for real-time deployment at scale (millions of reviews).
  5. Explainability – Add interpretability features so flagged reviews are transparent to users and moderators.
