Trustworthy Location Reviews

Inspiration

We identified a real-world challenge faced by leading tech companies such as TikTok: ensuring that online reviews remain trustworthy and relevant. The problem's broad applicability, from e-commerce platforms to video-sharing apps, made it especially compelling, and it let us apply cutting-edge machine learning techniques in a meaningful way.

What it does

Our solution tackles four key problems in review moderation:

Location Tagging – Automatically labels each review with a location category (restaurant, office, home, park, hotel, or shop) based on its rating category data. For instance, a review whose category words are taste, menu, indoor, outdoor, and atmosphere is labeled "restaurant".
Content Filtering – Labels reviews for spam (0/1), advertisement (0/1), and irrelevancy (0/1).
Policy Enforcement – Applies the rules defined in the problem statement to filter out spam, advertisements, and rants.
Review Quality Prediction – Uses ML to predict spam, advertisement, irrelevancy, and rant labels to assess overall review quality.
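A minimal sketch of how the four labels could be combined into one structured record per review and turned into a keep/filter decision. The `ReviewLabels` dataclass, its field names, and the rule "filter a review if any flag is set" are illustrative assumptions, not the exact rules from the problem statement.

```python
from dataclasses import dataclass, asdict

FLAG_NAMES = ("spam", "advertisement", "irrelevant", "rant")


@dataclass
class ReviewLabels:
    """Hypothetical per-review output record; each flag is 0 or 1."""
    location: str
    spam: int = 0
    advertisement: int = 0
    irrelevant: int = 0
    rant: int = 0

    def violations(self) -> list[str]:
        """Names of all flags that are set, in a fixed order."""
        return [name for name in FLAG_NAMES if getattr(self, name) == 1]

    def is_trustworthy(self) -> bool:
        """Assumed policy: a review passes only if no flag is set."""
        return not self.violations()


def enforce_policy(reviews: list[ReviewLabels]) -> tuple[list[dict], list[dict]]:
    """Split labeled reviews into (kept, filtered) lists of plain dicts."""
    kept, filtered = [], []
    for review in reviews:
        record = asdict(review)
        record["violations"] = review.violations()
        (kept if review.is_trustworthy() else filtered).append(record)
    return kept, filtered


kept, filtered = enforce_policy(
    [ReviewLabels("restaurant"), ReviewLabels("hotel", advertisement=1)]
)
print(len(kept), filtered[0]["violations"])  # 1 ['advertisement']
```

Keeping the reasons in a `violations` list, rather than a single pass/fail bit, means the output can later explain to moderators why a review was filtered.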

How we built it

Location Tagging: Implemented few-shot prompting with the flan-t5-small model to classify each review's location from sparse category words.
Irrelevancy Detection: Used semantic similarity between location embeddings and review text embeddings via the all-MiniLM-L6-v2 model.
Spam/Advertisement/Rant Detection: Developed keyword-based text search patterns to flag low-quality content.
Integration: Combined these approaches into an ensemble pipeline that outputs structured quality labels for every review.
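The location-tagging step can be sketched as a few-shot prompt builder plus an output parser that maps the model's free-form answer onto the closed label set. The demonstration examples, prompt wording, and helper names are illustrative assumptions; `generate` stands in for any text-to-text model call, so the sketch runs without downloading a model.

```python
from collections.abc import Callable

LOCATIONS = ("restaurant", "office", "home", "park", "hotel", "shop")

# Hypothetical demonstration shots: category words -> location label.
FEW_SHOT_EXAMPLES = [
    (["taste", "menu", "atmosphere"], "restaurant"),
    (["room", "check-in", "bed"], "hotel"),
    (["price", "staff", "checkout"], "shop"),
]


def build_prompt(categories: list[str]) -> str:
    """Build a few-shot classification prompt from sparse category words."""
    blocks = ["Classify the location of a review as one of: " + ", ".join(LOCATIONS) + "."]
    for words, label in FEW_SHOT_EXAMPLES:
        blocks.append(f"Categories: {', '.join(words)}\nLocation: {label}")
    # The query ends with an open "Location:" slot for the model to fill.
    blocks.append(f"Categories: {', '.join(categories)}\nLocation:")
    return "\n\n".join(blocks)


def parse_location(generated: str, default: str = "unknown") -> str:
    """Return the first known label found in the model output, else `default`."""
    text = generated.strip().lower()
    for label in LOCATIONS:
        if label in text:
            return label
    return default


def tag_location(categories: list[str], generate: Callable[[str], str]) -> str:
    """Prompt the model via `generate` and normalize its answer."""
    return parse_location(generate(build_prompt(categories)))


# Stand-in model for demonstration; swap in a real text-to-text model call.
print(tag_location(["menu", "ambience"], lambda prompt: " Restaurant"))
```

With Hugging Face `transformers`, `generate` could be `lambda p: pipe(p)[0]["generated_text"]`, where `pipe = pipeline("text2text-generation", model="google/flan-t5-small")`. The parser matters most for small models, which may answer outside the label set.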
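The irrelevancy check compares a review's embedding with the embedding of its location label. A minimal sketch, assuming cosine similarity and a hypothetical threshold of 0.3 (not a value we report); toy vectors keep the example self-contained, whereas in the pipeline the vectors come from all-MiniLM-L6-v2.

```python
import math
from collections.abc import Sequence

# Hypothetical cut-off: below this similarity a review is flagged as irrelevant.
IRRELEVANCY_THRESHOLD = 0.3


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def irrelevancy_flag(
    review_vec: Sequence[float],
    location_vec: Sequence[float],
    threshold: float = IRRELEVANCY_THRESHOLD,
) -> int:
    """Return 1 if the review is semantically far from its location, else 0."""
    return int(cosine_similarity(review_vec, location_vec) < threshold)


print(irrelevancy_flag([1.0, 0.0], [0.0, 1.0]))  # 1: orthogonal, so irrelevant
print(irrelevancy_flag([1.0, 0.1], [1.0, 0.0]))  # 0: nearly parallel
```

With the `sentence-transformers` library, the real vectors could be obtained as `review_vec, location_vec = SentenceTransformer("all-MiniLM-L6-v2").encode([review_text, "restaurant"])`; `sentence_transformers.util.cos_sim` computes the same similarity. The threshold would need tuning on labeled data.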
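The spam, advertisement, and rant detectors rely on keyword-based text search. The keyword lists below are hypothetical placeholders, not our actual patterns; the sketch only shows the technique of compiling each list into one case-insensitive, word-bounded regular expression per label and emitting a 0/1 flag.

```python
import re

# Hypothetical keyword lists; real patterns would be tuned on the data.
KEYWORDS = {
    "spam": ["click here", "free money", "subscribe"],
    "advertisement": ["discount", "promo code", "visit our website"],
    "rant": ["worst ever", "never again", "total scam"],
}


def compile_patterns(keywords: dict[str, list[str]]) -> dict[str, re.Pattern]:
    """Compile each label's keywords into one case-insensitive alternation."""
    return {
        # \b on both sides avoids matching inside longer words (e.g. "scampi").
        label: re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in words) + r")\b",
            re.IGNORECASE,
        )
        for label, words in keywords.items()
    }


PATTERNS = compile_patterns(KEYWORDS)


def keyword_flags(text: str, patterns: dict[str, re.Pattern] = PATTERNS) -> dict[str, int]:
    """Return a 0/1 flag per label: 1 if any keyword for that label occurs."""
    return {label: int(bool(p.search(text))) for label, p in patterns.items()}


print(keyword_flags("Use PROMO CODE 123 for a discount!"))
# {'spam': 0, 'advertisement': 1, 'rant': 0}
```

Keyword rules are cheap and transparent, but they miss paraphrases and, as noted under the challenges, tend to produce heavily imbalanced labels.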

Challenges we ran into

Sparse Context for Location Classification
Extracting meaningful location labels from a handful of category words was difficult for smaller open-source models like flan-t5-small.
- Example: Identifying “restaurant” from words like taste, menu, and ambience was unreliable.
- Larger models like GPT-4.5 performed significantly better, showing the impact of model size and training data.
Data Imbalance
Our keyword-based tagging for spam, advertisement, and rant detection produced imbalanced datasets, making it hard to compute reliable F1 score, precision, and recall. Balanced datasets or supervised classifiers would improve performance.
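To see why imbalance undermines these metrics, consider a hypothetical evaluation set of 50 reviews of which only 2 are spam (illustrative numbers, not our results). A detector that never flags spam and one that catches half the spam both reach 96% accuracy, yet their precision, recall, and F1 differ completely.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy plus precision, recall, and F1 for the positive class (label 1).

    Precision, recall, and F1 are reported as 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 2 spam reviews out of 50.
y_true = [1, 1] + [0] * 48
never_flags = [0] * 50
half_right = [1, 0, 1] + [0] * 47  # 1 true positive, 1 miss, 1 false alarm

print(binary_metrics(y_true, never_flags))
# {'accuracy': 0.96, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
print(binary_metrics(y_true, half_right))
# {'accuracy': 0.96, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

With only two positives, a single extra hit or miss swings recall by 50 points, which is why these scores were unstable for us. scikit-learn's `precision_recall_fscore_support` computes the same quantities, with its `zero_division` parameter controlling the undefined case.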
Limited Dataset Size
With only 1,100 rows, our dataset was insufficient to capture the diversity of real-world reviews. Larger datasets would improve accuracy and generalization.

Accomplishments that we’re proud of

Built an end-to-end working pipeline that automates tagging, policy enforcement, and quality prediction.
Successfully integrated generative AI, semantic similarity models, and ensemble ML techniques into a unified workflow.
Generated structured outputs, demonstrating a practical proof-of-concept solution.

What we learned

The limitations of smaller open-source models, compared with large LLMs, when handling sparse context.
How data quality and balance critically affect classification metrics.
The value of combining rule-based methods (keyword search) with ML-based approaches (embeddings, prompting, ensembles).
That integrating diverse AI techniques is as important as improving raw accuracy.

What’s next for Trustworthy Location Reviews

Data Expansion – Collect larger and more balanced datasets.
Model Improvement – Fine-tune domain-specific models for tagging and detection.
Advanced Detection – Move from keyword-based detection to transformer-based classifiers for spam/ads/rant detection.
Scalability – Optimize the pipeline for real-time deployment at scale (millions of reviews).
Explainability – Add interpretability features so flagged reviews are transparent to users and moderators.

Built With

csv
json
png
python

Updates

Rayaan Quraishi started this project — Aug 30, 2025 05:15 AM EDT
