Trustworthy Location Reviews
Inspiration
We identified a real-world challenge faced by leading tech companies such as TikTok: keeping online reviews trustworthy and relevant. The problem's broad applicability, from e-commerce platforms to video-sharing apps, made it especially compelling. It also let us apply modern machine learning techniques in a meaningful way.
What it does
Our solution tackles four key problems in review moderation:
- Location Tagging – Automatically labels reviews with location categories such as restaurant, office, home, park, hotel, or shop using rating category data. For instance, a review whose category words are taste, menu, indoor, outdoor, and atmosphere is labeled "restaurant".
- Content Filtering – Labels reviews for spam (0/1), advertisement (0/1), and irrelevancy (0/1).
- Policy Enforcement – Applies rules defined in the problem statement to filter out spam, advertisements, and rants.
- Review Quality Prediction – Uses ML to predict spam, advertisement, irrelevancy, and rant labels to assess overall review quality.
How we built it
- Location Tagging: Implemented few-shot prompting with the flan-t5-small model for classification from sparse category words.
- Irrelevancy Detection: Used semantic similarity between location embeddings and review text embeddings via the all-MiniLM-L6-v2 model.
- Spam/Advertisement/Rant Detection: Developed keyword-based text search patterns to flag low-quality content.
- Integration: Combined these approaches into an ensemble pipeline that outputs structured quality labels for every review.
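The steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the project's actual code: the keyword lists in `PATTERNS`, the similarity threshold, and the names `ReviewLabels` and `label_review` are all assumptions, and `toy_embed` is a bag-of-words stand-in for the sentence embeddings that all-MiniLM-L6-v2 would provide.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, asdict

# Hypothetical keyword patterns; the project's real keyword lists are not given.
PATTERNS = {
    "spam": re.compile(r"\b(click here|free money|subscribe now)\b", re.I),
    "advertisement": re.compile(r"\b(discount|promo code|visit our website)\b", re.I),
    "rant": re.compile(r"\b(worst|terrible|never again|scam)\b", re.I),
}


def toy_embed(text: str) -> Counter:
    """Stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ReviewLabels:
    spam: int
    advertisement: int
    rant: int
    irrelevant: int


def label_review(review: str, location_text: str, threshold: float = 0.1) -> ReviewLabels:
    """Combine keyword flags with a similarity-based irrelevancy flag."""
    # 1 if any keyword pattern for that label matches, else 0.
    flags = {name: int(bool(p.search(review))) for name, p in PATTERNS.items()}
    # A review that shares little with the location text is marked irrelevant.
    similarity = cosine(toy_embed(review), toy_embed(location_text))
    return ReviewLabels(irrelevant=int(similarity < threshold), **flags)


location = "restaurant taste menu indoor outdoor atmosphere"
print(asdict(label_review("The menu was great and the taste was amazing", location)))
print(asdict(label_review("Click here for a discount on phones", location)))
```

With real sentence embeddings the threshold would need to be tuned on labeled data, since bag-of-words overlap and dense-embedding similarity are on different scales.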
Challenges we ran into
Sparse Context for Location Classification
Extracting meaningful location labels from limited category words was difficult on smaller open-source models like flan-t5-small.
- Example: Identifying “restaurant” from words like taste, menu, ambience was unreliable.
- Larger models like GPT-4.5 performed significantly better, showing the impact of model size and training data.
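To make the sparse-context problem concrete, here is one way a few-shot prompt for this task might be assembled. The examples in `FEW_SHOT_EXAMPLES`, the prompt wording, and the helper names `build_prompt` and `parse_label` are hypothetical; the call to the model itself is left out so the sketch stays dependency-free.

```python
# Candidate labels taken from the location categories listed above.
LABELS = ["restaurant", "office", "home", "park", "hotel", "shop"]

# Hypothetical few-shot examples: (category words, location label).
FEW_SHOT_EXAMPLES = [
    (["taste", "menu", "atmosphere"], "restaurant"),
    (["room", "check-in", "bed"], "hotel"),
    (["playground", "trail", "trees"], "park"),
]


def build_prompt(category_words):
    """Build a few-shot classification prompt from sparse category words."""
    lines = [
        "Classify the location type from the category words.",
        "Options: " + ", ".join(LABELS) + ".",
        "",
    ]
    for words, label in FEW_SHOT_EXAMPLES:
        lines.append("Words: " + ", ".join(words))
        lines.append("Location: " + label)
        lines.append("")
    # The final entry is left open for the model to complete.
    lines.append("Words: " + ", ".join(category_words))
    lines.append("Location:")
    return "\n".join(lines)


def parse_label(model_output, default="unknown"):
    """Map free-form model output onto one of the known labels."""
    text = model_output.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return default


print(build_prompt(["taste", "menu", "ambience"]))
```

Falling back to a default such as "unknown" in `parse_label` is a design choice that keeps off-list answers from a small model from silently becoming labels.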
Data Imbalance
Our keyword-based tagging for spam, advertisement, and rant detection produced imbalanced datasets, making it hard to compute reliable metrics such as F1 score, precision, and recall. Balanced datasets or supervised classifiers would improve performance.
Limited Dataset Size
With only 1,100 rows, our dataset was insufficient to capture the diversity of real-world reviews. Larger datasets would improve accuracy and generalization.
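Both the imbalance and the small dataset size show up directly in the metrics. The sketch below uses made-up numbers (1,100 reviews with an assumed 1% spam rate, not our measured distribution) to show why accuracy looks fine on imbalanced labels while precision, recall, and F1 do not.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 11 spam reviews out of 1100 (an assumed 1% rate).
y_true = [1] * 11 + [0] * 1089

# A useless classifier that never flags anything still scores 99% accuracy.
always_clean = [0] * 1100
accuracy = sum(t == p for t, p in zip(y_true, always_clean)) / len(y_true)
print(accuracy, precision_recall_f1(y_true, always_clean))

# A classifier that catches 8 of the 11 spam reviews with 4 false alarms.
y_pred = [1] * 8 + [0] * 3 + [1] * 4 + [0] * 1085
print(precision_recall_f1(y_true, y_pred))
```

With so few positives, a single extra missed review moves recall from 8/11 to 7/11, which is why the scores on a dataset this size are hard to trust.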
Accomplishments that we’re proud of
- Built an end-to-end working pipeline that automates tagging, policy enforcement, and quality prediction.
- Successfully integrated generative AI, semantic similarity models, and ensemble ML techniques into a unified workflow.
- Generated structured outputs, demonstrating a practical proof-of-concept solution.
What we learned
- The limitations of smaller open-source models, compared with larger LLMs, when handling sparse context.
- How data quality and balance critically affect classification metrics.
- The value of combining rule-based methods (keyword search) with ML-based approaches (embeddings, prompting, ensembles).
- Integration of diverse AI techniques is as important as improving raw accuracy.
What’s next for Trustworthy Location Reviews
- Data Expansion – Collect larger and more balanced datasets.
- Model Improvement – Fine-tune domain-specific models for tagging and detection.
- Advanced Detection – Move from keyword-based detection to transformer-based classifiers for spam/ads/rant detection.
- Scalability – Optimize the pipeline for real-time deployment at scale (millions of reviews).
- Explainability – Add interpretability features so flagged reviews are transparent to users and moderators.