Inspiration
Location reviews shape real-world decisions, but promotions, spam and off-topic rants can drown out genuine experiences. This hurts:
Users: harder to trust what they read
Businesses: unfair reputation swings
Platforms: costly manual moderation and credibility risk
What it does
Ingests raw Google reviews and keeps text reviews, the most informative signal
Flags violations: Advertisement, Spam, Irrelevant topic, Rant without visit
Produces a final Clean vs Flagged decision for each review
Handles multiple languages via translate-then-classify
Outputs Excel-friendly CSVs for auditing, training and deployment
How we built it
Data: UCSD review datasets (for training), as well as Kaggle and our Singapore scrape (for testing).
Labelling: Pseudo-labels from GPT-5 (policy prompt), plus a manually labelled dataset.
Cleaning: Drop empty content.
Features: We evaluated metadata and found it weak for relevancy; we therefore focus on the text signal. Reviews with no text are treated as Flagged.
Model: BERT classifier, single task (Clean vs Flagged) trained on balanced data.
Multilingual: Detect language → Google Translate API → English → Remove emojis → BERT.
Testing: Tested on Kaggle and our Singapore datasets.
Alt path: A notebook demonstrates a direct LLM classifier; we discarded it due to cost.
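The translate-then-classify flow above can be sketched as follows. This is a minimal stand-in sketch: `detect_language`, `translate_to_english`, and `classify` are simplified placeholders for the real language-detection step, the Google Translate API call, and the fine-tuned BERT model, and their logic here is illustrative only.

```python
# Minimal sketch of the translate-then-classify moderation pipeline.
# All three helpers are simplified stand-ins for the real components.

def detect_language(text: str) -> str:
    # Stand-in: treat non-ASCII text as non-English.
    # The real system uses a proper language-detection step.
    return "en" if text.isascii() else "other"

def translate_to_english(text: str) -> str:
    # Stand-in for a Google Translate API call.
    return text

def classify(text: str) -> str:
    # Stand-in for BERT inference; here it only catches obvious ad/URL spam.
    spam_markers = ("http://", "https://", "promo code", "discount")
    return "Flagged" if any(m in text.lower() for m in spam_markers) else "Clean"

def moderate_review(text: str) -> str:
    if not text.strip():
        return "Flagged"  # non-text reviews are flagged outright
    if detect_language(text) != "en":
        text = translate_to_english(text)
    return classify(text)
```

For example, `moderate_review("Great food, friendly staff")` returns `"Clean"`, while a review containing a promo link is flagged before it ever reaches manual review.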
Challenges we ran into
Short Forms & Abbreviations: Users often write in shorthand or local slang (e.g., “ugwim”), which makes interpretation difficult.
Spam & Promotional Noise: Repeated ads or copy-paste reviews still appear and distort credibility.
Location-Specific Context: Some reviews only make sense within a local or cultural context, which complicates classification.
Emoji-Only Reviews: Posts with only emojis don’t convey meaningful feedback and are flagged as irrelevant by our model.
Non-text reviews: Posts without text are flagged by our solution, since text is our main moderation signal and reviews without it are not representative of genuine feedback.
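The emoji-only and empty-review rules can be expressed as a small pre-filter. A minimal sketch, assuming a regex over the common emoji Unicode blocks; the ranges below are an approximation, not an exhaustive emoji definition.

```python
import re

# Approximate emoji ranges: emoticons, symbols & pictographs,
# regional-indicator (flag) pairs, and miscellaneous symbols.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U0001F1E6-\U0001F1FF\u2600-\u27BF]+"
)

def strip_emojis(text: str) -> str:
    """Remove characters in the approximate emoji ranges."""
    return EMOJI_RE.sub("", text)

def is_non_text(review: str) -> bool:
    """True if the review is empty or contains only emojis/whitespace,
    i.e. it carries no text signal and should be Flagged."""
    return not strip_emojis(review).strip()
```

Reviews where `is_non_text` returns True are flagged before classification, so the BERT model only ever sees reviews with actual text content.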
Accomplishments that we are proud of
Training data: balanced Clean vs Flagged after GPT-5 pseudo-labels (UCSD dataset)
Test sets: Kaggle + our Singapore scrape
Metrics: Accuracy 91.27%, Precision 98.28%, Recall 91.27%, F1 94.48%
Qualitative: clear catches on ad/URL spam, off-topic rants reduced, non-text entries flagged
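The reported metrics are the standard binary-classification scores, with Flagged as the positive class. For reference, here is how they are derived from a confusion matrix; the counts below are a toy example, not our actual test results.

```python
# Metric definitions, computed from a toy confusion matrix.
# These counts are illustrative only, not our actual test counts.
tp, fp, fn, tn = 80, 5, 10, 5  # positive class = "Flagged"

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.4f} prec={precision:.4f} rec={recall:.4f} f1={f1:.4f}")
```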
What we learned
Text alone can be strong for this task; metadata such as rating is not a reliable relevancy cue
Translate-then-classify keeps the model simple and robust
Pseudo-labels accelerate iteration; a small manual set is still vital for calibration
What's next for TrustLens: ML-based moderation for Google location reviews.
Semi-supervised loop: re-train on confident BERT predictions (self-training)
Business-level analytics (per-place violation heatmaps)
UI for human-in-the-loop review and quick policy appeals
Go beyond standard multilingual support by understanding regional slang, abbreviations, and country-specific expressions
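The self-training step in this roadmap could select confident predictions along these lines. A sketch under assumptions: the 0.95 cutoff and the (text, label, probability) format are hypothetical choices, not a fixed design; the real threshold would be tuned against the manually labelled set.

```python
# Sketch of the planned self-training loop: keep only BERT predictions
# whose confidence clears a threshold, then add them to the training pool.
# The 0.95 cutoff is an assumption and would be tuned on the manual set.
CONFIDENCE_THRESHOLD = 0.95

def select_confident(predictions):
    """predictions: list of (text, label, probability) tuples.
    Returns (text, label) pairs confident enough to retrain on."""
    return [
        (text, label)
        for text, label, prob in predictions
        if prob >= CONFIDENCE_THRESHOLD
    ]
```

Each retraining round would append these pairs to the labelled pool, with the manual set held out to catch confidence drift.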
Built With
- anaconda
- apify
- bert
- cuda
- datasets
- github
- google-local-reviews
- google-places
- google-translate-api
- huggingface
- jupyter
- kaggle
- manually-labeled-data
- openai-api
- pandas
- python
- pytorch
- scikit-learn
- transformers
- vscode