Inspiration
The digital credibility of location-based platforms is undermined by a high volume of low-quality reviews, such as promotional advertisements, irrelevant content, and rants from users who never visited the location. This noise dilutes the utility of these platforms for users and businesses alike. Manual moderation is impossible at scale, and cloud-based AI solutions can be costly and raise privacy concerns. Hence, I was inspired to build a powerful, self-hosted solution that could autonomously cleanse this data.
What it does
ReviewGuard is an automated moderation pipeline that classifies Google Maps reviews into four policy categories: Genuine, Advertisement, Irrelevant, and Rant Without Visit. It processes reviews, assigning a label to each, enabling platforms to automatically flag or remove policy-violating content. The entire system runs locally on a consumer-grade desktop, ensuring low cost and complete data privacy.
How we built it
We ran open-source LLMs (Gemma 3 and Qwen 3) locally via Ollama. These models excel at understanding the nuanced context and language of reviews. We designed a robust prompt engineering system that performs few-shot classification to optimize model performance.
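The few-shot approach can be sketched roughly as follows. The four category names come from the project; the example reviews, prompt wording, model tag, and local endpoint are illustrative assumptions, not the exact prompts we shipped.

```python
# Sketch of few-shot review classification against a local Ollama server.
# Few-shot examples, prompt wording, and model tag are assumptions.
import json
import urllib.request

CATEGORIES = ["Genuine", "Advertisement", "Irrelevant", "Rant Without Visit"]

# Hypothetical few-shot examples, one per policy category.
FEW_SHOT = [
    ("Great coffee and friendly staff, will come back!", "Genuine"),
    ("Best deals in town! Visit www.example.com for 50% off!", "Advertisement"),
    ("I love my new phone, the camera is amazing.", "Irrelevant"),
    ("Never been here, but it looks terrible from the photos.", "Rant Without Visit"),
]

def build_prompt(review: str) -> str:
    """Assemble a few-shot classification prompt for a single review."""
    lines = [
        "Classify the Google Maps review into exactly one category: "
        + ", ".join(CATEGORIES) + ".",
        "Respond with the category name only.",
        "",
    ]
    for text, label in FEW_SHOT:
        lines += [f"Review: {text}", f"Category: {label}", ""]
    lines += [f"Review: {review}", "Category:"]
    return "\n".join(lines)

def classify(review: str, model: str = "gemma3",
             host: str = "http://localhost:11434") -> str:
    """Send the prompt to a locally running Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(review), "stream": False}
    ).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

if __name__ == "__main__":
    print(classify("Grand opening! Free samples at shop.example.com!"))
```

Keeping the call inside `classify` a plain HTTP request to Ollama's `/api/generate` endpoint means the pipeline has no cloud dependency at all.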
Challenges we ran into
The most significant challenge was the discrepancy between benchmark and real-world performance for our XGBoost model, which we ultimately chose not to publish. It achieved a high F1-score (0.914) on a curated test set but failed to generalize to unseen examples from the web. The issue was that the artificially generated training data was too uniform and lacked the variety found in real human language, leading to overfitting.
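A toy illustration of this failure mode: a classifier fit on rigidly templated synthetic data can look strong on a held-out slice of that same data yet stumble on real-world phrasing it has never seen. All reviews below are invented for illustration, and `LogisticRegression` stands in for the much larger XGBoost setup we actually trained.

```python
# Demonstrates overfitting to a uniform synthetic template: the model learns
# the template's surface features, not the concept "advertisement".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic ads all share one rigid template...
synthetic = [f"Visit our store for {x}% off today!" for x in range(10, 60, 5)]
# ...while genuine reviews are varied free text (invented examples).
genuine = [
    "The pasta was delicious and the service was quick.",
    "Cozy place, a bit loud on weekends but worth it.",
    "Waited twenty minutes for a table, food made up for it.",
    "Staff remembered our order from last time, impressive.",
    "Decent burgers, the fries were soggy though.",
    "Lovely patio seating in the summer.",
    "Prices went up recently but quality is still good.",
    "Parking is a nightmare, the bakery itself is great.",
    "My kids loved the milkshakes here.",
    "Quiet spot to work in the mornings.",
]

X_train = synthetic + genuine
y_train = ["Advertisement"] * len(synthetic) + ["Genuine"] * len(genuine)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# An ad phrased unlike the template shares no tokens with any training ad,
# so the model has no signal to flag it.
real_world_ad = "dm me for cheap followers and promo packages"
print(model.predict([real_world_ad])[0])
```

On the templated sentences the model scores almost perfectly, which is exactly the misleading benchmark number we saw before testing on reviews scraped from the web.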
Accomplishments that we're proud of
- Achieved F1-scores of 0.992 on advertisement detection, 0.902 on irrelevant-content detection, and 0.925 on rant-without-visit detection using Gemma 3
- Optimized the system to process a batch of 480 reviews in under 16 minutes on a single desktop with a consumer-grade GPU, proving the viability of local deployment.
What we learned
- Data quality and diversity are critical in model training. A model trained on diverse, real-world data will almost always outperform a model trained on synthetic or limited data.
- Processing reviews in batches significantly increases throughput and reduces cost, but it can come at a minor cost to accuracy compared to individual analysis.
What's next for ReviewGuard
The next step is to combine the strengths of classical models and LLMs. Fast, cheap classical machine-learning models (e.g. XGBoost, CatBoost) will serve as a first-pass filter to confidently classify obvious cases. The more complex, ambiguous reviews will then be routed to the powerful LLM (e.g. Gemma 3) for final judgment. This should reduce the average cost and latency of the pipeline.
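A minimal sketch of that planned two-stage cascade: the cheap classifier's confidence decides whether its label is accepted or the review escalates to the LLM. The threshold value and the stub classifier/LLM callables are assumptions, not the shipped design.

```python
# Two-stage cascade: accept the cheap classifier's label when it is confident,
# otherwise escalate the review to the LLM.
from typing import Callable

def cascade(review: str,
            cheap_proba: Callable[[str], dict[str, float]],
            llm_classify: Callable[[str], str],
            threshold: float = 0.95) -> tuple[str, str]:
    """Return (label, stage), where stage is 'classical' or 'llm'."""
    probs = cheap_proba(review)
    label, conf = max(probs.items(), key=lambda kv: kv[1])
    if conf >= threshold:
        return label, "classical"       # obvious case: skip the LLM
    return llm_classify(review), "llm"  # ambiguous case: escalate

# Stubs standing in for XGBoost's predict_proba and the Gemma 3 call.
def fake_proba(review: str) -> dict[str, float]:
    ad = 0.99 if "http" in review or "%" in review else 0.40
    return {"Advertisement": ad, "Genuine": 1 - ad}

def fake_llm(review: str) -> str:
    return "Genuine"

print(cascade("50% off at http://example.com", fake_proba, fake_llm))
print(cascade("it was fine I guess", fake_proba, fake_llm))
```

The average cost saving depends on what fraction of reviews clear the threshold; tuning it trades LLM calls against first-pass mistakes.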
Assets and datasets used
- Google Local review data: https://mcauleylab.ucsd.edu/public_datasets/gdrive/googlelocal/
- Logo: Canva
Built With
- deepseek
- gemma3
- jupyter
- ollama
- pandas
- python
- qwen3
- scikit-learn