Inspiration
Online reviews shape how people choose restaurants, shops, and services, but fake, irrelevant, or low-quality reviews distort trust. Our team wanted to build something that restores confidence in location reviews by automatically filtering out the “noise,” so that users and businesses alike get a fair and accurate picture. That became NoiseGuard.
What it does
NoiseGuard is a machine-learning-powered filter that classifies reviews into three categories: spam, irrelevant, and relevant. It enforces platform policies by:
- Detecting spammy content (links, discount codes, duplicate text, AI-generated patterns).
- Flagging irrelevant reviews that don’t relate to the location.
- Highlighting authentic, relevant reviews that actually help users make decisions.
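The three-way decision above can be sketched as a small triage function. This is an illustrative outline only, not the project's actual code: the `Label` enum, the `triage` function, the `threshold` value, and the two toy detectors are all hypothetical names chosen for this example. It assumes spam is checked first and that relevance is expressed as a score between 0 and 1.

```python
from enum import Enum
from typing import Callable


class Label(Enum):
    SPAM = "spam"
    IRRELEVANT = "irrelevant"
    RELEVANT = "relevant"


def triage(
    review: str,
    is_spam: Callable[[str], bool],
    relevance_score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> Label:
    """Check spam first, then relevance; anything left counts as relevant."""
    if is_spam(review):
        return Label.SPAM
    if relevance_score(review) < threshold:
        return Label.IRRELEVANT
    return Label.RELEVANT


# Toy stand-ins for the real detectors, for illustration only.
def toy_is_spam(text: str) -> bool:
    return "http" in text.lower()


def toy_relevance(text: str) -> float:
    words = set(text.lower().split())
    return 1.0 if words & {"food", "service", "staff", "menu"} else 0.0


if __name__ == "__main__":
    for text in [
        "Visit http://deals.example now",
        "My phone battery died today",
        "Great food and friendly staff",
    ]:
        print(text, "->", triage(text, toy_is_spam, toy_relevance).value)
```

Passing the detectors in as callables keeps the decision order separate from how each signal is computed, so a rule-based check and a learned model could be swapped in without changing the triage logic.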
How we built it
- Data: Collected and cleaned reviews from open datasets (Google Local Reviews, Yelp) and augmented with synthetic irrelevant samples.
- Feature engineering:
  - Relevant: TF-IDF, TF-RF word frequency features.
  - Spam: Regex for links/discounts, minimum length thresholds, duplicate detection, future AI-generated detection hooks.
- Modeling: Prototyped an ML pipeline with Hugging Face transformers and classical classifiers.
- Policy module: Encoded example policies (no ads, no irrelevant content, no rants without visit) into filtering logic.
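As a rough illustration of the rule-based spam features listed above (link and discount regexes, a minimum length threshold, duplicate detection), here is a minimal sketch using only the Python standard library. The patterns, the `MIN_WORDS` threshold, and the function names are assumptions made for this example; the write-up does not show the team's actual rules.

```python
import re
from collections import Counter

# Hypothetical patterns and threshold, chosen for illustration.
LINK_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PROMO_RE = re.compile(
    r"\b(promo|discount|coupon)\s*code\b|\b\d{1,3}\s*%\s*off\b",
    re.IGNORECASE,
)
MIN_WORDS = 3


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so near-identical copies compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def spam_signals(reviews: list[str]) -> list[dict[str, bool]]:
    """Return one dict of boolean rule hits per review."""
    counts = Counter(normalize(r) for r in reviews)
    return [
        {
            "has_link": bool(LINK_RE.search(r)),
            "has_promo": bool(PROMO_RE.search(r)),
            "too_short": len(r.split()) < MIN_WORDS,
            "duplicate": counts[normalize(r)] > 1,
        }
        for r in reviews
    ]


if __name__ == "__main__":
    sample = [
        "Use promo code SAVE20 at www.deals.example",
        "Great pizza, friendly staff!",
        "great pizza friendly staff",
        "Nice",
    ]
    for review, signals in zip(sample, spam_signals(sample)):
        print(review, signals)
```

Returning the individual signals rather than a single verdict leaves room for a policy layer to decide which combinations count as spam, and the same booleans could also be fed to a classifier as extra features.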
Challenges we ran into
- Time constraints: Only 72 hours to go from idea to working prototype.
- Unfamiliarity: New tools like Hugging Face models and prompt engineering.
- Data gaps: Lack of labeled ground truth meant we had to create pseudo-labels and augment data manually.
- Team coordination: Balancing different skill levels and integrating components under deadline.
Accomplishments that we're proud of
- Delivered a working end-to-end prototype within the hackathon timeline.
- Built a functional spam filter that catches obvious promo and duplicate reviews.
- Extended the system beyond spam to also capture “irrelevant” reviews, aligning tightly with the problem statement.
What we learned
- How to preprocess messy real-world text data.
- Practical ML/NLP techniques for classification, from TF-IDF to transformers.
- The importance of combining rule-based heuristics with ML for robust moderation.
- What it really takes to train and evaluate a model under time pressure.
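For readers new to TF-IDF, mentioned above, the weighting can be written out in a few lines. This sketch uses the textbook definition (term frequency times the log of inverse document frequency) on pre-tokenized text. It is not necessarily the variant the project used: libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                # tf = share of the document's tokens; idf = log(N / df).
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


if __name__ == "__main__":
    reviews = [["great", "pizza"], ["great", "service"]]
    for doc, w in zip(reviews, tfidf(reviews)):
        print(doc, w)
```

A term that appears in every document, like "great" here, gets a weight of zero, which is why TF-IDF tends to emphasize the words that distinguish one review from another.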
What’s next for NoiseGuard
- AI-generated detection: Integrate GPTZero or lightweight detectors to catch LLM-written reviews.
- Multilingual support: Expand to reviews in multiple languages for global platforms.
- User metadata analysis: Detect suspicious accounts by analyzing posting history, frequency, and location patterns.
- Interactive demo: Build a dashboard for platforms to visualize and manage flagged reviews in real-time.