Inspiration

Online reviews shape how people choose restaurants, shops, and services, but fake, irrelevant, or low-quality reviews distort trust. Our team wanted to build something that restores confidence in location reviews by automatically filtering out the “noise,” so that users and businesses alike get a fair and accurate picture. That became NoiseGuard.

What it does

NoiseGuard is a machine-learning powered filter that classifies reviews into three categories: spam, irrelevant, and relevant. It enforces platform policies by:

  • Detecting spammy content (links, discount codes, duplicate text, AI-generated patterns).
  • Flagging irrelevant reviews that don’t relate to the location.
  • Highlighting authentic, relevant reviews that actually help users make decisions.

How we built it

  • Data: Collected and cleaned reviews from open datasets (Google Local Reviews, Yelp) and augmented with synthetic irrelevant samples.
  • Feature engineering:
    • Relevant: TF-IDF, TF-RF word frequency features.
    • Spam: Regex for links/discounts, minimum length thresholds, duplicate detection, future AI-generated detection hooks.
  • Modeling: Prototyped an ML pipeline with Hugging Face transformers and classical classifiers.
  • Policy module: Encoded example policies (no ads, no irrelevant content, no rants without visit) into filtering logic.
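
To make the rule-based spam features above concrete, here is a minimal sketch of how link, discount, minimum-length, and duplicate checks could be combined. It is an illustration under stated assumptions, not the project's actual code: the regex patterns, the `MIN_WORDS` threshold, and the `spam_flags` and `normalize` names are all hypothetical.

```python
import re

# Hypothetical patterns and threshold, chosen only for illustration.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PROMO_RE = re.compile(
    r"\b(\d{1,3}\s?% off|discount code|promo code|coupon)\b", re.IGNORECASE
)
MIN_WORDS = 3


def normalize(text):
    """Lowercase and collapse punctuation so near-identical reviews compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def spam_flags(text, seen):
    """Return the list of spam rules a review triggers.

    `seen` is a set of normalized reviews processed so far; it is updated
    in place so that later copies of the same text are flagged as duplicates.
    """
    flags = []
    if URL_RE.search(text):
        flags.append("link")
    if PROMO_RE.search(text):
        flags.append("promo")
    key = normalize(text)
    if len(key.split()) < MIN_WORDS:
        flags.append("too_short")
    if key in seen:
        flags.append("duplicate")
    seen.add(key)
    return flags


if __name__ == "__main__":
    seen = set()
    reviews = [
        "Great pasta and friendly staff, will come back.",
        "Get 20% off at www.deals.example with promo code SAVE20!",
        "great pasta and friendly staff, will come back!",
        "ok",
    ]
    for review in reviews:
        print(spam_flags(review, seen), "|", review)
```

In a setup like this, the returned flags could either mark a review as spam directly or be passed as extra features to the ML classifier alongside the word frequency features.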

Challenges we ran into

  • Time constraints: Only 72 hours to go from idea → working prototype.
  • Unfamiliarity: New tools like Hugging Face models and prompt engineering.
  • Data gaps: The lack of labeled ground truth meant we had to create pseudo-labels and augment data manually.
  • Team coordination: Balancing different skill levels and integrating components under deadline.

Accomplishments that we're proud of

  • Delivered a working end-to-end prototype within the hackathon timeline.
  • Built a functional spam filter that catches obvious promo and duplicate reviews.
  • Extended the system beyond spam to also capture “irrelevant” reviews, aligning tightly with the problem statement.

What we learned

  • How to preprocess messy real-world text data.
  • Practical ML/NLP techniques for classification, from TF-IDF to transformers.
  • The importance of combining rule-based heuristics with ML for robust moderation.
  • What it really takes to train and evaluate a model under time pressure.

What’s next for NoiseGuard

  • AI-generated detection: Integrate GPTZero or lightweight detectors to catch LLM-written reviews.
  • Multilingual support: Expand to reviews in multiple languages for global platforms.
  • User metadata analysis: Detect suspicious accounts by analyzing posting history, frequency, and location patterns.
  • Interactive demo: Build a dashboard for platforms to visualize and manage flagged reviews in real-time.
