NoiseGuard

Inspiration

Online reviews shape how people choose restaurants, shops, and services, but fake, irrelevant, or low-quality reviews distort trust. Our team wanted to build something that restores confidence in location reviews by automatically filtering out the “noise,” so that users and businesses alike get a fair and accurate picture. That became NoiseGuard.

What it does

NoiseGuard is a machine-learning-powered filter that classifies reviews into three categories: spam, irrelevant, and relevant. It enforces platform policies by:

Detecting spammy content (links, discount codes, duplicate text, AI-generated patterns).
Flagging irrelevant reviews that don’t relate to the location.
Highlighting authentic, relevant reviews that actually help users make decisions.

How we built it

Data: Collected and cleaned reviews from open datasets (Google Local Reviews, Yelp) and augmented with synthetic irrelevant samples.
Feature engineering:
- Relevant: TF-IDF, TF-RF word frequency features.
- Spam: Regex for links/discounts, minimum length thresholds, duplicate detection, and hooks for future AI-generated-text detection.
Modeling: Prototyped an ML pipeline with Hugging Face transformers and classical classifiers.
Policy module: Encoded example policies (no ads, no irrelevant content, no rants without a visit) into filtering logic.
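
The spam-side heuristics listed above (link and discount regexes, a minimum length threshold, and duplicate detection) can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: the patterns, the `MIN_WORDS` threshold, and the function names are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical patterns and threshold; the real project may use different ones.
LINK_RE = re.compile(r"(https?://\S+|www\.\S+)", re.IGNORECASE)
PROMO_RE = re.compile(
    r"\b(\d{1,3}\s?% off|discount|promo code|coupon|use code)\b", re.IGNORECASE
)
MIN_WORDS = 3


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical copies match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def spam_signals(review: str, seen_counts: Counter) -> list[str]:
    """Return the names of the heuristic rules that fire for one review."""
    signals = []
    if LINK_RE.search(review):
        signals.append("link")
    if PROMO_RE.search(review):
        signals.append("promo")
    if len(review.split()) < MIN_WORDS:
        signals.append("too_short")
    if seen_counts[normalize(review)] > 1:
        signals.append("duplicate")
    return signals


def flag_reviews(reviews: list[str]) -> list[list[str]]:
    """Count normalized texts once, then evaluate every review against the rules."""
    counts = Counter(normalize(r) for r in reviews)
    return [spam_signals(r, counts) for r in reviews]


if __name__ == "__main__":
    sample = [
        "Great pasta and friendly staff, will come back.",
        "Visit www.cheap-deals.example for 50% off!",
        "Best place ever!!!",
        "best place ever",
    ]
    for text, signals in zip(sample, flag_reviews(sample)):
        print(signals, "<-", text)
```

Returning the list of fired rules, rather than a single yes/no, keeps the output explainable and lets a downstream policy step decide how many signals are enough to call a review spam.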

Challenges we ran into

Time constraints: Only 72 hours to go from idea to working prototype.
Unfamiliarity: Tools like Hugging Face models and prompt engineering were new to us.
Data gaps: The lack of labeled ground truth meant we had to create pseudo-labels and augment data manually.
Team coordination: Balancing different skill levels and integrating components under deadline.
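
One common way to create pseudo-labels when no ground truth exists is to let a few weak rules vote and to skip any review where they disagree or stay silent. The sketch below illustrates that idea only; the write-up does not describe how the team actually produced its pseudo-labels, and every rule, cue word, and function name here is hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion


def lf_has_link(text):
    lowered = text.lower()
    return "spam" if "http" in lowered or "www." in lowered else ABSTAIN


def lf_promo_words(text):
    cues = ("discount", "promo code", "coupon")
    return "spam" if any(c in text.lower() for c in cues) else ABSTAIN


def lf_no_visit(text):
    cues = ("never been", "haven't visited", "have not visited")
    return "irrelevant" if any(c in text.lower() for c in cues) else ABSTAIN


def lf_mentions_experience(text):
    cues = ("food", "service", "staff", "ordered")
    return "relevant" if any(c in text.lower() for c in cues) else ABSTAIN


# Hypothetical rule set; a real one would be tuned against hand-checked samples.
LABELING_FUNCTIONS = [lf_has_link, lf_promo_words, lf_no_visit, lf_mentions_experience]


def pseudo_label(text):
    """Majority vote over the rules; abstain on ties or when no rule fires."""
    votes = Counter()
    for lf in LABELING_FUNCTIONS:
        label = lf(text)
        if label is not ABSTAIN:
            votes[label] += 1
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence: leave unlabeled
    return ranked[0][0]
```

Reviews that come back as `None` would be left out of training or sent for manual labeling, which trades coverage for cleaner labels.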

Accomplishments that we're proud of

Delivered a working end-to-end prototype within the hackathon timeline.
Built a functional spam filter that catches obvious promo and duplicate reviews.
Extended the system beyond spam to also capture “irrelevant” reviews, aligning tightly with the problem statement.

What we learned

How to preprocess messy real-world text data.
Practical ML/NLP techniques for classification, from TF-IDF to transformers.
The importance of combining rule-based heuristics with ML for robust moderation.
What it really takes to train and evaluate a model under time pressure.
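
For readers new to the TF-IDF features mentioned above, here is a minimal from-scratch sketch of the weighting. It uses the plain textbook form (term frequency times log of inverse document frequency); it is not the project's feature code, the function names are made up for illustration, and libraries typically apply smoothed variants, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    reviews = ["great pizza great service", "great location", "pizza was cold"]
    for review, vec in zip(reviews, tfidf(reviews)):
        top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:2]
        print(review, "->", top)
```

Terms that appear in every document get a weight of zero under this form, which is why common filler words contribute little to the classifier.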

What’s next for NoiseGuard

AI-generated detection: Integrate GPTZero or lightweight detectors to catch LLM-written reviews.
Multilingual support: Expand to reviews in multiple languages for global platforms.
User metadata analysis: Detect suspicious accounts by analyzing posting history, frequency, and location patterns.
Interactive demo: Build a dashboard for platforms to visualize and manage flagged reviews in real time.

Built With

.env
.gitignore
.py
collab
csv
git
github
google
huggingface
json
jsonl
ollama
pandas
python
scrapify

Submitted to

TikTok TechJam 2025

Created by

I worked on the training script for the model. It was my first time fine-tuning a model from Hugging Face, but I learnt a lot about how to do so from this project, and I am keen to continue working on such projects in the future.

Tong Kian Kiat
Rayson Tay
Koh Jun Sheng
Sean Leng

Updates

Tong Kian Kiat started this project — Aug 30, 2025 01:17 PM EDT
