Issue
Our solution addresses the issue of irrelevant reviews on Google that do not provide effective feedback on restaurants/locations. As a reader intending to use Google reviews to learn more about a place, I would want to filter out any pointless reviews so that I can focus on those that matter.
How our solution helps
Our ML model filters out reviews that show one of a few potential violations: irrelevance, advertisements, or rants without visits. These were selected based on the examples provided in TikTok's Devpost problem statement. We discussed and agreed that these issues were the main culprits behind bad reviews.
Our program first takes in a CSV file of reviews and cleans them up.
Cleaning:
- URLs removed
- Whitespaces standardised
- Non-printable characters removed
- Reviews without text removed
- Reviews with fewer than 3 words removed
- Duplicate reviews removed
The model then goes through the cleaned reviews, detects irrelevant content and labels it as such, filtering out the "noisy" reviews.
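The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not our actual code: the names `clean_text` and `clean_reviews`, the URL pattern, and the choice to treat duplicates as exact matches after cleaning are all assumptions made for the example.

```python
import re

# Assumed URL pattern: anything starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Remove URLs and non-printable characters, then standardise whitespace."""
    text = URL_RE.sub(" ", text or "")
    # Turn tabs/newlines into plain spaces first, so they are not
    # dropped as non-printable characters in the next step.
    text = WHITESPACE_RE.sub(" ", text)
    text = "".join(ch for ch in text if ch.isprintable())
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_reviews(reviews, min_words=3):
    """Return cleaned reviews, dropping empty, too-short and duplicate ones."""
    cleaned = []
    seen = set()
    for raw in reviews:
        text = clean_text(raw)
        # Reviews without text have zero words, so this check covers them too.
        if len(text.split()) < min_words:
            continue
        # Assumption: a duplicate is an exact match after cleaning.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = [
        "Great food!  Visit http://spam.example now",
        "Nice",
        "",
        "Great food! Visit now",
    ]
    print(clean_reviews(sample))
```

With pandas, the same `clean_text` function could be applied to the review text column before dropping short and duplicate rows.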
We used Python as our main language for writing the program and training the model. VS Code was our editor of choice as it is modular and easy to set up. We also used GitHub for version control and code review.
We largely used Hugging Face Transformers to import models such as roberta-base-openai-detector and bert-base-uncased for training, scikit-learn for handling the training data, and pandas for preprocessing.
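As a rough sketch of how a model like bert-base-uncased can be loaded for review classification with Transformers: the label names below, the single-label setup, and the helper names `load_classifier` and `predict_label` are assumptions for illustration, not our exact training code.

```python
# Hypothetical label set, based on the violation types named in this write-up.
LABELS = ["relevant", "irrelevant", "advertisement", "rant_without_visit"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_classifier(model_name="bert-base-uncased"):
    """Load a tokenizer and a sequence-classification model with our labels."""
    # Imported here so the label mapping above can be used without
    # having the heavy transformers dependency installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def predict_label(text, tokenizer, model):
    """Return the predicted label name for a single review."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]
```

Note that a freshly loaded classification head is untrained, so `predict_label` only gives meaningful results after fine-tuning on labelled reviews.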
As for datasets, we used Google Local Reviews datasets and labelled them manually with tags for model training.
Challenges we ran into
We had a short time frame to learn about AI and ML, and had to debug and get the project working in time for submission.
Accomplishments that we're proud of
We learned more about ML in a few days and built something out of it.
What we learned
We learned better project management, how ML models work, and how to use new libraries like Transformers.
What's next for Noise Cutting
Year 2 Sem 1!