Inspiration
Online reviews shape how people choose restaurants, shops, and services, but many are cluttered with spam, irrelevant rants, or misleading content. We wanted to restore trust by filtering out the noise and surfacing only meaningful feedback. We also wanted to dive deep into the world of NLP workflows in data science.
What it does
Our system evaluates Google location reviews in real time, automatically detecting spam and advertisements, irrelevant or off-topic content, and complaints from people who likely never visited the location. It then flags or filters these reviews, ensuring that users, businesses, and platforms get more reliable insights.
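As a minimal sketch, the rule-based part of this flagging can look like the following. The patterns and category names here are illustrative assumptions, not our actual production rules:

```python
import re

# Illustrative patterns for the three categories we detect; a real system
# combines rules like these with learned classifiers.
RULES = {
    "advertisement": re.compile(r"promo code|discount|visit our website|follow us", re.I),
    "spam": re.compile(r"https?://\S+|\b(whatsapp|telegram)\b", re.I),
    "never_visited": re.compile(r"never been|haven't visited|did not go", re.I),
}

def flag_review(text: str) -> list[str]:
    """Return the name of every rule the review text triggers."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]

print(flag_review("Use promo code SAVE10!"))  # → ['advertisement']
```

A review can trigger several rules at once, which is why the function returns a list rather than a single label.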
How we built it
We utilised the South Dakota metadata and the Google Location dataset. We cleaned and labeled the data to distinguish relevant reviews from irrelevant/spammy ones and from rants by people who never visited, using approaches like few-shot, zero-shot, and LLM labeling. We trained machine learning models (e.g., XGBoost and transformer models from Hugging Face) to classify review quality and relevance. We applied policy rules, such as regex-based ground rules, to flag violations automatically, and built a pipeline for evaluation metrics.
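A minimal sketch of the classification and evaluation steps, using a TF-IDF + logistic regression stand-in for the XGBoost / transformer models we actually trained. The tiny review set and its labels below are made up for illustration; in the real pipeline the labels came from the few-shot / zero-shot LLM annotation described above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

# Toy labeled reviews standing in for the annotated Google Location data.
reviews = [
    "Amazing pasta and friendly staff, will come again",
    "Slow service but the food made up for it",
    "Best deals on followers, visit our page now",
    "Limited time discount, click the link in bio",
    "I never went here but I hate the owner",
    "Never visited, just heard it is bad",
]
labels = ["relevant", "relevant", "spam", "spam", "rant_without_visit", "rant_without_visit"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(reviews, labels)

# Evaluation-metrics step; on real data this would use a held-out split.
print(classification_report(labels, model.predict(reviews)))
```

Swapping the `LogisticRegression` stage for an XGBoost classifier, or replacing the whole pipeline with a fine-tuned Hugging Face transformer, keeps the same fit/predict/report structure.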
Challenges we ran into
The biggest issue we faced was data labeling: we were unsure what caused sample data to be labeled wrongly, whether prompt-engineering flaws or model flaws. We tried different approaches, such as Hugging Face classifiers and LLMs, but each ran into extremely slow training, machine limitations, or data-labeling accuracy issues.
Accomplishments that we're proud of
We are proud of diving deep into the core concepts of NLP and understanding its fundamentals. For example, we truly understood the limitations of setting ground rules for datasets (for further machine learning model building), and we came to understand the state-of-the-art multimodal approaches in NLP for predicting the class of a text.
What we learned
We learnt the core workflows of NLP and what the industry-standard workflow looks like, apart from data annotation (which we are still unsure about). We also came to understand the power of transformers and the concept of self-attention, and how it is far more powerful than typical sentiment analysis approaches like TF-IDF and static embeddings.
What's next
Our next step is to observe how top teams conduct their NLP workflows, so as to learn the core concepts of NLP and data science.