Inspiration
We were inspired to work on this project because online reviews play a huge role in shaping our decisions almost every time we choose where to eat, shop, stay, or spend time. However, not all reviews are reliable. Some are not even related to the location, while others are fake or misleading. Filtering out irrelevant or deceptive reviews helps avoid misunderstandings and lets people make better-informed decisions. Our goal was to create a system that automatically analyzes reviews, checks their relevance, and identifies their sentiment more accurately.
What we learned
Through this project, we learned how different NLP techniques can be combined to solve a real-world problem. We explored spam detection, sentiment analysis, sarcasm detection, summarisation, and relevance scoring, and saw how each component contributes to a more reliable understanding of online reviews. We also gained experience integrating multiple models into a single pipeline and learned how to handle challenges such as misleading reviews.
How we built it
To build the system, we first collected Google reviews using the Google Places API, then cleaned and labelled the data. A RoBERTa-base model trained for spam classification handles spam or fake reviews. For sentiment analysis, we used another RoBERTa-base model to classify reviews as positive, negative, or neutral, and an additional model to detect sarcasm. We then summarised each review using Llama 3.2 and fed the summary into a spaCy model, which evaluates how closely the review relates to the given location. Please refer to the report file for detailed information about each process.
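As a rough illustration of how these stages could be chained, here is a minimal sketch. It is not our actual code: the names `ReviewResult`, `analyze_review`, and `toy_stages` are hypothetical, each stage is passed in as a plain function so the real models (RoBERTa-base, Llama 3.2, spaCy) can be swapped in, and skipping the later stages for spam reviews is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReviewResult:
    """Hypothetical container for the output of every stage."""
    is_spam: bool
    sentiment: Optional[str] = None
    is_sarcastic: Optional[bool] = None
    summary: Optional[str] = None
    relevance: Optional[float] = None


def analyze_review(
    review: str,
    location: str,
    detect_spam: Callable[[str], bool],
    classify_sentiment: Callable[[str], str],
    detect_sarcasm: Callable[[str], bool],
    summarize: Callable[[str], str],
    score_relevance: Callable[[str, str], float],
) -> ReviewResult:
    # Assumption: spam reviews are dropped before the other stages run.
    if detect_spam(review):
        return ReviewResult(is_spam=True)

    # The summary, not the raw review, is compared against the location.
    summary = summarize(review)
    return ReviewResult(
        is_spam=False,
        sentiment=classify_sentiment(review),
        is_sarcastic=detect_sarcasm(review),
        summary=summary,
        relevance=score_relevance(summary, location),
    )


# Toy placeholder stages so the sketch runs without downloading any model.
# Each lambda stands in for a real model call.
toy_stages = dict(
    detect_spam=lambda text: "http://" in text,
    classify_sentiment=lambda text: "positive" if "great" in text.lower() else "neutral",
    detect_sarcasm=lambda text: False,
    summarize=lambda text: text[:80],
    score_relevance=lambda summary, place: 1.0 if "pasta" in summary.lower() else 0.0,
)

result = analyze_review("Great pasta and friendly staff!", "Italian restaurant", **toy_stages)
print(result)
```

Passing the stages as functions keeps the pipeline logic separate from the models, so each component can be tested or replaced on its own.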
Challenges we ran into
One of the major challenges was finding a suitable method to calculate the relevance score between a review and its location. This was difficult because the meaning of words is highly sensitive to context, and small changes in wording can significantly affect the score. Additionally, testing different approaches was time-consuming, as each method required extensive evaluation on multiple reviews to ensure accuracy and reliability.
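To make the wording sensitivity concrete, here is a small sketch of one simple way to score relevance: cosine similarity over word counts. This is a deliberately simplified stand-in, not the spaCy-based method we used, and the function names and example texts are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(review: str, location_description: str) -> float:
    """Hypothetical relevance score: word overlap between review and location text."""
    return cosine_similarity(bag_of_words(review), bag_of_words(location_description))


# Made-up example texts.
location = "Italian restaurant serving pasta and pizza"
on_topic = relevance_score("The pasta and pizza were great", location)
reworded = relevance_score("The noodles were great", location)
off_topic = relevance_score("My phone battery died on the bus", location)
print(on_topic, reworded, off_topic)
```

With plain word counts, swapping "pasta" for "noodles" drops the score to zero even though the review is still about the food. Embedding-based similarity reduces this problem, but in our experience the score still shifts noticeably with small wording changes.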