Inspiration

Researching new places to visit is exhausting. Reviews and recommendations are often the only way to judge a place before visiting it, so online reviews play an integral role in shaping how people perceive a place for the first time. While some reviews are useful, platforms are also full of advertisements and irrelevant, low-quality reviews.

What it does

Our model flags such reviews by combining rule-based policies with an LLM to gauge whether a review violates the standards of an acceptable comment.
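The rule-based side of this can be sketched as a set of simple pattern checks. The specific rules and thresholds below are illustrative assumptions, not the project's actual policies; borderline cases would then be passed to the LLM.

```python
import re

# Hypothetical policy patterns; the write-up does not list the actual rules,
# so these are illustrative assumptions.
AD_PATTERN = re.compile(r"(https?://\S+|promo\s*code|discount|visit our)", re.IGNORECASE)
MIN_LENGTH = 10  # reviews shorter than this are treated as low-quality

def flag_review(text: str) -> list[str]:
    """Return the list of policy labels a review violates."""
    flags = []
    if AD_PATTERN.search(text):
        flags.append("advertisement")
    if len(text.strip()) < MIN_LENGTH:
        flags.append("low_quality")
    return flags

print(flag_review("Use promo code SAVE10 at https://example.com"))
print(flag_review("ok"))
```

Reviews that pass every rule, or that trigger only ambiguous rules, would be forwarded to the LLM for a second opinion.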

How we built it

Using Python and MongoDB, we constructed our own schemas for places, reviews, and users. We fetched data from Kaggle and scraped additional data directly from Google Maps using Selenium. After collecting the data, we preprocessed it by translating comments to English whenever the detected language (via langid) was not English. We then engineered textual and metadata features from the reviews, including sentiment scores. Finally, rule-based and transformer-based policies were enforced on these featurized reviews to flag possible violations and generate a prediction report.
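The language-detection and translation step can be sketched as follows. It uses langid, which the pipeline above names; the `translate_to_english` helper is a hypothetical placeholder, since the write-up does not specify which translation service was used.

```python
# Preprocessing sketch: detect each review's language and route
# non-English reviews to a translation step.
try:
    import langid  # third-party: pip install langid

    def detect_language(text: str) -> str:
        lang, _score = langid.classify(text)
        return lang
except ImportError:
    # Fallback heuristic so the sketch stays runnable without langid installed.
    def detect_language(text: str) -> str:
        return "en" if text.isascii() else "und"

def translate_to_english(text: str) -> str:
    # Hypothetical placeholder: the actual translation backend is unspecified.
    return text

def preprocess(text: str) -> str:
    if detect_language(text) != "en":
        text = translate_to_english(text)
    return text

print(preprocess("Great food and friendly staff"))
```

Downstream feature engineering (sentiment scoring, metadata features) would then operate on the normalized English text.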

Challenges we ran into

Our Policy Enforcer module currently relies heavily on rule-based policies, with only a limited transformer component. As a result, it flags many reviews incorrectly: the system is not smart enough to handle advanced cases such as rants from people who never visited the place, or comments that are irrelevant to it.

What we learned

We learned to develop every stage of an AI/ML product from scratch: data collection, database schema modelling, data preprocessing, EDA, feature engineering, model development, and testing/evaluation.

What's next for Filtering the Noise

We plan to transition to a smarter Policy Enforcer module that uses context retrieval (a RAG-style response) to help the model understand the nature of a place before marking a comment as irrelevant. We also want to swap in a more capable LLM and possibly store vector embeddings in a VectorDB to support semantic search.
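The embedding-based relevance check described above can be sketched with a cosine-similarity comparison between a place's context and a review. The toy bag-of-words embedder and the tiny vocabulary below are stand-ins for a real sentence-embedding model and a VectorDB.

```python
import numpy as np

# Toy fixed vocabulary; a real system would use a learned embedding model.
VOCAB = ["pasta", "pizza", "restaurant", "food", "service", "phone", "battery"]

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedder standing in for a sentence-embedding model."""
    tokens = text.lower().split()
    return np.array([float(tokens.count(w)) for w in VOCAB])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

place = embed("italian restaurant pasta pizza")
relevant = embed("the pasta and pizza were great")
irrelevant = embed("my phone battery died")

# Retrieval-style relevance check: a review far from the place's context
# is a candidate for the "irrelevant" flag.
print(cosine_similarity(place, relevant), cosine_similarity(place, irrelevant))
```

In the envisioned RAG setup, the place context would be retrieved from stored embeddings at enforcement time, so the model can judge relevance against what the place actually is rather than against generic rules.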

Built With

Python, MongoDB, Selenium, langid
