Sieve the deceive

Location-based review platforms such as Google Maps often face the challenge of low-quality, irrelevant, or misleading reviews. Many reviews are advertisements, vague one-liners, or written by people who have never visited the location. These undermine trust and make it difficult for genuine users to assess the quality of a place. Our project therefore focuses on filtering out ads, irrelevant content, non-visit reviews, and vague or low-information comments, so that users and businesses alike benefit from authentic and trustworthy feedback.

Solution Overview

We developed an OpenAI-powered review evaluation system that automatically analyzes comments and flags those that are likely to be untrustworthy or irrelevant. The solution works by combining natural language processing (NLP) with machine learning classification, using both rule-based heuristics and LLM-powered reasoning.
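One way the hybrid approach could be wired together is sketched below. This is a minimal illustration, not the project's actual pipeline: the regex patterns, the `MIN_WORDS` threshold, the label names, and the choice to call the LLM only when no rule fires are all assumptions. The LLM is passed in as a plain callable (`llm_classify`, a hypothetical name) so the sketch runs without an API key.

```python
import re
from typing import Callable, Dict, List

# Hypothetical heuristic patterns; the project's real rule set is not listed here.
AD_PATTERN = re.compile(
    r"(https?://|www\.|\bpromo code\b|\bdiscount\b|\bcall now\b)", re.IGNORECASE
)
NON_VISIT_PATTERN = re.compile(
    r"\b(never been|haven't been|have not been|never visited|heard that)\b",
    re.IGNORECASE,
)
MIN_WORDS = 4  # assumed cut-off below which a review counts as vague


def heuristic_flags(text: str) -> List[str]:
    """Return the labels that cheap rule-based checks can decide on their own."""
    flags = []
    if AD_PATTERN.search(text):
        flags.append("ad")
    if NON_VISIT_PATTERN.search(text):
        flags.append("non_visit")
    if len(text.split()) < MIN_WORDS:
        flags.append("vague")
    return flags


def classify_review(
    text: str, llm_classify: Callable[[str], List[str]]
) -> Dict[str, object]:
    """Apply heuristics first; ask the LLM only when no rule fires (assumed design)."""
    flags = heuristic_flags(text)
    source = "heuristic"
    if not flags:
        flags = llm_classify(text)
        source = "llm"
    return {"text": text, "flags": flags, "source": source}


def fake_llm(text: str) -> List[str]:
    """Offline stand-in for a GPT-4o call; a real version would query the API."""
    return ["irrelevant"] if "phone" in text.lower() else []
```

Calling `classify_review("Good", fake_llm)` is settled by the rules alone, while a longer off-topic review falls through to the LLM callable. Running the cheap checks first keeps the number of API calls down, which matters when processing large batches.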

Afterwards, we also classified and removed further anomalies through per-company clustering, to ensure a more reliable and trustworthy result.
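The clustering algorithm used for this step is not specified here, so the sketch below substitutes a much simpler technique to show the per-company idea: reviews are grouped by company, and any review whose score lies too far from that company's median (measured in median absolute deviations, MAD) is dropped. The `company` and `score` keys and the threshold of 3.0 are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def drop_outliers_per_company(
    reviews: List[Dict], threshold: float = 3.0
) -> List[Dict]:
    """Keep reviews whose score is within `threshold` MADs of their company's median."""
    # Group reviews so that each company is judged against its own distribution.
    by_company = defaultdict(list)
    for review in reviews:
        by_company[review["company"]].append(review)

    kept = []
    for group in by_company.values():
        scores = [r["score"] for r in group]
        center = median(scores)
        mad = median(abs(s - center) for s in scores)
        for r in group:
            # With zero spread (MAD == 0) no review can be called an outlier,
            # so the whole group is kept.
            if mad == 0 or abs(r["score"] - center) / mad <= threshold:
                kept.append(r)
    return kept
```

Judging each review against its own company, rather than against the whole dataset, avoids penalising businesses whose typical scores simply differ from the global average. A proper clustering method (for example one from scikit-learn) could replace the median rule without changing the grouping logic.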

Key Features

- Multi-criteria classification: Detects Ads, Irrelevant content, Non-visit indicators, and Vagueness.
- Hybrid evaluation approach: Combines heuristic rules (e.g., presence of promotional keywords) with LLM-based contextual understanding.
- Quality & Relevance scoring: Assigns each review a score reflecting its usefulness to end-users.
- Customizable output: Provides results in structured formats (JSON/CSV) for easy integration with dashboards or third-party apps.
- Scalable design: Built to process large batches of reviews efficiently.
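To make the structured output concrete, here is a small sketch of how one set of results could be written as both JSON and CSV using only the standard library. The field names in `FIELDS` are a hypothetical schema chosen to mirror the four criteria and the score listed above; the project's real output columns may differ.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical output schema: one boolean per criterion plus a quality score.
FIELDS = ["review_id", "ad", "irrelevant", "non_visit", "vague", "quality_score"]


def to_json(results: List[Dict]) -> str:
    """Serialize the results as a pretty-printed JSON array."""
    return json.dumps(results, indent=2)


def to_csv(results: List[Dict]) -> str:
    """Serialize the results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()
```

Keeping one flat record per review means the same list of dictionaries can feed a dashboard (JSON) or a spreadsheet (CSV) without any reshaping.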

How It Solves the Problem

- Trustworthiness: Filters out misleading or spam reviews, ensuring only genuine feedback is highlighted.
- Relevance: Identifies whether the comment is truly based on a visit experience.

Language and Development Tools

- Google Colab: For experimentation, model fine-tuning, and dataset preprocessing.
- Visual Studio Code with a conda environment: For the statistical showcase. (The project is still in the experimentation stage, so the model has not yet been migrated to the main IDE.)

APIs Used

- OpenAI GPT-4o: For natural language classification and nuanced contextual understanding.

Libraries & Frameworks

- Python: Core language.
- pandas & numpy: Data preprocessing and manipulation.
- scikit-learn: ML classifiers and evaluation metrics.
- Hugging Face Transformers: For fine-tuning NLP models where required.
- PyTorch: Model training and inference.
- Regex: Heuristic text pattern matching (ads, links, promo keywords).
- tldextract: Accurately separates a URL's subdomain, domain, and public suffix using the Public Suffix List (PSL).
- backoff: Retries failed API attempts, introducing a delay between them.
- tqdm: Tracks the completion progress of the AI processing runs.
- matplotlib/seaborn: For plotting graphs, diagrams, and other visualisations.
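The retry behaviour that backoff provides can be illustrated with a standard-library sketch. This is not the backoff library's own code, and `retry_with_backoff` with its parameters is a hypothetical helper written for this example; in practice the library offers the same pattern as a decorator (for instance `backoff.on_exception` with `backoff.expo`).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_tries):
        try:
            return call()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential delay plus a little jitter, so that parallel workers
            # do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_tries must be at least 1")
```

The `sleep` parameter exists only so the waiting can be swapped out in tests. Growing the delay between attempts gives a rate-limited or briefly unavailable API time to recover instead of hammering it with immediate retries.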

Assets & Datasets

- Google Review Data: Open datasets containing Google location reviews (e.g., Google Local Reviews on Kaggle: https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews)