Introduction

Our solution tackles prompt 1, Filtering the Noise: ML for Trustworthy Location Reviews. It assesses the quality and relevancy of location-based reviews by combining heuristic rules (e.g., detecting ads, irrelevant topics, or no-visit rants) with machine learning and transformer models to automatically classify each review as valid or invalid. Spam, promotional, and low-quality content is filtered out, while authentic feedback is preserved.
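As a rough illustration of how rule-based checks can sit in front of a learned classifier, here is a minimal sketch. The regex patterns, the function names (`heuristic_flags`, `is_valid`), and the `model_predict` callback are all hypothetical and are not taken from the project's code. Irrelevant-topic detection is left to the model here, since it is hard to express as a simple pattern.

```python
import re

# Hypothetical patterns for illustration only; the project's real rules may differ.
AD_PATTERN = re.compile(
    r"(https?://|www\.|promo code|discount|call now)", re.IGNORECASE
)
NO_VISIT_PATTERN = re.compile(
    r"(never been|haven't been|have not been|never visited|didn't go)",
    re.IGNORECASE,
)


def heuristic_flags(text):
    """Return which rule-based checks fire for a single review text."""
    return {
        "advertisement": bool(AD_PATTERN.search(text)),
        "no_visit_rant": bool(NO_VISIT_PATTERN.search(text)),
    }


def is_valid(text, model_predict=None):
    """Classify a review as valid (True) or invalid (False).

    Any heuristic hit rejects the review outright. Otherwise the decision is
    delegated to `model_predict`, an assumed callable that returns a truthy
    value for valid reviews. With no model supplied, the review is accepted.
    """
    if any(heuristic_flags(text).values()):
        return False
    if model_predict is not None:
        return bool(model_predict(text))
    return True
```

Running the cheap rules first means obvious spam never reaches the more expensive transformer model, though a real pipeline could equally feed the flags to the model as extra features.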

Problem Statement

Design and implement an ML-based system to evaluate the quality and relevancy of Google location reviews.

Development Tools

We used VS Code for coding and Jupyter/Colab notebooks for experimenting with preprocessing, model training, and evaluation.

APIs Used

Our code is self-contained and does not rely on external APIs.

Libraries and Frameworks

  • Hugging Face Transformers -- for tokenization and Transformer-based NLP models (Qwen).
  • PyTorch -- for deep learning tensor operations and model inference.
  • scikit-learn -- for baseline ML models, evaluation metrics, and feature selection.
  • pandas -- for structured data manipulation and preprocessing.
  • NumPy -- for efficient numerical computations and array operations.
  • Matplotlib -- for plotting and visualizing model performance.
  • Seaborn -- for statistical data visualization and heatmaps.
  • imbalanced-learn -- for handling class imbalance through oversampling techniques.
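To make the oversampling idea concrete, the sketch below duplicates minority-class examples until every class matches the largest one. It is a standard-library illustration of random oversampling, the same basic idea as imbalanced-learn's `RandomOverSampler`. It is not the project's code, and which oversampler the project actually used is not stated here. The function name `random_oversample` is hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class samples until all classes are balanced.

    Original samples are kept in order; resampled copies are appended.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # All original samples that belong to this class.
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the evaluation data.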

Assets Used

  • Google Local Reviews on Kaggle
  • Google Wyoming Local Reviews dataset from McAuley Lab, UCSD
  • Preprocessed subsets of the above datasets
