Introduction
Our solution tackles Prompt 1, "Filtering the Noise: ML for Trustworthy Location Reviews". It assesses the quality and relevancy of location-based reviews by combining heuristic rules (e.g., detecting ads, irrelevant topics, or no-visit rants) with machine learning and transformer models to automatically classify each review as valid or invalid. Spam, promotional, and low-quality content is filtered out, while authentic feedback is preserved.
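To make the heuristic side concrete, here is a minimal sketch of a rule-based pre-filter. It is an illustration only: the patterns, the flag names, and the `flag_review` / `is_valid` helpers are hypothetical and are not taken from the project's code. Irrelevant-topic detection is left out because it is difficult to capture with simple patterns and is better suited to the learned models.

```python
import re
from typing import List

# Hypothetical patterns for illustration; the project's actual rules may differ.
AD_PATTERN = re.compile(
    r"(https?://|www\.|promo code|discount|call now)", re.IGNORECASE
)
NO_VISIT_PATTERN = re.compile(
    r"\b(never (been|visited|went)|haven't (been|visited)|have not (been|visited))\b",
    re.IGNORECASE,
)


def flag_review(text: str) -> List[str]:
    """Return the names of the heuristic rules that a review triggers."""
    flags = []
    if AD_PATTERN.search(text):
        flags.append("advertisement")
    if NO_VISIT_PATTERN.search(text):
        flags.append("no_visit_rant")
    return flags


def is_valid(text: str) -> bool:
    """Treat a review as valid only if no heuristic rule fires."""
    return not flag_review(text)
```

In a combined pipeline, rules like these could either reject a review outright or be passed to the classifier as extra features; which of the two the project does is not stated here.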
Problem Statement
Design and implement an ML-based system to evaluate the quality and relevancy of Google location reviews.
Development Tools
VS Code for coding, and Jupyter/Colab notebooks for experimenting with preprocessing, model training, and evaluation.
APIs Used
None. Our code is self-contained and does not call external APIs.
Libraries and Frameworks
- Hugging Face Transformers -- for tokenization and Transformer-based NLP models (Qwen).
- PyTorch -- for deep learning tensor operations and model inference.
- scikit-learn -- for baseline ML models, evaluation metrics, and feature selection.
- pandas -- for structured data manipulation and preprocessing.
- NumPy -- for efficient numerical computations and array operations.
- Matplotlib -- for plotting and visualizing model performance.
- Seaborn -- for statistical data visualization and heatmaps.
- imbalanced-learn -- for handling class imbalance through oversampling techniques.
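As an illustration of the oversampling idea mentioned in the last item, the sketch below duplicates randomly chosen minority-class reviews until every class is as large as the largest one. This is plain random oversampling written with the standard library only; imbalanced-learn offers the same idea as `RandomOverSampler`, alongside other methods. Which technique the project actually used is not stated, so the `random_oversample` helper and the sample data are assumptions.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Balance classes by resampling minority-class items with replacement."""
    rng = random.Random(seed)

    # Group the reviews by their label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: three valid reviews and one invalid review.
texts = ["great food", "nice staff", "clean rooms", "visit www.example.com"]
labels = ["valid", "valid", "valid", "invalid"]
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the evaluation data.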
Assets Used
- Google Local Reviews on Kaggle
- Google Wyoming Local Reviews dataset from McAuley Lab, UCSD
- Preprocessed subsets of the above datasets