TikTok TechJam Submission 2025

Introduction

Our solution tackles Prompt 1, "Filtering the Noise: ML for Trustworthy Location Reviews". It assesses the quality and relevancy of location-based reviews by combining heuristic rules (e.g., detecting ads, irrelevant topics, or no-visit rants) with machine learning and transformer models to automatically classify each review as valid or invalid. Spam, promotional, and low-quality content is filtered out, while authentic feedback is preserved.
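The combination of rules and model output described above can be sketched roughly as follows. This is a minimal illustration, not the submission's actual code: the regex patterns, the function names `heuristic_flags` and `classify`, the `model_valid_prob` input, and the 0.5 threshold are all assumptions made for the example. The model probability is passed in as a plain number so the sketch stays independent of any particular classifier.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only; the real rule set is not
# given in this document.
AD_PATTERN = re.compile(
    r"(https?://|www\.|promo code|discount|call now)", re.IGNORECASE
)
NO_VISIT_PATTERN = re.compile(
    r"(never been|haven't been|have not been|didn't go|never visited)",
    re.IGNORECASE,
)


def heuristic_flags(review: str) -> list[str]:
    """Return the names of the heuristic rules that the review triggers."""
    flags = []
    if AD_PATTERN.search(review):
        flags.append("advertisement")
    if NO_VISIT_PATTERN.search(review):
        flags.append("no_visit_rant")
    return flags


def classify(review: str, model_valid_prob: float, threshold: float = 0.5) -> str:
    """Label a review as 'valid' or 'invalid'.

    Any triggered rule vetoes the review. Otherwise the decision falls back
    to the (assumed) probability of validity produced by an ML model.
    """
    if heuristic_flags(review):
        return "invalid"
    return "valid" if model_valid_prob >= threshold else "invalid"
```

In this sketch the rules act as a hard veto before the model is consulted. Another reasonable design would feed the rule flags to the model as extra features instead; which of the two the submission uses is not stated here.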

Problem Statement

Design and implement an ML-based system to evaluate the quality and relevancy of Google location reviews.

Development Tools

VS Code for coding, and Jupyter/Colab notebooks for experimenting with preprocessing, model training, and evaluation.

APIs Used

Our code is self-contained and does not call any external APIs.

Libraries and Frameworks

Hugging Face Transformers -- for tokenization and Transformer-based NLP models (Qwen).
PyTorch -- for deep learning tensor operations and model inference.
scikit-learn -- for baseline ML models, evaluation metrics, and feature selection.
pandas -- for structured data manipulation and preprocessing.
NumPy -- for efficient numerical computations and array operations.
Matplotlib -- for plotting and visualizing model performance.
Seaborn -- for statistical data visualization and heatmaps.
imbalanced-learn -- for handling class imbalance through oversampling techniques.
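
To illustrate the oversampling idea mentioned for imbalanced-learn, here is a small plain-Python sketch of random oversampling: minority-class examples are duplicated until every class has as many samples as the largest one. It does not call imbalanced-learn. It only mirrors the idea behind that library's `RandomOverSampler`, and the document does not say which oversampling technique the submission actually uses. The function name `random_oversample` and the sample data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are balanced.

    Plain-Python stand-in for the idea of random oversampling; the original
    samples are kept in order and the duplicates are appended at the end.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # All original samples belonging to this class.
        pool = [t for t, l in zip(texts, labels) if l == label]
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


# Hypothetical toy data: four valid reviews and one invalid review.
texts = ["good food", "nice staff", "clean rooms", "great view", "buy now!!!"]
labels = ["valid", "valid", "valid", "valid", "invalid"]
balanced_texts, balanced_labels = random_oversample(texts, labels)
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.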