TikTok TechJam 2025 – Devpost Writeup

1. Problem and Solution

Problem
Online reviews shape decisions about where people eat, shop, and travel. Unfortunately, many reviews are misleading — short spam (“Nice 👍”), disguised advertisements, or rants from people who have never visited the location. These reduce trust for users and unfairly harm businesses.

Solution
We built a machine learning (ML) pipeline with a demo web app that automatically flags reviews as Relevant, Spam, Advertisement, or Rant.

  • Model V1 (Classical ML baseline): Semantic embeddings + Principal Component Analysis (PCA) + Logistic Regression. Simple and interpretable.
  • Model V2 (Neural Network): Semantic embeddings + PyTorch Neural Network (NN). Achieved 95.6% test accuracy with a weighted F1 score of 0.93, outperforming V1 in almost all categories.

2. Development Environment and Tools

  • Visual Studio Code (VSCode): Development, scripting, debugging
  • Google Colaboratory (Colab): GPU-based training and fast experiments
  • Flask + Jinja2 templates: Serving and rendering the demo web app
  • PyTorch: Building, training, and saving the neural network (Model V2)
  • Joblib: Saving and reusing PCA + Logistic Regression models (Model V1)

3. APIs Used

  • Google Maps API (googlemaps): Metadata like business descriptions to test review–description similarity
  • SentenceTransformers library (all-MiniLM-L6-v2 model): Generated semantic embeddings from review text

4. Libraries and Frameworks

  • Core ML / NLP: scikit-learn, sentence-transformers, torch, numpy, pandas
  • Web / API: flask, googlemaps, dotenv
  • Utilities: nltk, re, joblib, json

5. Assets and Datasets Used

  • Google Local Reviews datasets (Kaggle + UCSD)
  • 70,000 manually labelled reviews (Spam, Advertisement, Rant, Relevant)

Extra feature experiments:

  • Rating deviation (review rating vs. business average)
  • Review–business description similarity

Both features showed some promise, but only rant detection benefitted (see Step 4), and the trade-offs made them less reliable overall than text-only embeddings.


6. Solution Flow

Step 1: Data Cleaning

  • Normalized review text: lowercased, stripped punctuation/emojis, removed stopwords, lemmatized words.
  • Ensured consistency across dataset.
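
The cleaning steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `STOPWORDS` is a tiny hypothetical stand-in for a full list such as NLTK's, `clean_review` is a made-up name, and lemmatization is left out so the example has no dependencies.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "and", "or", "to", "of", "in", "it"}


def clean_review(text: str) -> str:
    """Lowercase, strip punctuation and emojis, and drop stopwords."""
    text = text.lower()
    # Keep only ASCII letters, digits, and whitespace. This removes punctuation
    # and emojis, but also accented letters, which may not suit every dataset.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)
```

With this sketch, `clean_review("Nice 👍")` returns `"nice"`, so very short spam collapses to one or two tokens before embedding.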

Step 2: Semantic Embeddings

  • Used all-MiniLM-L6-v2 to convert reviews into dense vectors.
  • Captured meaning rather than just keywords (e.g., “great service” ≈ “amazing staff”).
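
A sketch of how the embedding step and a similarity check might look. `embed` relies on the `sentence-transformers` library and downloads the model on first use, so the import sits inside the function. `cosine_similarity` is a hypothetical helper written in plain Python for illustration. Neither function is taken from the project's code.

```python
import math
from typing import Sequence


def embed(texts: Sequence[str]):
    """Encode texts with all-MiniLM-L6-v2, giving one 384-dimensional vector per text."""
    # Imported lazily: the library and model weights are only needed when this runs.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(list(texts))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `embed(["great service", "amazing staff"])` and passing the two resulting vectors to `cosine_similarity` gives a score that should be noticeably higher than for an unrelated pair of phrases.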

Step 3a: Model V1 – Logistic Regression (Baseline)

  • PCA reduced embeddings (384 → 128 dimensions).
  • Logistic Regression classifier trained on compressed vectors.
  • Results: 94.0% accuracy; F1 scores – Relevant: 0.966, Spam: 0.795.
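
The baseline can be approximated with a scikit-learn pipeline. The sketch below trains on random synthetic vectors, not real review embeddings, so its predictions are meaningless. It only shows the 384 → 128 PCA step feeding Logistic Regression. The `max_iter` value and the variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

LABELS = ["Relevant", "Spam", "Advertisement", "Rant"]

# Synthetic stand-in data: random 384-dim vectors and random labels, NOT real reviews.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 384))
y = rng.integers(0, len(LABELS), size=400)

# PCA compresses 384 -> 128 dimensions, then Logistic Regression classifies.
model_v1 = make_pipeline(
    PCA(n_components=128, random_state=0),
    LogisticRegression(max_iter=1000),
)
model_v1.fit(X, y)

predictions = model_v1.predict(X[:5])  # integer class indices into LABELS
```

A fitted pipeline like this can be persisted with `joblib.dump(model_v1, path)` and reloaded with `joblib.load(path)`, which keeps the PCA projection and the classifier together.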

Step 3b: Model V2 – Neural Network (Final)

  • PyTorch NN: Linear → ReLU → Dropout → Linear (4-class output).
  • Trained with Adam optimizer + CrossEntropy loss.
  • Results: 95.6% accuracy; F1 scores – Relevant: 0.976, Spam: 0.865.
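
The architecture described above could look like the following in PyTorch. The hidden size (128), dropout rate (0.3), learning rate, and the class name `ReviewClassifier` are all assumptions, since the writeup does not state them. The batch is synthetic and only demonstrates a single training step.

```python
import torch
from torch import nn


class ReviewClassifier(nn.Module):
    """Linear -> ReLU -> Dropout -> Linear, producing logits for 4 classes."""

    def __init__(self, input_dim=384, hidden_dim=128, num_classes=4, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # Return raw logits; CrossEntropyLoss applies log-softmax internally.
        return self.net(x)


torch.manual_seed(0)
model_v2 = ReviewClassifier()
optimizer = torch.optim.Adam(model_v2.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a synthetic batch (random features and labels).
features = torch.randn(8, 384)
targets = torch.randint(0, 4, (8,))

model_v2.train()
optimizer.zero_grad()
logits = model_v2(features)
loss = loss_fn(logits, targets)
loss.backward()
optimizer.step()
```

For inference, switching to `model_v2.eval()` disables dropout, and `torch.save(model_v2.state_dict(), path)` stores the trained weights.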

Step 4: Feature Engineering Experiments

  • Statistical analysis of review metadata (rating, average rating, number of reviews, pictures, owner responses).
  • Findings:
    • Owner responses correlated with Ads more than Spam or Rants.
    • Ads generally had higher ratings than Spam or Rants.
  • Limitations: Small sample size (Ads n=93) reduced reliability.
  • Rating deviation and review–description similarity were also tested, but only rant detection benefitted.
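
One plausible way to compute the rating-deviation feature and attach both experimental features to a text embedding is sketched below. The function names are hypothetical, and appending the features to the embedding vector is an assumption about how they might be combined, not a description of the project's implementation.

```python
def rating_deviation(review_rating: float, business_avg_rating: float) -> float:
    """Signed gap between one review's rating and the business average."""
    return review_rating - business_avg_rating


def add_metadata_features(embedding, review_rating, business_avg_rating,
                          description_similarity):
    """Append rating deviation and description similarity to an embedding."""
    deviation = rating_deviation(review_rating, business_avg_rating)
    return list(embedding) + [deviation, description_similarity]
```

A 1-star review of a business averaging 4.5 stars gives a deviation of -3.5, the kind of outlier that could hint at a rant.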

Step 5: Evaluation

  • Metrics: Precision, Recall, F1.
  • Headline results: 95.6% test accuracy, Weighted F1 ≈ 0.93.
  • Relevant reviews preserved with very high recall (0.978).
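
For reference, the metrics above can be computed from scratch as in this sketch. It assumes the standard definitions: weighted F1 is the per-class F1 averaged by each class's support. The function names are illustrative. In practice `sklearn.metrics.classification_report` reports the same quantities.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Precision, recall, F1, and support for every label seen in either list."""
    support = Counter(y_true)
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1": f1, "support": support[label]}
    return metrics


def weighted_f1(metrics):
    """Average of per-class F1 scores, weighted by the number of true examples."""
    total = sum(m["support"] for m in metrics.values())
    if total == 0:
        return 0.0
    return sum(m["f1"] * m["support"] for m in metrics.values()) / total
```

Because weighted F1 depends on the per-class scores, it can differ from overall accuracy when some classes are rare and harder to predict.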

Step 6: Flask Web App (UI)

  • Enter place → Select correct match → Load real reviews → Classify into categories.
  • Optional: Business description improves rant detection.
  • UI shows performance metrics.
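
A minimal sketch of the classification endpoint behind such a UI. It returns JSON rather than rendering a Jinja2 template, and `classify_review` is a placeholder rule standing in for the trained model. The route name and payload shape are assumptions made for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_review(text: str) -> str:
    """Placeholder only: a real app would embed the text and call the trained model."""
    return "Spam" if len(text.split()) < 3 else "Relevant"


@app.route("/classify", methods=["POST"])
def classify():
    # Expects a JSON body such as {"reviews": ["text 1", "text 2"]}.
    payload = request.get_json(silent=True) or {}
    reviews = payload.get("reviews", [])
    results = [{"text": r, "label": classify_review(r)} for r in reviews]
    return jsonify(results=results)
```

In the full flow, the list of reviews would come from the place the user selected rather than from the request body.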

7. How the Solution Addresses the Problem

  • Semantic understanding: Embeddings interpret meaning, not just keywords.
  • Policy alignment: Maps directly to TikTok’s moderation categories (Spam, Ads, Irrelevant, Rants).
  • Scalability: Neural networks scale well to large datasets.
  • Iteration-driven improvement: Compared classical ML vs. NN; NNs proved stronger.
  • Practical demo: Flask app with an interactive UI shows how the classifier could be used in real time.

8. Conclusion

Our project shows how ML + NLP can make review platforms more trustworthy by filtering noise and surfacing genuinely helpful feedback.

  • Neural network achieved 95.6% accuracy, Weighted F1 ≈ 0.93.
  • High recall on Relevant reviews (0.978) means valuable reviews are rarely filtered out.
  • Spam detection improved over the baseline (F1 0.795 → 0.865).
  • Rants remain challenging, but experimental signals (review–description similarity) are promising directions for future work.

Impact:

  • Users make better decisions with reliable reviews.
  • Businesses get fairer representation.
  • Platforms benefit from scalable, automated moderation.

9. Interactive Demo

  • Search: Enter the name of any place.
  • Select: Choose the correct match from a candidate list.
  • Classify: Loads reviews with classification tags (Relevant, Spam, Rant, Advertisement).
  • Optional description input: Improves rant detection.
  • See metrics: Model performance shown in UI.

🔗 GitHub Repo: Boolean Brotherhood
▶️ YouTube Demo: Watch here
