TikTok TechJam 2025 – Devpost Writeup
1. Problem and Solution
Problem
Online reviews shape decisions about where people eat, shop, and travel. Unfortunately, many reviews are misleading: short spam (“Nice 👍”), disguised advertisements, or rants from people who have never visited the location. These erode user trust and unfairly harm businesses.
Solution
We built a machine learning (ML) pipeline with a demo web app that automatically flags reviews as Relevant, Spam, Advertisement, or Rant.
- Model V1 (Machine Learning): Semantic embeddings + Principal Component Analysis (PCA) + Logistic Regression. Simple and interpretable.
- Model V2 (Neural Network): Semantic embeddings + PyTorch Neural Network (NN). Captured deeper patterns and achieved 95.6% test accuracy with a weighted F1 score of 0.93, outperforming V1 across almost all categories.
2. Development Environment and Tools
- Visual Studio Code (VSCode): Development, scripting, debugging
- Google Colaboratory (Colab): GPU-based training and fast experiments
- Flask + Jinja2 templates: Serving and rendering the demo web app
- PyTorch: Building, training, and saving the neural network (Model V2)
- Joblib: Saving and reusing PCA + Logistic Regression models (Model V1)
3. APIs Used
- Google Maps API (googlemaps): Retrieved metadata such as business descriptions, used to test review–description similarity
- SentenceTransformers library (all-MiniLM-L6-v2 model): Generated semantic embeddings from review text
4. Libraries and Frameworks
- Core ML / NLP: scikit-learn, sentence-transformers, torch, numpy, pandas
- Web / API: flask, googlemaps, dotenv
- Utilities: nltk, re, joblib, json
5. Assets and Datasets Used
- Google Local Reviews datasets (Kaggle + UCSD)
- 70,000 manually labelled reviews (Spam, Advertisement, Rant, Relevant)
Extra feature experiments:
- Rating deviation (review rating vs. business average)
- Review–business description similarity
These extra features showed some promise (only rant detection benefitted), but their trade-offs made them less reliable than text-only embeddings.
6. Solution Flow
Step 1: Data Cleaning
- Normalized review text: lowercased, stripped punctuation/emojis, removed stopwords, lemmatized words.
- Applied the same normalization to every review so the dataset stayed consistent.
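The cleaning steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than our exact code: the `STOPWORDS` set and the emoji ranges in `EMOJI_RE` are small stand-ins chosen for the example (the project used nltk), and lemmatization is left out to keep the sketch self-contained.

```python
import re
import string

# Assumption: a tiny illustrative stopword list; nltk's English list is much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "were", "to", "of", "it", "i", "this"}

# Assumption: rough emoji/symbol ranges, not an exhaustive emoji matcher.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")

# Map every ASCII punctuation character to a space so words do not get glued together.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_review(text: str) -> str:
    """Lowercase, strip emojis and punctuation, and drop stopwords."""
    text = text.lower()
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_review("The food was GREAT!!! 👍")
```

A lemmatizer (for example nltk's WordNet lemmatizer) would slot in as one more pass over `tokens` before joining.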
Step 2: Semantic Embeddings
- Used all-MiniLM-L6-v2 to convert reviews into dense vectors.
- Captured meaning rather than just keywords (e.g., “great service” ≈ “amazing staff”).
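To make the "meaning rather than keywords" point concrete, here is a small sketch of how embedding similarity is typically measured. The three toy vectors are invented for illustration and stand in for real 384-dimensional embeddings; `embed_reviews` is a hypothetical helper showing how real vectors could be obtained with sentence-transformers (it is defined but not called here, since it downloads the model on first use).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed_reviews(texts, model_name="all-MiniLM-L6-v2"):
    """Hypothetical helper: encode texts into dense vectors.

    The import is lazy so the rest of this sketch runs without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts)


# Made-up 3-dimensional vectors, NOT real model output.
great_service = [0.9, 0.1, 0.2]
amazing_staff = [0.8, 0.2, 0.3]
buy_cheap_watches = [-0.1, 0.9, -0.4]

similar = cosine_similarity(great_service, amazing_staff)
dissimilar = cosine_similarity(great_service, buy_cheap_watches)
```

With these toy numbers, `similar` comes out close to 1 while `dissimilar` is negative, which is the kind of separation the classifier relies on.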
Step 3a: Model V1 – Logistic Regression (Baseline)
- PCA reduced embeddings (384 → 128 dimensions).
- Logistic Regression classifier trained on compressed vectors.
- Results: 94.0% accuracy; F1 scores – Relevant: 0.966, Spam: 0.795.
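A baseline of this shape could be assembled with scikit-learn along the following lines. This is a sketch under stated assumptions: the data is synthetic (random class centres plus noise) instead of real review embeddings, and `max_iter`, `random_state`, and the label order are illustrative choices, not values taken from our training runs.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

LABELS = ["Relevant", "Spam", "Advertisement", "Rant"]  # assumed label order

# Synthetic stand-in for 384-dimensional review embeddings.
rng = np.random.default_rng(0)
n_per_class, dim = 100, 384
centres = rng.normal(size=(len(LABELS), dim))
X = np.vstack([c + 0.5 * rng.normal(size=(n_per_class, dim)) for c in centres])
y = np.repeat(np.arange(len(LABELS)), n_per_class)

# PCA compresses 384 -> 128 dimensions, then Logistic Regression classifies.
model_v1 = make_pipeline(
    PCA(n_components=128, random_state=0),
    LogisticRegression(max_iter=1000),
)
model_v1.fit(X, y)

reduced = model_v1.named_steps["pca"].transform(X[:5])
predicted = [LABELS[i] for i in model_v1.predict(X[:5])]
train_accuracy = model_v1.score(X, y)
```

Wrapping both steps in one pipeline means a single `joblib.dump(model_v1, path)` call would persist the PCA and the classifier together, so the same projection is reused at inference time.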
Step 3b: Model V2 – Neural Network (Final)
- PyTorch NN: Linear → ReLU → Dropout → Linear (4-class output).
- Trained with Adam optimizer + CrossEntropy loss.
- Results: 95.6% accuracy; F1 scores – Relevant: 0.976, Spam: 0.865.
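The architecture and training setup described above could look roughly like this in PyTorch. The layer order, optimizer, and loss follow the description; the hidden size, dropout rate, learning rate, and the class name `ReviewClassifier` are assumptions made for the sketch, and random tensors stand in for a batch of real embeddings.

```python
import torch
from torch import nn


class ReviewClassifier(nn.Module):
    """Linear -> ReLU -> Dropout -> Linear, producing 4 class logits."""

    def __init__(self, input_dim=384, hidden_dim=128, num_classes=4, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # Return raw logits; CrossEntropyLoss applies log-softmax internally.
        return self.net(x)


torch.manual_seed(0)
model_v2 = ReviewClassifier()
optimizer = torch.optim.Adam(model_v2.parameters(), lr=1e-3)  # lr is an assumption
loss_fn = nn.CrossEntropyLoss()

# Random stand-ins for one batch of embeddings and labels.
embeddings = torch.randn(32, 384)
labels = torch.randint(0, 4, (32,))

model_v2.train()
losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model_v2(embeddings), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Switch off dropout before predicting.
model_v2.eval()
with torch.no_grad():
    logits = model_v2(embeddings)
    predicted_classes = logits.argmax(dim=1)
```

Calling `model_v2.eval()` before inference matters here because dropout would otherwise keep randomly zeroing hidden units at prediction time.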
Step 4: Feature Engineering Experiments
- Statistical analysis with metadata (rating, avg rating, #reviews, pictures, owner responses).
- Findings:
  - Owner responses correlated with Ads more than Spam or Rants.
  - Ads generally had higher ratings than Spam or Rants.
- Limitations: Small sample size (Ads n=93) reduced reliability.
- Rating deviation and review–description similarity tested, but only rant detection benefitted.
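For reference, the two experimental features can be expressed in a few lines. This is a hedged sketch: `rating_deviation` follows the definition given earlier (review rating vs. business average), while `add_extra_features` and the idea of appending the values to the embedding are assumptions about how such features might be combined with the text vector, not a record of our implementation.

```python
def rating_deviation(review_rating: float, business_avg_rating: float) -> float:
    """Signed gap between one review's rating and the business average."""
    return review_rating - business_avg_rating


def add_extra_features(embedding, review_rating, business_avg_rating, description_similarity):
    """Hypothetical helper: append both experimental features to a text embedding."""
    deviation = rating_deviation(review_rating, business_avg_rating)
    return list(embedding) + [deviation, description_similarity]


# A 1-star review of a business averaging 4.5 stars, with a low
# review-description similarity (all numbers are made up).
features = add_extra_features([0.1, 0.2], 1.0, 4.5, 0.12)
```

A strongly negative deviation paired with low description similarity is the kind of pattern that could hint at a rant, which may explain why only that category benefitted.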
Step 5: Evaluation
- Metrics: Precision, Recall, F1.
- Headline results: 95.6% test accuracy, Weighted F1 ≈ 0.93.
- Relevant reviews preserved with very high recall (0.978).
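As a reminder of how these metrics relate, the sketch below computes per-class precision, recall, and F1, plus the support-weighted F1, on a tiny hand-made example. The labels and predictions are invented for illustration and are unrelated to our reported numbers.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Precision, recall, F1, and support for each label."""
    support = Counter(y_true)
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[label],
        }
    return metrics


def weighted_f1(metrics):
    """Average of per-class F1 scores, weighted by each class's support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m["f1"] * m["support"] for m in metrics.values()) / total


# Invented toy data: 6 Relevant, 2 Spam, 2 Rant.
y_true = ["Relevant"] * 6 + ["Spam"] * 2 + ["Rant"] * 2
y_pred = ["Relevant"] * 5 + ["Spam"] + ["Spam"] * 2 + ["Rant", "Relevant"]

metrics = per_class_metrics(y_true, y_pred)
overall = weighted_f1(metrics)
```

Because the weighted F1 scales each class by its support, a large, well-classified Relevant class can mask weaker scores on rare classes, so the per-class numbers are worth reading alongside the headline figure.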
Step 6: Flask Web App (UI)
- Enter place → Select correct match → Load real reviews → Classify into categories.
- Optional: Business description improves rant detection.
- UI shows performance metrics.
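The classification step of the app can be pictured as a small Flask endpoint. This is only a sketch: the `/classify` route, the JSON payload shape, and the `classify_review` stub are hypothetical, and it returns JSON for brevity where the real demo renders Jinja2 templates.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_review(text):
    """Placeholder for the trained model; always answers 'Relevant'."""
    return "Relevant"


@app.route("/classify", methods=["POST"])
def classify():
    # Expect a JSON body such as {"reviews": ["text 1", "text 2"]}.
    payload = request.get_json(silent=True) or {}
    reviews = payload.get("reviews", [])
    results = [{"text": r, "label": classify_review(r)} for r in reviews]
    return jsonify(results=results)


# Exercise the route in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.post("/classify", json={"reviews": ["Nice", "Great staff"]})
```

In a fuller version, `classify_review` would clean the text, embed it, and run the saved model, with the Search and Select steps handled by separate routes.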
7. How the Solution Addresses the Problem
- Semantic understanding: Embeddings interpret meaning, not just keywords.
- Policy alignment: Maps directly to TikTok’s moderation categories (Spam, Ads, Irrelevant, Rants).
- Scalability: Neural networks scale well to large datasets.
- Iteration-driven improvement: Compared classical ML vs. NN; NNs proved stronger.
- Practical demo: Flask app with an interactive UI that classifies real reviews on request, showing potential for real-time deployment.
8. Conclusion
Our project shows how ML + NLP can make review platforms more trustworthy by filtering noise and surfacing genuinely helpful feedback.
- Neural network achieved 95.6% accuracy, Weighted F1 ≈ 0.93.
- High recall on Relevant reviews (0.978) means genuine feedback is rarely filtered out.
- Spam detection improved over the baseline (F1 0.795 → 0.865).
- Rants remain challenging, but experimental signals (review–description similarity) provide future opportunities.
Impact:
- Users make better decisions with reliable reviews.
- Businesses get fairer representation.
- Platforms benefit from scalable, automated moderation.
9. Interactive Demo
- Search: Enter the name of any place.
- Select: Choose the correct match from a candidate list.
- Classify: Loads reviews with classification tags (Relevant, Spam, Rant, Advertisement).
- Optional description input: Improves rant detection.
- See metrics: Model performance shown in UI.
🔗 GitHub Repo: Boolean Brotherhood
▶️ YouTube Demo: Watch here
Built With
- css
- dotenv
- flask
- google-maps
- html
- javascript
- jinja
- joblib
- python
- pytorch
- scikit
- scikit-learn