TikTok Review Analysis Dashboard

TechJam 2025 – Track 1: Filtering the Noise (ML for Trustworthy Location Reviews)

This project addresses TikTok's hackathon challenge "Filtering the Noise: ML for Trustworthy Location Reviews" by building an advanced AI-powered fake review detection system that goes beyond simple spam detection. Our system provides intelligent, explainable decisions for review quality management with a clean, interactive dashboard.

Summary

Online reviews influence business reputation and user trust, yet many are spam, fake, irrelevant, or rating-manipulated. Manual moderation is slow and inconsistent, while existing models often reduce reviews to simple “positive/negative” sentiment, which is not enough to enforce real policies.

Our project delivers a complete moderation pipeline with two modes. In single-review mode, the system processes new reviews in real time, classifying and filtering them before publication. In batch mode, it can process thousands of existing reviews at once, cleaning up noisy datasets and enforcing consistency at scale. Both modes use the same pipeline: text and metadata features are extracted, passed to a quantized Qwen model, and evaluated against a structured policy framework.

The core improvement is our 17 moderation categories, which move beyond broad sentiment into policy-aligned outcomes. For example, reviews with phone numbers or booking links are flagged as SPAM, overly generic or impossible claims as FAKE_REVIEW, and mismatches between ratings and text as RATING_TEXT_MISMATCH. Legitimate reviews are preserved and classified more specifically — service, value, cleanliness, wait time, location, or portion size — giving platforms a clear picture of what real customers are saying.

To make these decisions reliable, we use prompt engineering to build context-aware inputs. Prompts incorporate user behavior (review history, rating consistency), business context, and sentiment mismatch rules. They also embed a detection checklist so the model is guided to choose one of the 17 categories consistently. Each output follows a structured format with a category, confidence score, and reasoning, making the system transparent and easy to integrate into moderation workflows.

Together, the dual-mode pipeline, policy-driven categories, and prompt engineering improvements provide a practical solution to review moderation. It shows how platforms can prevent low-quality reviews from being published while also cleaning existing data, delivering moderation that is scalable, fair, and explainable.


Review Processing Modes

Our pipeline supports two complementary modes of operation:

1. Single Review Mode (Real-Time Moderation)

  • Designed for new incoming reviews
  • Each review is classified instantly before being published
  • Enables proactive moderation: spam, fake, or irrelevant reviews can be filtered out in real time
  • Demonstrates how the system could be integrated into a live platform

2. Batch Mode (Scalable Historical Cleanup)

  • Designed for existing review datasets
  • Processes large volumes of reviews at once
  • Ideal for cleaning up historical noise and ensuring datasets meet platform policy standards
  • Demonstrates the pipeline’s efficiency and scalability on large datasets

Together, these modes provide an end-to-end solution:

  • Real-time moderation for new reviews
  • Scalable batch analysis for existing datasets

What We Built

Core Architecture

Our solution consists of a multimodal AI system with clear backend/frontend separation:

  1. QwenReviewClassifier – 4-bit quantized Qwen 2.5-3B-Instruct model for text classification
  2. MetadataClassifier – Pattern-based analysis for review metadata
  3. FeatureExtractor – Multi-dimensional feature extraction from review text and ratings
  4. RecommendationEngine – AI-powered moderation recommendations
  5. DashboardComponents – Streamlit-based UI components with professional styling

AI Pipeline Flow

Review Text + Metadata → Feature Extraction → Qwen Classification → Pattern & Metadata Analysis → Policy Enforcement → Recommendation + Confidence Score → Dashboard Display
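The flow above can be sketched as a chain of small functions. This is a minimal illustration only: each function here is a simplified stand-in for the component named in its docstring, and the heuristics inside are placeholder assumptions, not the project's real logic.

```python
# Minimal sketch of the pipeline stages; each function is a simplified
# stand-in for the named component in the architecture above.

def extract_features(text: str, rating: int) -> dict:
    """FeatureExtractor stand-in: derive simple text/metadata features."""
    return {
        "length": len(text),
        "exclamations": text.count("!"),
        "rating": rating,
    }

def qwen_classify(text: str, features: dict) -> dict:
    """QwenReviewClassifier stand-in (the real version prompts the 4-bit model)."""
    if "http" in text or "call" in text.lower():
        return {"category": "SPAM", "confidence": 0.92}
    return {"category": "LEGITIMATE", "confidence": 0.80}

def enforce_policy(llm_result: dict) -> dict:
    """RecommendationEngine stand-in: map category + confidence to an action."""
    remove = llm_result["category"] != "LEGITIMATE"
    action = "REMOVE" if remove and llm_result["confidence"] > 0.85 else "APPROVE"
    return {**llm_result, "action": action}

def run_pipeline(text: str, rating: int) -> dict:
    features = extract_features(text, rating)
    return enforce_policy(qwen_classify(text, features))
```

Both single-review mode and batch mode call this same chain; batch mode simply maps it over a dataset.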


Pipeline Features

Prompt Engineering in the Pipeline

A major strength of our system lies in how we use prompt engineering to guide the model toward reliable and policy-aligned classifications. Rather than giving the model raw text and hoping it guesses correctly, we embed structured context and rules into each prompt. This ensures that every output is both explainable and consistent with platform policies.

1. Context-Aware Prompts

Our prompts dynamically adapt to each review by including:

  • User behavioral context (review history, rating consistency, reviewer type).
  • Business context (the specific location being reviewed).
  • Sentiment context (positive/negative keywords for mismatch detection).

This prevents generic outputs and makes classification more precise. For example, if a “new user” leaves an extreme 5-star review with negative words, the prompt signals that this may be a fake or mismatched review.
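A prompt builder along these lines could look as follows. The field names (`reviewer_type`, `review_count`) and the wording of the instructions are illustrative assumptions, not the project's exact prompt template.

```python
# Hypothetical sketch of a context-aware prompt builder; field names and
# instruction wording are assumptions for illustration.

def build_prompt(text: str, rating: int, user: dict, business: str) -> str:
    lines = [
        f"Business under review: {business}",
        f"Reviewer type: {user.get('reviewer_type', 'unknown')} "
        f"({user.get('review_count', 0)} prior reviews)",
        f"Star rating: {rating}/5",
        "Review text:",
        text,
        "",
        "Classify this review into exactly one of the 17 policy categories.",
        "If the text sentiment conflicts with the rating, prefer RATING_TEXT_MISMATCH.",
    ]
    return "\n".join(lines)

prompt = build_prompt(
    "Terrible food, worst service ever!", 5,
    {"reviewer_type": "new user", "review_count": 0}, "Joe's Diner",
)
```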

2. Real-Time Mismatch Detection

We embed sentiment alignment rules directly into the prompts. If a review’s text is strongly negative but paired with a 5-star rating, the system explicitly flags it as RATING_TEXT_MISMATCH; likewise, glowing text paired with a very low rating is flagged the same way. This closes one of the biggest gaps in existing review moderation.
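The alignment rule can be sketched as below. The keyword sets are illustrative stand-ins for the project's sentiment lexicons, and the rating thresholds are assumptions.

```python
# Sketch of the sentiment-alignment rule; keyword lists and thresholds
# are illustrative assumptions, not the project's actual lexicons.
from typing import Optional

NEGATIVE = {"terrible", "worst", "awful", "horrible"}
POSITIVE = {"great", "amazing", "excellent", "perfect"}

def mismatch_flag(text: str, rating: int) -> Optional[str]:
    words = set(text.lower().split())
    if words & NEGATIVE and rating >= 4:
        return "RATING_TEXT_MISMATCH"   # negative text, high rating
    if words & POSITIVE and rating <= 2:
        return "RATING_TEXT_MISMATCH"   # glowing text, low rating
    return None                         # no contradiction detected
```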

3. Policy-Driven Checklists

The prompts also include a detailed classification checklist that maps directly to our 17 moderation categories.

  • Spam indicators: phone numbers, “call now”, booking instructions → SPAM.
  • Overly perfect, generic praise, impossible claims → FAKE_REVIEW.
  • Rating and text misalignment → RATING_TEXT_MISMATCH.
  • Legitimate reviews with balanced detail → LEGITIMATE or specific subcategories like SERVICE_FOCUSED.

By embedding this structured framework, the model no longer classifies on vague “positive/negative” sentiment, but instead enforces concrete moderation policies.
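A slice of such a checklist can be encoded as ordered pattern–category pairs. The regexes below are small examples of the indicators listed above, not the project's full rule set.

```python
import re

# Illustrative slice of the policy checklist; patterns are examples only.
CHECKLIST = [
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b|call now|book at", re.I), "SPAM"),
    (re.compile(r"best .* ever|absolutely perfect|life.?changing", re.I), "FAKE_REVIEW"),
]

def checklist_category(text: str) -> str:
    """Return the first policy category whose indicator pattern matches."""
    for pattern, category in CHECKLIST:
        if pattern.search(text):
            return category
    return "LEGITIMATE"
```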

4. Structured, Explainable Outputs

Every model response follows a fixed format:

  CATEGORY: [one of the 17 policies]
  CONFIDENCE: [0.000–1.000]
  REASONING: [short explanation with evidence]

This ensures outputs can be parsed automatically, while still providing reasoning that moderators or users can understand.
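Given that fixed format, parsing can be a single regular expression with a fallback for malformed outputs. This is a sketch under the assumption that unparseable responses should be routed to human review.

```python
import re

# Sketch of parsing the fixed CATEGORY/CONFIDENCE/REASONING format.
PATTERN = re.compile(
    r"CATEGORY:\s*(?P<category>[A-Z_]+)\s*"
    r"CONFIDENCE:\s*(?P<confidence>[01]\.\d{1,3})\s*"
    r"REASONING:\s*(?P<reasoning>.+)",
    re.S,
)

def parse_response(raw: str) -> dict:
    m = PATTERN.search(raw)
    if m is None:
        # Assumed fallback: unparseable output goes to human review.
        return {"category": "UNKNOWN", "confidence": 0.0, "reasoning": raw.strip()}
    return {
        "category": m.group("category"),
        "confidence": float(m.group("confidence")),
        "reasoning": m.group("reasoning").strip(),
    }
```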

Why This Matters

Prompt engineering makes the pipeline transparent, robust, and aligned to real-world needs. Instead of a black-box classifier, we deliver a system that:

  • Distinguishes between different types of violations, not just “good vs bad”.
  • Provides confidence scores and reasoning for every decision.
  • Scales across both single-review (real-time) and batch (bulk cleanup) use cases.

By combining 17 policy categories with advanced prompt design, our solution offers a moderation framework that is far more accurate, fair, and trustworthy than current review filtering methods.


Policy Framework (17 Categories)

High Priority Policies (Remove/Flag)

  • SPAM – Contact info, solicitation, promotional content
  • ADVERTISEMENTS – Marketing, business promotions
  • FAKE_REVIEW – Fabricated content, overly perfect claims
  • NO_EXPERIENCE – User admits never using product/service
  • RATING_TEXT_MISMATCH – Rating conflicts with text sentiment
  • REPETITIVE_SPAM – Identical or near-identical content
  • COMPETITOR_COMPARISON – Focus on other businesses
  • IRRELEVANT – Off-topic content
  • LOW_QUALITY – Very short or uninformative reviews
  • OFFENSIVE – Inappropriate language, personal attacks

Legitimate Categories (Approve)

  • LEGITIMATE – Balanced, specific, honest feedback
  • SERVICE_FOCUSED – Staff, customer service, interactions
  • VALUE_FOCUSED – Pricing and value for money
  • CLEANLINESS_FOCUSED – Hygiene and sanitation standards
  • WAIT_TIME_FOCUSED – Service speed and efficiency
  • LOCATION_FOCUSED – Accessibility and convenience
  • PORTION_SIZE_FOCUSED – Food quantity and satisfaction

Language Category

  • NON_ENGLISH – Non-English reviews (flagged for manual review)

Policy Enforcement Mechanism

  • Confidence-Based Actions

    • High Confidence (>85%): Auto-action (REMOVE/APPROVE)
    • Medium Confidence (65–85%): FLAG_FOR_REVIEW
    • Low Confidence (<65%): Human review required
  • Explainable Decisions
    Each enforcement includes:

    • Category violated
    • Confidence score
    • Reasoning explanation
    • Evidence (features that triggered detection)
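The confidence thresholds above translate directly into a routing function. The thresholds come from the text; the membership of the remove set is an illustrative subset of the high-priority categories.

```python
# Sketch of confidence-based enforcement; thresholds are from the policy
# above, the remove set is an illustrative subset of high-priority categories.
REMOVE_CATEGORIES = {"SPAM", "ADVERTISEMENTS", "FAKE_REVIEW", "RATING_TEXT_MISMATCH"}

def enforcement_action(category: str, confidence: float) -> str:
    if confidence > 0.85:
        return "REMOVE" if category in REMOVE_CATEGORIES else "APPROVE"
    if confidence >= 0.65:
        return "FLAG_FOR_REVIEW"
    return "HUMAN_REVIEW"
```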

Example Policy Enforcement

Example 1: SPAM Detection

  • Input: "Great restaurant! Call 555-123-4567 for reservations!"
  • Policy: SPAM
  • Confidence: 92%
  • Reasoning: Contains phone number and solicitation
  • Action: REMOVE

Example 2: Rating Mismatch

  • Input: "Terrible food, worst service ever!" (5-star rating)
  • Policy: RATING_TEXT_MISMATCH
  • Confidence: 89%
  • Reasoning: Rating conflicts with negative sentiment
  • Action: REMOVE

Example 3: Legitimate Review

  • Input: "Good food, friendly service. Portions could be bigger but overall pleasant experience."
  • Policy: LEGITIMATE
  • Confidence: 94%
  • Reasoning: Balanced review with specific details
  • Action: APPROVE

Results (Sample Run)

  • F1 score (pseudo-labeled test set): ~82% across categories
  • Rating mismatch detection: 90% precision on mismatched samples
  • Inference latency: 30–60 seconds per review (reduced via 4-bit quantization)

Core AI Features

  • Advanced AI Classification

    • Qwen 2.5-3B-Instruct with contextual understanding
    • 4-bit quantization for efficient GPU usage (~4GB VRAM)
    • Context-aware classification with “thinking mode”
  • Multimodal Analysis

    • Text analysis with sentiment understanding
    • Metadata patterns (length, time, history)
    • User behavior detection
    • Over 35 technical features
  • Rating-Text Contradiction Detection

    • Identifies mismatches between star rating and review content
    • Example: 5-star rating with "terrible food" → RATING_TEXT_MISMATCH
  • Smart Recommendations

    • Action suggestions with confidence scoring
    • Hybrid scoring: 70% LLM + 30% pattern agreement
    • Priority-based category selection
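The 70/30 hybrid scoring rule is a weighted blend. The weights come from the description above; the idea of expressing pattern agreement as a value in [0, 1] is an assumption for illustration.

```python
# Sketch of the hybrid 70% LLM + 30% pattern scoring rule; expressing
# pattern agreement as a [0, 1] value is an illustrative assumption.

def hybrid_confidence(llm_conf: float, pattern_agreement: float) -> float:
    """Blend LLM confidence with pattern-analysis agreement (both in [0, 1])."""
    return round(0.7 * llm_conf + 0.3 * pattern_agreement, 3)
```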

Technical Features

  • Enhanced Pattern Analysis
    • 15+ detection algorithms
    • Keyword/phrase matching, repetitive content detection
    • Language detection
  • Feature Extraction (35+ features)
    • Text features (length, sentence count, average word length)
    • Language patterns (caps ratio, punctuation, exclamation)
    • Spam indicators (contact info, promo language)
    • User behavior (review count, rating patterns)
    • Content quality (relevance, readability, sentiment)
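A few of these features can be sketched as below; the project's actual FeatureExtractor computes 35+ features and its exact definitions may differ.

```python
import re

# Sketch of a handful of the 35+ features; definitions are illustrative.

def extract_features(text: str) -> dict:
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    return {
        "char_count": len(text),
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "caps_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
        "exclamation_count": text.count("!"),
        "has_contact_info": bool(re.search(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b", text)),
        "promo_language": bool(re.search(r"call now|discount|promo code", text, re.I)),
    }
```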
