TikTok TechJam 2025

Review Reviewer
Product of Team Null for TikTok TechJam 2025

Project Overview

Problem

Low-quality comments such as advertisements and irrelevant feedback often clutter platforms like Google Maps, reducing user experience and making it harder for businesses to receive genuine reviews.

Solution

Our product is an AI-based comment classifier that analyzes comments based on text, images, ranking, and information about the place. It automatically flags low-quality and irrelevant comments (text and images) for moderation to improve the overall quality of reviews.

Impact

Users: Enhance trust in location-based reviews by flagging low-quality and irrelevant comments.
Business Owners: Ensure fair representation by highlighting irrelevant or malicious reviews.
Platforms: Automate moderation by flagging suspicious comments, reducing manual workload.

Setup Instructions

Clone the repository:
bash git clone https://github.com/Noob-No-1/TikTok-TechJam-2025.git cd TikTok-TechJam-2025
Install dependencies:
Ensure you have Python installed, then run:
bash pip install -r requirements.txt
Create a .env file:
Add your API keys and environment variables securely in a .env file in the project root.
To get your Groq API key, visit the GROQ website.
For example:
env GROQ_API_KEY=your_api_key_here
Run the comment classifier:
Use the provided scripts or notebooks to start classifying comments.

Tools and APIs

Python: Primary programming language for backend logic and AI integration.
VSCode: Main IDE used for development.
Jupyter Notebook: Used for prototyping and interactive testing.
Groq API: Provides access to free hosted large language models (Llama 3.1 8B) for AI-based text classification.
JSON: Used for structured data exchange between the AI model and application.

Libraries

dotenv: Securely loads environment variables from .env files.
groq: Official Python client for interacting with the Groq API.
json: Standard Python library for parsing and generating JSON data.
transformers: Hugging Face library for leveraging multimodal models (e.g., BLIP for image captioning).
torch: Backend for running deep learning models.
pandas: For dataset manipulation and batch processing.

Multimodal Support

Reviews that include images are handled by generating image captions using the BLIP model from the transformers library. These captions are appended to the review text to provide richer context for classification, enabling the AI to assess both textual and visual content effectively.

How to Reproduce Results

Run a sample classification:
Execute the sample script or notebook to classify a single comment and observe the output.
Note: You can also open and run the demo.ipynb notebook under src to see interactive demonstrations of text-only and image-augmented review classification.
Run batch mode on dataset:
Use the batch processing script to classify multiple comments from a dataset and save the results.
Expected outputs:
The classifier will output structured JSON labels indicating review relevance and any policy violations (e.g., advertisement, irrelevant, rant_no_visit, image_irrelevant, image_advertisement).

Built With

python

Submitted to

TikTok TechJam 2025

Created by

Research and developed the text-based review flagging and graph&text - based review flagging. I also wrote the demo notebook for others to try the project. I originally thought training a model based on the data we have but later realised there is no need to reinvent the wheels and the performance of already developed SOTA model will probably outperform our trained model anyways. This experience has taught me that before you dive into solving the problem, search around for answers (or close enough partial answers) first.

noob@cs Zhihao
Feng Yilong
CHEN YIXUN
peng ziyi
JianWen Lei