Inspiration
Online platforms are flooded with location-based reviews, but many are noisy: ads, irrelevant rants, or generic one-liners. This dilutes trust and makes it harder for businesses and customers to find reliable insights. We wanted to solve this by creating a pipeline that identifies which reviews are genuinely helpful.
What it does
Our system preprocesses raw review text and fine-tunes RoBERTa to classify each review into one of four classes: relevant, irrelevant, advert, or rant_no_visit. Flagging the low-quality classes improves fairness, trust, and user experience on review platforms.
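As a rough illustration of the four-way classification, the sketch below maps the class names to integer ids and decodes a vector of model scores back to a class name. The ordering of the labels and the `decode` helper are assumptions made for this example, not the project's actual code. Dictionaries of this shape are commonly passed as `id2label` and `label2id` when loading a Hugging Face sequence classification model.

```python
# Class names come from the project description; their order here is an assumption.
LABELS = ["relevant", "irrelevant", "advert", "rant_no_visit"]

label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}


def decode(logits):
    """Return the class name with the highest score (hypothetical helper)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


if __name__ == "__main__":
    # Made-up scores for one review, one score per class.
    print(decode([0.1, -1.2, 3.4, 0.0]))
```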
How we built it
We developed a custom preprocessing tool (review_preprocess.py) that normalizes text through lowercasing, URL and character cleanup, and stopword removal with negation handling. We then wrote a PyTorch ReviewDataset class and loaded the FacebookAI/roberta-base checkpoint through Hugging Face Transformers. RoBERTa was fine-tuned on labeled review data to detect quality and relevance. We evaluated the model with scikit-learn metrics (accuracy, precision, recall, F1) and visualized the results with seaborn and matplotlib.
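The preprocessing steps described above could look roughly like the following. This is a minimal sketch using only the standard library, not the contents of review_preprocess.py: the stopword list, the negation set, the regular expressions, and the function name `preprocess` are all assumptions. Negation handling is interpreted here as expanding "n't" contractions and never dropping negation words during stopword removal.

```python
import re

# Small illustrative lists; the project's real stopword and negation lists are not given.
STOPWORDS = {"a", "an", "the", "is", "was", "were", "are", "it", "this", "that",
             "to", "of", "and", "in", "on", "at", "for", "i", "we", "my",
             "no", "not", "never"}
NEGATIONS = {"no", "not", "never", "nor", "cannot"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
CONTRACTION_RE = re.compile(r"\b(\w+)n't\b")   # "didn't" -> "did not"
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")      # anything except letters, digits, whitespace


def preprocess(text: str) -> str:
    text = text.lower()
    text = URL_RE.sub(" ", text)                    # drop links before touching punctuation
    text = text.replace("\u2019", "'")              # normalize curly apostrophes
    text = CONTRACTION_RE.sub(r"\1 not", text)      # crude: "can't" becomes "ca not"
    text = NON_ALNUM_RE.sub(" ", text)              # strip remaining punctuation and symbols
    tokens = text.split()
    # Keep negation words even when they appear in the stopword list.
    kept = [t for t in tokens if t in NEGATIONS or t not in STOPWORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(preprocess("The food was NOT good! Visit https://spam.example/deal now"))
```

Keeping "not" and similar words matters because dropping them would turn "not good" into "good", which reverses the meaning the classifier sees.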
Challenges we ran into
Designing preprocessing steps that removed noise without stripping away important context. Starting with no labeled data, which meant we had to label reviews by hand before we could fine-tune. Training RoBERTa efficiently under GPU memory constraints.
Accomplishments that we're proud of
Built a complete, scalable pipeline from raw text -> clean dataset -> fine-tuned transformer -> evaluation.
Successfully fine-tuned RoBERTa to detect low-quality reviews with strong performance.
Manually labeled training and test data, giving us a reliable ground truth to evaluate performance and check for overfitting.
Produced results that are directly applicable to real-world review platforms.
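To make the evaluation metrics (accuracy, precision, recall, F1) concrete, here is a dependency-free sketch of how they can be computed per class from ground-truth and predicted labels. The project itself used scikit-learn for this; the function names and the toy labels below are assumptions for illustration and are not the project's results.

```python
from collections import Counter

LABELS = ["relevant", "irrelevant", "advert", "rant_no_visit"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Precision, recall, and F1 for each class, with 0.0 when undefined."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def summarize(y_true, y_pred, labels=LABELS):
    """Return (accuracy, macro-averaged F1, per-class report)."""
    report = per_class_metrics(y_true, y_pred, labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    return accuracy, macro_f1, report


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    y_true = ["relevant", "relevant", "advert", "irrelevant", "rant_no_visit", "advert"]
    y_pred = ["relevant", "irrelevant", "advert", "irrelevant", "rant_no_visit", "relevant"]
    accuracy, macro_f1, report = summarize(y_true, y_pred)
    print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

Per-class numbers are worth reporting alongside accuracy because classes such as advert or rant_no_visit may be rare, and a model can score well on accuracy while missing most of them.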
What we learned
How to design and implement a review quality classifier end to end. The importance of thoughtful preprocessing, especially for noisy, user-generated text. Practical experience fine-tuning transformer models with Hugging Face and PyTorch. The trade-offs between model accuracy, dataset size, and computational efficiency.
Built With
- hugging-face
- matplotlib
- numpy
- pandas
- pytorch
- scikit-learn
- seaborn