Inspiration
Businesses and genuine reviewers often suffer from misunderstanding and mistrust caused by misleading or fake reviews. These make it difficult to extract valuable insights and can even damage reputations. We therefore chose to target Problem 1: Filtering the Noise: ML for Trustworthy Location Reviews.
What it does
Our model filters out untrustworthy or low-value reviews, ensuring that the more factual and meaningful ones remain.
This helps both businesses and users by highlighting reviews that truly reflect the real experience.
How we built it
We combined a traditional classifier trained on TF-IDF features with large language models (LLMs), using the LLMs to generate labels for policy violations.
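The classical half of this hybrid can be sketched as a minimal scikit-learn pipeline. The reviews and labels below are invented toy examples standing in for the real LLM-labeled data, and the choice of logistic regression is an assumption for illustration:

```python
# Toy sketch: classify reviews as trustworthy vs. policy-violating
# using TF-IDF features feeding a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "Great food and friendly staff, will come back",
    "Visit my website for cheap watches!!!",
    "The pasta was cold but the service was quick",
    "Click this link to win a free phone",
]
labels = [0, 1, 0, 1]  # 0 = trustworthy, 1 = policy violation (toy labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reviews, labels)
print(model.predict(["Buy followers at my site"]))
```

In the real pipeline, the toy labels would be replaced by the LLM-generated policy-violation labels described below.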
First, we brainstormed different violation categories — from sentiment inconsistencies between text and rating, to irrelevant content mismatches.
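One of those categories, sentiment inconsistency between text and rating, can be sketched as a simple rule. The function and thresholds below are hypothetical; the sentiment score is assumed to come from any sentiment model scaled to [-1, 1]:

```python
# Hypothetical rule for the "sentiment inconsistency" category: flag a review
# when its star rating and its text sentiment point in opposite directions.
def sentiment_mismatch(rating: int, sentiment: float) -> bool:
    """rating: 1-5 stars; sentiment: -1 (very negative) .. 1 (very positive)."""
    if rating >= 4 and sentiment < -0.3:  # glowing stars, negative text
        return True
    if rating <= 2 and sentiment > 0.3:   # harsh stars, positive text
        return True
    return False

print(sentiment_mismatch(5, -0.8))  # True: 5 stars but very negative text
print(sentiment_mismatch(2, 0.1))   # False: low stars, near-neutral text
```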
A key challenge was that existing labels were often unreliable. Initially, we considered manually labeling the dataset, but it quickly became infeasible.
To address imbalance across classes, we applied SMOTE (Synthetic Minority Oversampling Technique).
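The core idea behind SMOTE can be illustrated with a minimal numpy sketch: synthesize new minority-class samples by interpolating between a minority point and one of its nearest minority neighbors. This toy version is for illustration only; the project would use a full implementation such as imbalanced-learn's:

```python
# Minimal sketch of the SMOTE interpolation step (not the full algorithm).
import numpy as np

def smote_oversample(X_min: np.ndarray, n_new: int, k: int = 2, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic samples from minority-class points X_min."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to every other minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(smote_oversample(X_minority, n_new=4).shape)  # (4, 2)
```

Each synthetic point lies on a line segment between two real minority samples, which is what keeps the oversampled data plausible rather than duplicated.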
For feature extraction, we experimented with Count Vectorization alongside our classifiers.
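The comparison between the two vectorizers looks like this on a toy corpus. CountVectorizer produces raw term counts, while TfidfVectorizer reweights those counts by inverse document frequency; both yield the same vocabulary and matrix shape:

```python
# Same toy corpus through both feature extractors for comparison.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the food was great",
    "the service was slow",
    "great food great service",
]
counts = CountVectorizer().fit_transform(corpus)  # raw term counts
tfidf = TfidfVectorizer().fit_transform(corpus)   # IDF-weighted counts
print(counts.shape, tfidf.shape)  # same vocabulary, different weighting
```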
We built the project primarily in Google Colab, with additional work in VS Code.
Our stack included libraries such as:
- nltk (text preprocessing)
- alt-profanity-check (profanity detection)
- scikit-learn (ML models, TF-IDF, evaluation)
- numpy & pandas (data handling and analysis)
For training and evaluation, we used two main datasets.
Challenges we ran into
- Limited access to LLM APIs: we had to rely on ChatGPT directly, which restricted how many reviews we could reliably label.
- Label sparsity: categories like Advertisement had very few examples, so we removed them to focus on the stronger signals.
- Data imbalance: this required careful preprocessing and oversampling.
Accomplishments that we're proud of
- Completing this hackathon project as a solo participant.
- Gaining hands-on experience with SMOTE and Count Vectorization.
- Building a pipeline that can meaningfully improve trustworthiness in review datasets.
What we learned
- The importance of clean, reliable labels in supervised ML tasks.
- Practical challenges of applying LLMs for labeling when resources are limited.
- How combining classical ML techniques with LLM inference can be more powerful than either alone.
- Using Colab and open datasets efficiently to iterate quickly in a hackathon setting.
What's next for Jet2Holiday
We aim to:
- Improve accuracy with larger, better-labeled datasets.
- Scale data collection by building scrapers to gather more diverse reviews.
- Gain access to LLM APIs, enabling us to label at scale and train more robust models.
Mathematically, the performance improvements we aim for can be framed as maximizing the accuracy function:
\[ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Samples}} \]
with the goal of consistently improving this metric as we iterate on the model.
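Computed directly from toy predictions, the metric is simply the share of correct predictions (true positives plus true negatives) over all samples:

```python
# Accuracy from toy labels: correct predictions over total samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))  # true negatives
accuracy = (tp + tn) / len(y_true)
print(accuracy)  # 0.75
```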