Inspiration
We were inspired by the growing issue of unreliable online reviews. Many platforms are filled with spam, irrelevant rants, or low-effort posts that mislead users and dilute genuine feedback. We wanted to tackle this problem by building a tool that could automatically filter and classify reviews, making review platforms more trustworthy and user-friendly.
What it does
The Review Category Classifier is a web app that classifies reviews into categories such as genuine reviews, spam/ads, low-quality entries, and rants without a visit. It combines machine learning models with rule-based heuristics to make predictions more accurate and reliable, and it offers a clean, interactive interface where users can test the system.
How we built it
We started with raw review datasets and cleaned them extensively, removing duplicates, blank entries, and noise. We then trained multiple models, including a TF–IDF + Logistic Regression baseline and a fine-tuned DistilBERT model. To improve performance, we developed an ensemble that combines these models with rule-based classification. The backend is implemented with FastAPI to serve predictions, and the React frontend provides the user interface. We evaluated the models using accuracy, precision, recall, and F1-score.
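To make the ensemble idea concrete, here is a minimal sketch of one way rule hits and model probabilities could be combined. It is not the project's actual code: the label names, the regex rules, the `bert_weight` and `rule_boost` parameters, and the function names are all assumptions made for illustration, and the two probability dictionaries stand in for the outputs of the TF–IDF baseline and the DistilBERT model.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical label set based on the categories named in this write-up.
LABELS = ("genuine", "spam", "low_quality", "rant_no_visit")

# Hypothetical rules; the project's real heuristics are not described here.
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
NO_VISIT_RE = re.compile(
    r"\b(never (been|visited|went)|haven'?t (been|visited))\b", re.IGNORECASE
)


def rule_label(text: str) -> Optional[str]:
    """Return the label of the first rule that fires, or None if none does."""
    stripped = text.strip()
    if URL_RE.search(stripped):
        return "spam"
    if NO_VISIT_RE.search(stripped):
        return "rant_no_visit"
    if len(stripped.split()) < 3:
        return "low_quality"
    return None


def ensemble(
    text: str,
    baseline_probs: Dict[str, float],
    bert_probs: Dict[str, float],
    bert_weight: float = 0.6,  # assumed weighting, not a tuned value
    rule_boost: float = 0.5,   # assumed bonus for a rule hit
) -> Tuple[str, Dict[str, float]]:
    """Blend two probability distributions, then nudge toward any rule hit."""
    combined = {
        label: bert_weight * bert_probs.get(label, 0.0)
        + (1.0 - bert_weight) * baseline_probs.get(label, 0.0)
        for label in LABELS
    }
    hit = rule_label(text)
    if hit is not None:
        combined[hit] += rule_boost
    # Renormalize so the scores sum to 1 again.
    total = sum(combined.values()) or 1.0
    scores = {label: score / total for label, score in combined.items()}
    return max(scores, key=scores.get), scores
```

In this sketch a rule hit adds a fixed bonus to one class rather than overriding the models, so a confident model prediction can still win over a weak rule. Whether rules should boost, veto, or fully override the models is a design choice the write-up does not specify.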
Challenges we ran into
- Handling imbalanced data across categories.
- Designing an ensemble that could combine rules and ML predictions consistently.
- Training large models like DistilBERT with limited compute resources.
- Integrating the backend with the frontend to ensure smooth, real-time predictions.
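For the class imbalance challenge, two common techniques are weighting classes inversely to their frequency during training and reporting macro-averaged F1 instead of accuracy alone. The sketch below illustrates both in plain Python. It is an assumption about how the problem could be approached, not a record of what the team did, and the function names are hypothetical. The weight formula is the usual "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    f1_scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: a classifier that always predicts the majority class.
y_true = ["genuine"] * 6 + ["spam"] * 2
y_pred = ["genuine"] * 8
weights = balanced_class_weights(y_true)  # rare class gets the larger weight
score = macro_f1(y_true, y_pred)          # much lower than the 0.75 accuracy
```

The toy example shows why accuracy can mislead on imbalanced data: always predicting the majority class is right 75% of the time here, yet its macro F1 is far lower because the minority class is never detected.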
Accomplishments that we're proud of
- Successfully building a working full-stack application that integrates ML, rules, and a web interface.
- Achieving strong performance through ensemble modeling.
- Creating a system that is both technically sound and practical for real-world use.
- Collaborating effectively as a team and aligning contributions across AI, UI, and testing.
What we learned
We learned how to manage real-world datasets, design and fine-tune machine learning models, and combine them with rule-based approaches for improved robustness. We also gained hands-on experience in building an end-to-end pipeline, from data preprocessing and training to deployment with FastAPI and frontend integration with React. On the teamwork side, we learned to manage roles effectively and solve integration challenges collaboratively.
What's next for Syntax Squad
Our next steps are to expand the classifier by incorporating more advanced large language models (LLMs) for better contextual understanding, scale the system to larger and more diverse datasets, and improve explainability so users can see why a review was classified a certain way. We also plan to optimize deployment for production readiness, making the system more efficient, reliable, and ready for integration into real platforms.