Inspiration
The motivation behind the project was to create an automated system that filters low-quality and irrelevant reviews to improve user trust, provide businesses with accurate feedback, and reduce the burden on platforms to moderate content.
What it does
Review Quality Evaluation: The system evaluates each review for quality and assigns it a label such as spam, irrelevant, advertisement, rant, or legitimate.
Classification Pipeline: Using machine learning, the system flags spam, irrelevant content, and inappropriate language in reviews.
Automatic Moderation: Reviews are flagged if they contain URLs, phone numbers, emails, or spam keywords, or if they read as rants or express fake sentiment.
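The rule-based part of the moderation step can be illustrated with a minimal sketch. The regular expressions, the keyword list, and the `flag_review` helper below are assumptions made for illustration, not the project's actual rules.

```python
import re

# Hypothetical patterns; the project's real rules are not shown in this write-up.
URL_RE = re.compile(r"(https?://\S+|www\.\S+)", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Loose phone pattern: 8 to 15 digits, optionally separated by spaces, dots, dashes or parentheses.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?\d[\s().-]?){7,14}\d(?!\d)")
# Illustrative keyword list, assumed for this example.
SPAM_KEYWORDS = {"discount", "promo code", "click here", "free gift"}


def flag_review(text):
    """Return the list of rule names that the review text triggers."""
    reasons = []
    if URL_RE.search(text):
        reasons.append("url")
    if EMAIL_RE.search(text):
        reasons.append("email")
    if PHONE_RE.search(text):
        reasons.append("phone")
    lowered = text.lower()
    if any(keyword in lowered for keyword in SPAM_KEYWORDS):
        reasons.append("spam_keyword")
    return reasons


if __name__ == "__main__":
    print(flag_review("Great pasta and friendly staff."))
    print(flag_review("Visit www.example.com for a discount"))
```

A review that triggers no rule returns an empty list and can be passed on to the machine learning classifier; rants and fake sentiment are not something simple patterns like these can catch.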
How we built it
Data Collection: The system pulls training data from Kaggle (denizbilginn/google-maps-restaurant-reviews) so that the input is representative of real-world reviews.
Feature Extraction: Text processing techniques such as TF-IDF and sentence embeddings (using SentenceTransformer) convert review text into meaningful features for classification.
Model Building: The system offers multiple machine learning models (e.g., Logistic Regression and Random Forest) to classify the reviews. The models are trained and evaluated using scikit-learn.
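As a rough sketch of how the TF-IDF and Logistic Regression combination fits together in scikit-learn, the example below trains on a handful of made-up reviews. The toy texts, the two labels, and the hyperparameters are assumptions for illustration; the actual project trains on the Kaggle dataset with its own label set and settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example only.
train_texts = [
    "The noodles were fresh and the service was quick.",
    "Lovely patio, friendly staff, fair prices.",
    "Best burger I have had in years, will come back.",
    "Click here for a free gift card and huge discount!",
    "Cheap loans available, visit our website today!",
    "Earn money from home, promo code inside!",
]
train_labels = [
    "legitimate", "legitimate", "legitimate",
    "spam", "spam", "spam",
]

# TF-IDF turns each review into a sparse weighted term vector;
# Logistic Regression then learns a linear decision boundary on those vectors.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

new_reviews = [
    "Friendly staff and tasty noodles.",
    "Huge discount, click here for a free gift!",
]
predictions = model.predict(new_reviews)
for review, label in zip(new_reviews, predictions):
    print(f"{label}: {review}")
```

Swapping `LogisticRegression` for `RandomForestClassifier` from `sklearn.ensemble` in the same pipeline is one way to compare the two model families mentioned above.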
Challenges we ran into
Detecting nuanced behaviour such as sarcasm, fake reviews, and subtly irrelevant content was challenging. Processing the many columns in the reviews.csv file was another challenge.
Accomplishments that we're proud of
For many of us, this was our first hackathon, and we're proud to have completed the project despite being on a tight timeline and having limited prior experience.
What we learned
We started with little knowledge of machine learning, data collection, and data cleaning, and we gained practical skills in each of them through this project.