What It Does

Kaypoh Aunty is a web application that transforms the experience of reading online reviews. It functions by:

  1. Scraping Data: It scrapes Google reviews in real time.
  2. AI-Powered Analysis: Each review is then processed by a custom-trained machine learning model.
  3. Intelligent Categorization: The model automatically classifies the content into five distinct categories:
    • Useful Reviews
    • Advertisements
    • Spam
    • Rants Without Visits
    • Irrelevant Content

The result is a clean, filterable interface that allows users to sift through the noise and focus on the reviews that truly matter.
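The filtering step can be illustrated with a minimal sketch. It assumes each review has already been paired with one of the five category labels above. The function names, the exact label strings, and the sample reviews are hypothetical and are not taken from the project's code.

```python
# The five categories named above; the exact label strings are assumed.
CATEGORIES = [
    "Useful Reviews",
    "Advertisements",
    "Spam",
    "Rants Without Visits",
    "Irrelevant Content",
]

def group_by_category(classified):
    """Group (review_text, label) pairs into a dict keyed by category."""
    groups = {category: [] for category in CATEGORIES}
    for text, label in classified:
        if label not in groups:
            raise ValueError(f"unknown category: {label!r}")
        groups[label].append(text)
    return groups

def filter_reviews(classified, wanted):
    """Keep only the review texts whose label is in `wanted`, in order."""
    wanted = set(wanted)
    return [text for text, label in classified if label in wanted]

# Hypothetical sample data, for illustration only.
sample = [
    ("Great laksa, queue moved fast.", "Useful Reviews"),
    ("Visit our shop for 50% off phones!", "Advertisements"),
    ("Never been here but I hear it is terrible.", "Rants Without Visits"),
]
useful_only = filter_reviews(sample, ["Useful Reviews"])
```

Grouping up front keeps every category available for display, while `filter_reviews` mirrors a user toggling one or more category filters in the interface.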

How It Was Built

The application was constructed using a modern, API-driven architecture.

  • Frontend: The user interface was built with HTML, CSS, and JavaScript, creating a lightweight and responsive experience.
  • Data Scraping: We integrated the Apify API to handle the dynamic scraping of Google reviews.
  • AI & Classification: The core of the project is a custom DistilBERT model. This model is hosted on a Hugging Face Space and is served via a dedicated Inference API. Our application's backend logic calls this API to classify reviews on the fly.
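
As a rough sketch of the classification call, the backend logic might look like the code below. The endpoint URL, the `{"inputs": [...]}` request body, and the `label`/`score` response fields are all assumptions made for illustration, since the actual API contract is not described here. The transport is passed in as a parameter so the logic can be exercised without a network connection.

```python
import json
import urllib.request

# Placeholder endpoint: the real Space URL is not given in this write-up.
CLASSIFY_URL = "https://example.invalid/classify"

def http_post_json(url, payload, timeout=30):
    """POST `payload` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def classify_reviews(texts, post=http_post_json, url=CLASSIFY_URL):
    """Send review texts to the classifier and pair each with its label.

    Assumes the API accepts {"inputs": [...]} and returns one
    {"label": ..., "score": ...} object per input, in the same order.
    """
    texts = list(texts)
    predictions = post(url, {"inputs": texts})
    if len(predictions) != len(texts):
        raise ValueError("classifier returned a different number of results")
    return [
        {"text": text, "label": p["label"], "score": p["score"]}
        for text, p in zip(texts, predictions)
    ]
```

Sending the scraped reviews as one batch rather than one request per review would reduce round trips, assuming the hosted model accepts batched input.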

Accomplishments We're Proud Of

  • High Classification Accuracy: The model classified reviews accurately, particularly when distinguishing clear-cut categories like "Spam" and "Advertisements" from genuine user feedback. This accuracy is the foundation of the application's value.
  • Seamless API Integration: We successfully orchestrated multiple services, creating a smooth data pipeline from the Apify scraper to our custom Hugging Face Inference API and back to the frontend.
  • Effective User Experience: We translated a complex backend process (scraping and AI analysis) into a simple, intuitive, and genuinely useful tool for the end-user.

What's Next for Kaypoh Aunty

Our roadmap for future development is focused on enhancing the model's intelligence and expanding the application's features.

  • Enhanced Model with Metadata: The next version of the model will be retrained on a richer dataset. We plan to incorporate metadata, such as the star rating, review length, and post frequency, as additional input features. We hypothesize that this will improve accuracy on the more nuanced categories.
  • Sentiment Analysis: We intend to add a sentiment analysis layer (Positive, Neutral, Negative) to provide users with another powerful filtering dimension.
  • Broader Platform Support: We plan to expand the scraper's capabilities to analyze reviews from other platforms beyond just Google.
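
To make the metadata idea above more concrete, here is a minimal sketch of turning the three metadata fields into numeric features. The scaling choices are assumptions, as is the reading of "post frequency" as the reviewer's reviews per month. How these features would be combined with the DistilBERT text representation is left open.

```python
import math

def metadata_features(star_rating, review_text, reviews_per_month):
    """Turn review metadata into a small numeric feature vector.

    All scaling choices here are illustrative assumptions,
    not the project's actual design.
    """
    if not 1 <= star_rating <= 5:
        raise ValueError("star_rating must be between 1 and 5")
    word_count = len(review_text.split())
    return [
        (star_rating - 1) / 4.0,        # star rating scaled to [0, 1]
        math.log1p(word_count),         # log-compressed review length
        math.log1p(reviews_per_month),  # log-compressed posting frequency
    ]

# Hypothetical example values, for illustration only.
features = metadata_features(5, "Great laksa, queue moved fast.", 2)
```

Log compression is used so that an unusually long review or a very prolific account does not dominate the other features.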
