Inspiration

Online reviews shape where we eat, what we buy, and how we travel, yet a large portion are fake, promotional spam, or irrelevant noise. I noticed that many existing filtering systems are rule-based, which makes them brittle against increasingly sophisticated manipulation tactics. This inspired me to build RealViews, an ML-powered system that detects fake reviews so that consumers and businesses can trust the information they rely on.

What it does

RealViews automatically detects and filters policy-violating reviews such as fake content, advertisements, spam, and irrelevant text. It assigns each review a quality score from 0 to 1, explains each prediction alongside a confidence level, and supports multilingual analysis (English and Chinese) in real time (under 100 ms per review). The system also analyzes user metadata and temporal patterns to flag coordinated fake review campaigns.
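To make the temporal-pattern idea concrete, here is a minimal sketch of one way to flag a suspicious burst of reviews. It is not the RealViews implementation: the function name `flag_review_bursts`, the `(business_id, user_id, timestamp)` tuple layout, and the thresholds (24 hours, 5 reviews, 4 accounts) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_review_bursts(reviews, window=timedelta(hours=24),
                       min_reviews=5, min_accounts=4):
    """Return the set of business IDs that show a suspicious review burst.

    reviews: iterable of (business_id, user_id, timestamp) tuples
             (a hypothetical layout, not the RealViews schema).
    A business is flagged when at least `min_reviews` reviews from at least
    `min_accounts` distinct accounts fall inside one sliding `window`.
    """
    by_business = defaultdict(list)
    for business_id, user_id, ts in reviews:
        by_business[business_id].append((ts, user_id))

    flagged = set()
    for business_id, items in by_business.items():
        items.sort(key=lambda pair: pair[0])  # oldest review first
        start = 0
        for end in range(len(items)):
            # Advance the left edge until the span fits inside `window`.
            while items[end][0] - items[start][0] > window:
                start += 1
            in_window = items[start:end + 1]
            accounts = {user for _, user in in_window}
            if len(in_window) >= min_reviews and len(accounts) >= min_accounts:
                flagged.add(business_id)
                break
    return flagged
```

Requiring several distinct accounts, not just several reviews, keeps one prolific user from triggering the flag alone. In practice the thresholds would need tuning per platform.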

How I built it

I collected and processed ~4,772 labeled reviews across English and Chinese, extracted 25+ linguistic, sentiment, and behavioral features, and trained ensemble classifiers (logistic regression, random forest, gradient boosting) that reached F1 scores above 0.86. I then deployed the system as a Streamlit web app, integrating the Google Translate API for cross-language support and Plotly dashboards for interactive visualizations of suspicious review clusters.
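The feature-extraction and ensemble steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the five features, the `PROMO_TERMS` list, and the function names are hypothetical stand-ins for the 25+ features mentioned above, and soft voting (averaging each model's predicted probability) is only one common way to combine classifiers; the write-up does not say how the three models were combined.

```python
import re

# Hypothetical promotional terms (English and Chinese), for illustration only.
PROMO_TERMS = ("discount", "coupon", "promo code", "优惠", "折扣")
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def extract_features(text):
    """Compute a handful of simple surface features for one review."""
    letters = [ch for ch in text if ch.isalpha()]
    lowered = text.lower()
    return {
        "char_count": len(text),
        "exclamation_ratio": text.count("!") / max(len(text), 1),
        "uppercase_ratio": sum(ch.isupper() for ch in letters) / max(len(letters), 1),
        "has_url": 1.0 if URL_PATTERN.search(text) else 0.0,
        "promo_term_count": float(sum(lowered.count(term) for term in PROMO_TERMS)),
    }


def soft_vote(probabilities, weights=None):
    """Weighted average of per-model violation probabilities (soft voting)."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Example: three stand-in model outputs for the same review.
features = extract_features("BUY NOW!!! Visit www.example.com for a discount")
violation_probability = soft_vote([0.9, 0.6, 0.75])
quality_score = 1.0 - violation_probability  # assumed mapping to a 0-1 score
```

Mapping the averaged violation probability to a quality score as `1 - p` is also an assumption; any monotone mapping onto the 0 to 1 range would serve the same purpose.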

Challenges I ran into

The biggest hurdles were limited datasets (especially in Chinese), manual labeling that was both time-consuming and inconsistent, computational constraints that made large transformer training infeasible, and the time pressure of building a working prototype. Despite these challenges, RealViews reached F1 scores above 0.86 while scoring each review in under 100 ms.

Accomplishments that I am proud of

  1. Achieved F1 scores above 0.86 across multiple violation categories.

  2. Designed an explainable AI interface so predictions aren’t just black-box outputs.

  3. Proved that lightweight ML can scale without requiring massive infrastructure.

What I learned

I learned how to effectively combine NLP, metadata analysis, and ensemble ML into one system. I also deepened my understanding of cross-lingual NLP challenges, particularly around Chinese tokenization and translation pipelines. Most importantly, I learned how crucial explainability and user trust are when deploying AI for real-world decision-making.

What's next for RealViews

In the future, RealViews can be scaled with larger multilingual datasets (Spanish, Arabic, French, etc.) and improved through reinforcement learning with user feedback loops. I also plan to integrate LLM-based reasoning for deeper context analysis and expand the system into commercial APIs that platforms can use directly to safeguard their review ecosystems.
