Inspiration:

Reviews are essential information in a consumer's decision-making process. People normally look up the specifications of electronic products before buying them.

With the rise of e-marketing and e-commerce, transactions are often completed online, on online shopping sites and on suppliers' official websites.

With these two points in mind, we wanted to find a way to minimize the effort of buying the product we want, guided by online reviews, from the online store that netizens recommend most.

To narrow the scope of our project, we implemented a web app that uses Natural Language Processing (NLP) techniques to summarize online reviews of laptops and mobile phones (electronic gadgets), as well as reviews of the purchase experience in the chosen online markets (Shopee, Lazada, Amazon).

What it does:

sift.cpr scrapes reviews of products and retailers, processes the data, and gives the summarized reviews to the users.

High-level idea of what happens in the backend when a user puts in a query:

Additionally, to enhance the buying experience, we implemented some extra features:

  1. Relevant video reviews from YouTube, with summaries of the videos produced through NLP techniques.
  2. Star ratings of the product.
  3. Summaries of reviews in the form of features (e.g. a laptop can have the feature ‘lightweight’). We put them into label tags so that users can search by tag (i.e. search for all products with the tag ‘lightweight’).
  4. Redirection to the shopping site when the user clicks on a review.
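The tag search in feature 3 can be pictured as a small inverted index from tag to product names. This is only a minimal sketch: the product names, the tags, and the helpers `build_tag_index` and `search_by_tag` are hypothetical and are not taken from the sift.cpr code.

```python
from collections import defaultdict

# Hypothetical sample data: product name -> feature tags extracted from reviews.
PRODUCT_TAGS = {
    "Laptop A": ["lightweight", "long battery life"],
    "Laptop B": ["bright screen"],
    "Phone C": ["Lightweight", "good camera"],
}


def build_tag_index(product_tags):
    """Map each lower-cased tag to the set of products that carry it."""
    index = defaultdict(set)
    for product, tags in product_tags.items():
        for tag in tags:
            index[tag.lower()].add(product)
    return index


def search_by_tag(index, tag):
    """Return the products that carry the given tag, in alphabetical order."""
    return sorted(index.get(tag.lower(), set()))


tag_index = build_tag_index(PRODUCT_TAGS)
lightweight_products = search_by_tag(tag_index, "lightweight")
```

Lower-casing the tags on both sides is a design choice of this sketch, so that ‘Lightweight’ and ‘lightweight’ end up under the same label.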

How we built it:

Scraping/Retrieving reviews from different websites

We use Selenium (browser automation) to scrape data from websites whose API calls we cannot obtain. In that case the data sits in fields of the rendered page, so we have to go into the HTML DOM to get it.

For websites whose API calls we can observe, we take the JSON string response from the call and read the data we need from the respective fields.
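As a rough illustration of the JSON route, the sketch below parses a response body and keeps only the fields a summarizer would need. The response layout and the field names (`data`, `ratings`, `rating_star`, `comment`) are assumptions made up for this example; each real site uses its own structure.

```python
import json

# Hypothetical response body; real endpoints and field names differ per site.
SAMPLE_RESPONSE = """
{
  "data": {
    "ratings": [
      {"author": "alice", "rating_star": 5, "comment": "Light and fast."},
      {"author": "bob", "rating_star": 2, "comment": ""},
      {"author": "carol", "rating_star": 4, "comment": "Good battery life."}
    ]
  }
}
"""


def extract_reviews(raw_json):
    """Return (stars, text) pairs for every review with a non-empty comment."""
    payload = json.loads(raw_json)
    ratings = payload.get("data", {}).get("ratings", [])
    reviews = []
    for item in ratings:
        text = (item.get("comment") or "").strip()
        if text:  # skip star-only reviews, they carry nothing to summarize
            reviews.append((item.get("rating_star"), text))
    return reviews


reviews = extract_reviews(SAMPLE_RESPONSE)
```

Using `.get` with defaults keeps the function from raising when a field is missing, which is useful when a site changes its response format.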

Summarizing of reviews

Text Summarization

PageRank is a link-analysis algorithm used by search engines to rank web pages. A page ranks highly when many other pages are linked to it, which means it can be taken to represent the knowledge of most of those pages.

Treating the reviews as a passage with multiple sentences, TextRank (similar to PageRank) computes the cosine similarity between every pair of sentences and finds the sentence that is most similar to the others.
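The idea can be sketched in plain Python as follows. This is a simplified illustration and not the project's implementation: it uses bag-of-words vectors, a naive sentence splitter and a hand-written PageRank iteration instead of a graph library, and the function names and sample text are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (assumption; a real tokenizer is better)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over the sentence-similarity graph."""
    vectors = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(sentences)
    # weights[i][j] is the similarity between sentences i and j (no self-loops).
    weights = [
        [cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(text, top_k=1):
    """Return the top_k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


sample_reviews = (
    "Battery life is great. The battery is great and the screen is great. "
    "The screen is bright. Delivery was slow."
)
summary = summarize(sample_reviews)
```

In this sample the second sentence shares words with both the first and the third, so it collects the highest score, while the delivery remark shares no words with the rest and only keeps the base score.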

Part-of-speech Tagging/Dependency Parsing

POS tagging, also called grammatical tagging, labels each word with its part of speech (e.g. NOUN, ADJ) based on its definition and context. Dependency parsing is the task of analyzing a sentence and assigning a syntactic structure to it.

Example of dependency parsing:

  - Text Processing
  - Translation of text using TextBlob (backed by Google Translate)
  - Named Entity Extraction using SpaCy

Results returned to UI

We built a REST API using Python and Flask that lets the frontend obtain the required response with a simple HTTP GET request. This separates the frontend from the backend and offers modularity: when either side changes, only the API layer in the middle needs to be modified, not the logic of the other side.
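A minimal sketch of such an endpoint is shown below, assuming Flask is installed. The route `/api/summary`, the `product` query parameter, and the in-memory `SUMMARIES` store are hypothetical stand-ins; the actual sift.cpr routes and response fields are not described in this write-up.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store standing in for the scraping and
# summarization pipeline.
SUMMARIES = {
    "laptop-x": {
        "summary": "Lightweight with good battery life.",
        "tags": ["lightweight"],
    },
}


@app.route("/api/summary", methods=["GET"])
def get_summary():
    """Return the stored summary for ?product=<id>, or a 404 error as JSON."""
    product = request.args.get("product", "")
    result = SUMMARIES.get(product)
    if result is None:
        return jsonify({"error": "product not found"}), 404
    return jsonify({"product": product, **result})


# Flask's built-in test client exercises the route without starting a server.
client = app.test_client()
found = client.get("/api/summary?product=laptop-x")
missing = client.get("/api/summary?product=unknown")
```

Because the frontend only depends on the URL and the JSON shape, the summarization logic behind `get_summary` can change without touching the frontend.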

Challenges we ran into:

The language processing part was the most difficult part of this hackathon. To summarize product reviews with deep learning, we would need product reviews paired with summaries of those reviews to train the model. However, there are only a few related datasets, from Amazon, DailyMail and CNN. Therefore, we used machine learning techniques to extract the most representative and authentic review of the product. These techniques are supported by spaCy, NLTK, networkx, etc.

Accomplishments that I’m proud of:

A working sift.cpr was implemented from scratch within 24 hours. Our backend exposes a RESTful API, so it is modular and we can connect our frontend and backend seamlessly. The backend runs on top of Flask, so it can run anywhere Python and Flask can be installed.

What's next for sift.cpr App:

The app can be further extended to summarize product reviews using a fine-tuned deep learning model. It could also redirect users to the online market where the product has better reviews, for example because of a cheaper price or other factors.
