Digging into a product's reviews can be overwhelming. Many of our product reviews are novella-length tomes, which makes for a poor customer experience. We wanted to produce a simple summary of what our customers are saying about each product. Machine learning for text summarization has seen major breakthroughs in the last two years, which allowed us to apply this technology to Wayfair's site.
What it does
The project takes the most helpful reviews for each product and builds two separate summaries: one based on positive reviews and one based on negative reviews.
How we built it
- We pull product reviews from a SQL database, keeping only the helpful reviews.
- Once we've pulled in the product reviews, we perform text processing on them to get them into a format that our neural net can read.
- We then train the model to generate short summaries from the review text.
- We then separately run the most helpful positive (> 3 stars) and most helpful negative (< 3 stars) product reviews through the neural net. This produces TLDR versions of our positive and negative customer reviews.
- These summaries are then written to MSSQL.
- The data is then serialized and sent client-side, where we render it as a new React component within the Product Reviews section on the product detail page (PDP).
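The filtering and text-processing steps above can be sketched roughly as follows. This is a minimal illustration rather than our actual pipeline: the `Review` fields, the `helpful_votes` ranking, the `top_n` cutoff, and the cleaning rules in `clean_text` are all assumptions made for the example. Only the star thresholds (> 3 for positive, < 3 for negative) come from the steps above.

```python
import re
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record shape; the real database columns may differ.
    text: str
    stars: int          # 1-5 star rating
    helpful_votes: int  # assumed measure of "helpfulness"


def clean_text(text: str) -> str:
    """Lowercase, replace characters outside a small whitelist, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_by_sentiment(reviews, top_n=3):
    """Return (positive, negative) lists of cleaned review texts, most helpful first."""
    ranked = sorted(reviews, key=lambda r: r.helpful_votes, reverse=True)
    positive = [clean_text(r.text) for r in ranked if r.stars > 3][:top_n]
    negative = [clean_text(r.text) for r in ranked if r.stars < 3][:top_n]
    return positive, negative


if __name__ == "__main__":
    sample = [
        Review("This thing is heavy and solid!", 5, 12),
        Review("It is dark and gloomy.", 1, 7),
        Review("It's fine.", 3, 40),  # 3-star reviews fall in neither bucket
    ]
    pos, neg = split_by_sentiment(sample)
    print(pos, neg)
```

Each list would then be passed to the trained model separately, so that positive and negative reviews never mix in one summary.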
Challenges we ran into
The biggest challenge was getting the neural net trained. We used a pre-designed TensorFlow model from here: https://github.com/dongjun-Lee/text-summarization-tensorflow. We formatted the dataset following this repo: https://github.com/Currie32/Text-Summarization-with-Amazon-Reviews.
Accomplishments that we're proud of
V0: We tried training the model on a subset of the Gigaword corpus, but this immediately filled up the memory of one of our production GPUs. Many Slack messages with systems engineering later, we decided to run on BigDataTOP. The issue there was that TOP's memory was full, which prevented us from installing TensorFlow. Finally, we ended up running on a local setup... it's still training.
V1: We trained it on a small data set (10k Wayfair reviews), which produced summaries like this:
Review: "I was looking for a narrow, black console for my entry and was nervous about purchasing online. I took a chance on this one and am so glad I did. This thing is heavy and solid! The description and seller photos do not do it justice."
Summary: "perfect console piece"
V2: We tuned some hyperparameters and trained it on a larger data set (60k reviews) and the results were impressive:
Review: "I've attached the picture from the website vs. the area rug I received- this is not what I expected -it is dark and gloomy. Too much of a pain to return, will now donate it to charity."
V1 Summary: "love love this bird"
V2 Summary: "actual picture does not do it justice"
Isn't that impressive?
What we learned
If you want a neural net trained by Sunday, you should probably kick it off before Saturday.
What's next for Summarizing Customer Product Reviews
- There is a lot of optimization that can still be done on the summarization neural net. In V2, simply increasing the training set size and tuning some hyperparameters yielded a better model.
- Create separate summaries for reviews about different product aspects. For example, we could create positive/negative summaries for the product itself, Wayfair's delivery experience, the buying experience, or customer service, to provide even more insight from TLDR reviews.
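One possible starting point for the aspect idea is to route each review sentence to an aspect before summarizing. The sketch below is purely illustrative and not something we have built: the `ASPECT_KEYWORDS` lists and the `route_sentences` helper are hypothetical, and simple keyword matching stands in for whatever classifier a real version would use.

```python
import re

# Hypothetical keyword lists; a real system would need a tuned vocabulary or a classifier.
ASPECT_KEYWORDS = {
    "delivery": {"delivery", "delivered", "shipping", "shipped", "arrived"},
    "customer service": {"service", "refund", "return", "support"},
}


def route_sentences(review_text):
    """Assign each sentence to the first aspect whose keywords it mentions, else 'product'."""
    buckets = {"product": [], **{aspect: [] for aspect in ASPECT_KEYWORDS}}
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", review_text.strip()):
        if not sentence:
            continue
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        aspect = next(
            (a for a, keywords in ASPECT_KEYWORDS.items() if words & keywords),
            "product",
        )
        buckets[aspect].append(sentence)
    return buckets


if __name__ == "__main__":
    print(route_sentences("The rug is dark. Shipping was fast! Too much of a pain to return."))
```

The sentences in each bucket could then be summarized separately, giving one positive and one negative summary per aspect.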