We perused some of the suggested NLP data sets, and the Amazon reviews stood out for their size and cleanliness. We also found a few papers and blog posts on automatic summarization of articles and comments.

What it does

tl;dr is a web app that generates good summaries for long Amazon (or other e-commerce) reviews. When enabled, it seamlessly replaces bad or short summaries of long reviews on Amazon product pages with machine-generated summaries, in the form of keyword lists or key sentences.

How we built it

NLP: Python, using nltk and pandas

Challenges we ran into

Word embeddings vs sentence embeddings: Generating coherent sentences is difficult with a model built from word-level similarities, and sentence-level similarities aren't good for summarizing concise or ungrammatical reviews. Therefore, we're developing both approaches simultaneously, with the plan of either allowing users to toggle between keyword and full-sentence summaries, or using a heuristic to decide automatically.
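
To make the two summary modes concrete, here is a minimal sketch that uses plain word frequencies as a stand-in for embedding-based similarity. It is not the project's actual model. The function names, the tiny stopword list, and the `min_sentences` threshold are all assumptions made for illustration, and the sentence splitter is a simple regex rather than an nltk tokenizer.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; nltk ships a fuller one).
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "is", "it", "this", "that", "to",
    "of", "in", "for", "on", "with", "was", "i", "my", "so", "very", "are",
    "as", "at", "be", "have", "has", "not", "they", "them",
}


def tokenize(text):
    """Lowercase the text and return its non-stopword word tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_summary(text, k=5):
    """Keyword mode: the k most frequent non-stopword tokens."""
    counts = Counter(tokenize(text))
    return [word for word, _ in counts.most_common(k)]


def sentence_summary(text, n=1):
    """Key-sentence mode: the n sentences with the highest mean word frequency."""
    sentences = split_sentences(text)
    counts = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(counts[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in top]


def summarize(text, min_sentences=3):
    """Hypothetical heuristic: key sentences for longer reviews, keywords otherwise."""
    if len(split_sentences(text)) >= min_sentences:
        return sentence_summary(text)
    return keyword_summary(text)
```

Calling `summarize` on a multi-sentence review returns a list of key sentences, while a terse, fragment-style review falls back to a keyword list. A real version would swap the frequency scores for the word-level and sentence-level similarities discussed above.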

Accomplishments that we're proud of

We coded an NLP algorithm to extract important information from an Amazon review.

What we learned

We learned about NLP algorithms and how to implement them.

What's next for tl;dr

