tl;dr

Amazon review

Inspiration

We perused some of the suggested NLP data sets, and the Amazon reviews stood out for their size and cleanliness. We also found a few papers and blog posts on automatic summarization of articles and comments.

What it does

tl;dr is a web app that generates good summaries for long Amazon (or other e-commerce) reviews. When enabled, it seamlessly replaces bad or overly short summaries of long reviews on Amazon product pages with machine-generated summaries, in the form of keyword lists or key sentences.
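To make the two output forms concrete, here is a minimal sketch of a keyword list and a key sentence pulled from one review. It is not the project's model: it swaps in plain word-frequency scoring and uses only the standard library. The stopword list and the names `keyword_summary` and `key_sentence` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real pipeline would
# likely use a fuller list, such as the one shipped with nltk).
STOPWORDS = {
    "the", "a", "an", "and", "or", "but", "is", "it", "this", "that",
    "to", "of", "in", "for", "on", "with", "was", "i", "my", "so",
    "very", "as", "at", "be", "are", "have", "has", "had", "not",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_summary(text, k=5):
    """Return the k most frequent non-stopword tokens in the review."""
    return [word for word, _ in Counter(tokenize(text)).most_common(k)]


def key_sentence(text):
    """Return the sentence whose tokens are, on average, most frequent in the review."""
    counts = Counter(tokenize(text))
    sentences = split_sentences(text)
    if not sentences:
        return ""

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(counts[t] for t in tokens) / len(tokens) if tokens else 0.0

    return max(sentences, key=score)


review = (
    "The battery lasts all day. "
    "The battery charges fast and the battery is light. "
    "Shipping was slow."
)
print(keyword_summary(review, k=1))
print(key_sentence(review))
```

Frequency scoring is only a stand-in here; an embedding-based similarity could replace the `score` function without changing the overall shape.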

How we built it

NLP: Python, using nltk and pandas

Challenges we ran into

Word embeddings vs. sentence embeddings: Generating coherent sentences is difficult with a model built from word-level similarities, and sentence-level similarities aren't good for summarizing concise or ungrammatical reviews. Therefore, we're developing both approaches simultaneously, with the plan of either letting users toggle between keyword and full-sentence summaries or using a heuristic to decide automatically.
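One possible shape for the automatic heuristic, sketched under assumptions: fall back to keywords when a review has too few sentences or its sentences are very short (a rough proxy for terse or ungrammatical text), and use key sentences otherwise. The function name `choose_summary_mode` and both thresholds are hypothetical and untuned.

```python
import re


def choose_summary_mode(review, min_sentences=3, min_avg_words=6.0):
    """Return "sentences" or "keywords" for a review.

    Both thresholds are illustrative guesses, not values from the project.
    """
    # Naive split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]
    if len(sentences) < min_sentences:
        return "keywords"
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    return "sentences" if avg_words >= min_avg_words else "keywords"


print(choose_summary_mode("great phone love it"))
print(choose_summary_mode(
    "The battery easily lasts a full day of heavy use. "
    "The screen is bright enough to read outdoors in sunlight. "
    "Shipping took two weeks longer than the seller promised."
))
```

A user-facing toggle could simply override the return value of this function, so both options from the plan above fit the same interface.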