We looked through some of the suggested NLP data sets, and the Amazon reviews stood out for their size and cleanliness. We also found a few papers and blog posts on automatic summarization of articles and comments.
What it does
tl;dr is a web app that generates useful summaries for long Amazon (or other e-commerce) reviews. When enabled, it seamlessly replaces poor or overly short summaries of long reviews on Amazon product pages with machine-generated summaries, in the form of keyword lists or key sentences.
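To make the two summary forms concrete, here is a minimal sketch of a frequency-based extractive approach. It is an illustration only, not the project's actual algorithm: the function names, the tiny stopword list, and the regex-based tokenizer are all assumptions, and the standard library stands in for nltk so the example runs without extra downloads.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "this",
             "of", "to", "in", "for", "i", "all"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_summary(text, k=5):
    """Return the k most frequent non-stopword words in the review."""
    counts = Counter(tokenize(text))
    return [word for word, _ in counts.most_common(k)]


def key_sentence(text):
    """Return the sentence whose words have the highest average frequency."""
    counts = Counter(tokenize(text))
    sentences = split_sentences(text)

    def score(sentence):
        words = tokenize(sentence)
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score) if sentences else ""


review = ("The battery lasts all day. The battery charges fast and "
          "the screen is bright. Shipping was slow.")
print(keyword_summary(review, k=3))
print(key_sentence(review))
```

Averaging the word scores, rather than summing them, keeps long sentences from winning simply because they contain more words.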
How we built it
NLP: Python, using nltk and pandas
Challenges we ran into
Word embeddings vs sentence embeddings: Generating coherent sentences is difficult with a model built from word-level similarities, and sentence-level similarities aren't good for summarizing concise or ungrammatical reviews. Therefore, we're developing both approaches simultaneously, with the plan of either allowing users to toggle between keyword and full-sentence summaries, or using a heuristic to decide automatically.
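One possible shape for the automatic heuristic is sketched below. This is a guess at how such a rule could look, not something we have settled on: the function name `choose_summary_mode` and both thresholds are hypothetical, and the idea is simply that reviews made of very few or very short, fragmentary sentences fall back to keywords.

```python
import re


def choose_summary_mode(review, min_sentences=3, min_avg_words=6.0):
    """Pick "sentence" or "keywords" for a review.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    # Naive split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]

    # Too few sentences to choose a representative one: use keywords.
    if len(sentences) < min_sentences:
        return "keywords"

    # Very short sentences suggest terse or ungrammatical text: use keywords.
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    return "sentence" if avg_words >= min_avg_words else "keywords"


print(choose_summary_mode("great. cheap. fast. works."))
print(choose_summary_mode(
    "I bought this phone for my mother last month. "
    "She uses it every day for calls and photos. "
    "The battery easily lasts until the evening."
))
```

A user-facing toggle could simply override the returned value, so the two options from the plan above are compatible.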
Accomplishments that we're proud of
We coded an NLP algorithm to extract important information from an Amazon review.
What we learned
We learnt about NLP algorithms and how to implement them.