Masters of Puppets

Visualization of corpus
LDA example
Bar chart

Inspiration

SGS challenge

What it does

Summarizes documents using graph-based TextRank.
Clusters documents into topics without supervision using Latent Dirichlet Allocation (LDA).
Classifies documents into categories such as news and guidance using supervised deep learning techniques.
Generates abstracted information from documents.
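
To illustrate the summarization step, here is a minimal sketch of graph-based TextRank for extractive summarization. It is not our project code: it uses only the Python standard library, and the function names, the regex sentence splitter, the word-overlap similarity, and the damping and iteration settings are all assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    # Shared-word count normalised by the log of the sentence lengths.
    overlap = len(set(a) & set(b))
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return overlap / denom if denom > 0 else 0.0


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style power iteration over the sentence graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Keep the top_n highest-scoring sentences, in their original order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling textrank_summary(text, top_n=3) returns the three sentences that are most strongly connected to the rest of the document. A sentence that shares no words with any other sentence keeps only the baseline score and is ranked last.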

How we built it

We used the Natural Language Toolkit (NLTK), Gensim, TensorFlow, Keras, and Long Short-Term Memory (LSTM) networks to solve the challenge.

Challenges we ran into

Reading research papers on LDA and TextRank to get a general idea of how they work, and learning Tableau.

Accomplishments that we're proud of

The variety of analyses we were able to do in a short time, from supervised classification to unsupervised clustering and summarization, all with reasonable fidelity.

What we learned

Gensim, NLTK, Tableau, TextRank, and the LDA approach.

What's next for Master of Puppets

Fine-tuning the models to improve accuracy. Summarizing a news article can also differ from summarizing a regulation article because the two document types differ in nature, so this is an area for improvement.