What it does
Summarizes documents using graph-based TextRank.
Clusters documents into topics, without supervision, using Latent Dirichlet Allocation (LDA).
Classifies documents into categories such as news and guidance using supervised deep learning techniques.
Generates abstracted information from documents.
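To illustrate the summarization step, here is a minimal sketch of graph-based TextRank for sentence extraction. It is not our project code: it uses only the Python standard library, and the function names (`split_sentences`, `similarity`, `textrank_summary`) and the sample text are hypothetical. Sentences are nodes, edge weights are word overlap normalized by sentence length, and a weighted PageRank iteration ranks the sentences.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption; NLTK would be more robust)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count normalized by the log of each sentence's vocabulary size."""
    wa, wb = set(a), set(b)
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom == 0:
        return 0.0
    return len(wa & wb) / denom


def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank: each sentence shares its score with its neighbours
    # in proportion to the edge weights.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]


text = (
    "The regulator published new guidance on capital requirements. "
    "Banks must hold more capital under the new guidance. "
    "The weather was sunny in London today. "
    "Analysts expect banks to adjust capital plans after the guidance."
)
summary = textrank_summary(text, num_sentences=2)
print(summary)
```

The summary length, damping factor, and iteration count are illustrative defaults rather than values we tuned.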
How we built it
We used the Natural Language Toolkit (NLTK), Gensim, TensorFlow, and Keras, along with Long Short-Term Memory (LSTM) networks, to solve the challenge.
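For readers new to LDA, the clustering idea can be sketched without any dependencies. The example below is an assumption-laden illustration, not our Gensim pipeline: it swaps in a small collapsed Gibbs sampler (Gensim uses a different inference method), and the function name `lda_gibbs`, the hyperparameters, and the toy documents are all hypothetical. Each document ends up with a topic mixture, and its cluster is the topic with the largest share.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: document-topic, topic-word, and total words per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Resample in proportion to how much the document likes each
                # topic times how much each topic likes this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


docs = [
    "bank capital regulation guidance capital".split(),
    "regulation guidance bank compliance".split(),
    "match goal team league goal".split(),
    "team league match player".split(),
]
theta = lda_gibbs(docs, num_topics=2)
clusters = [max(range(2), key=lambda t: row[t]) for row in theta]
print(clusters)
```

Topic ids are arbitrary labels, so only the grouping of documents is meaningful, not which group is numbered 0 or 1.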
Challenges we ran into
Reading research papers on LDA and TextRank to get a general idea of how they work, and learning Tableau.
Accomplishments that we're proud of
The variety of analyses we were able to do in a short time, from supervised classification to unsupervised clustering and summarization, all with reasonable fidelity.
What we learned
Gensim, NLTK, Tableau, TextRank, and the LDA approach.
What's next for Master of Puppets
Fine-tuning the models to improve accuracy. Also, summarizing a news article can differ from summarizing a regulation article because of the nature of each document type. This is an area for improvement.