I do a lot of reading on Medium, but I often find that an article doesn't answer all my questions, or I can't find closely related articles to follow up with. That led me to build a website that takes the URL of a Medium article and generates high-level analytics for it.
What it does
This project takes the URL of a Medium article and returns n closely related Medium articles. It also produces high-level insights and analytics: the n most important sentences, plus the named entities in the article and their frequencies. Additionally, it extracts the model's attention output and uses it to generate visuals of word relationships within the article.
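The write-up does not say how related articles are ranked. As a minimal sketch, assuming each article has already been turned into an embedding vector and that similarity is measured with cosine similarity (both are assumptions, as are the function names and the toy data), the recommendation step could look like this:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_n_related(query_vec, corpus, n=3):
    """Return the n titles whose vectors are most similar to query_vec.

    corpus is a dict mapping article title -> embedding vector
    (a hypothetical layout chosen for this illustration).
    """
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:n]]


# Toy 3-dimensional vectors standing in for real article embeddings.
corpus = {
    "Intro to Transformers": [0.9, 0.1, 0.0],
    "Baking Sourdough": [0.0, 0.2, 0.9],
    "Attention Explained": [0.8, 0.3, 0.1],
}
related = top_n_related([1.0, 0.2, 0.0], corpus, n=2)
print(related)
```

In a real system the vectors would come from the transformer, and the corpus would be the pre-existing article dataset rather than three hand-written entries.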
How we built it
I built this project using state-of-the-art NLP methods such as transformers, along with older techniques like named entity recognition (NER) with spaCy and NLTK. To get strong results from the transformer, I pretrained the model on the text of my own dataset so that it would be better adapted to article text.
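The write-up does not say how the n most important sentences are chosen. One classical option is frequency-based extractive scoring, where a sentence scores higher when its words are common across the article. The sketch below is an illustration of that technique only, not the project's actual method; it uses the standard library instead of NLTK, and the stopword list, splitter, and function names are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def top_sentences(text, n=2):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


text = ("Transformers use attention. "
        "Attention lets transformers weigh words. "
        "I had coffee today.")
print(top_sentences(text, n=2))
```

Averaging by sentence length is a design choice made here for the sketch; summing raw frequencies instead would bias the ranking toward longer sentences.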
Challenges we ran into
I ran into several challenges while creating this project. For one, using an out-of-the-box model from Hugging Face led to subpar performance because of its lack of understanding of article-specific text. Pretraining the model on my own dataset's text allowed it to achieve better performance. Other challenges involved Flask, as I still have limited experience with it. However, with the help of tutorials and articles, I was able to understand it better.
Accomplishments that we're proud of
I am extremely happy that I was able to build a website with multiple pages and more dynamic interaction.
What we learned
I am extremely happy that I was able to learn more about Flask and improve my understanding of transformer visualization.
What's next for In-Depth Article Analysis And Recommendations
I hope to add a database for the articles so that the site can include newly published articles; currently I am only using a pre-existing dataset. I also hope to keep improving the overall design and interface of the website.