Inspiration

I didn't feel like reading long articles and wanted a quick way to see a summary instead, but the best I could find was a website that summarized text you copied and pasted into it. I wanted something quicker and easier for any web page, so I proposed that we build a Chrome extension you can click while you're on an article to view a summary of it.

What it does

You can install this extension on any Chromium-based browser (like Google Chrome). Once it is installed, you can go to any article and click on the extension. It opens a small window containing a summary of the article.

How we built it

The first thing we had to figure out was how to get the raw text of an article from its HTML. For this, we found a very useful open source library called Mercury Parser, which extracted the text from any HTML we provided to it. The source code of Mercury Parser can be accessed at https://github.com/postlight/mercury-parser.

Once that was done, we started building the frontend while also working on getting the actual summarizer running. For the frontend, we went with a Chrome extension, as we thought it was the best medium for this type of application. We used HTML, Node.js, and CSS along with webpack to build a neat, minimalistic extension that can read the URL and HTML of the page the user is currently on. The main extension window shows the summary of the article to the user.

The next thing we had to figure out, as mentioned before, was how to generate the summary itself. For this, we decided to use pretrained neural networks. After experimenting with many different neural network architectures, we settled on the BART transformer architecture, which was pretrained by Facebook on a 1 GB dataset of CNN and DailyMail news articles. We ran the model with the PyTorch library and served it with the Flask web framework.

Challenges we ran into

The first challenge we encountered was extracting text from HTML. We first looked at outline.com to see how they accomplished the task, but none of it was open source, so we decided to use an open source implementation called Mercury instead. The biggest challenge we faced was shipping the machine learning model along with the extension. At first, we wanted the extension to work offline, but we ran into roadblock after roadblock when trying to export the model so that it could run in JavaScript. Eventually, we decided to create a backend server instead, which receives the article text and returns the summarized text.
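To make the "receive text, return summary" idea concrete, here is a minimal sketch of what such a backend handler could look like. It is not our actual server code: the JSON field names `text` and `summary`, the function `handle_summarize`, and the stand-in `naive_summarize` (which just keeps the first couple of sentences) are all assumptions for illustration. In the real backend, the BART model would take the place of the stand-in and Flask would handle the HTTP layer.

```python
import json
from typing import Callable, Tuple


def naive_summarize(text: str, max_sentences: int = 2) -> str:
    """Stand-in summarizer (an assumption, not BART): keep the first sentences."""
    sentences = [s.strip() for s in text.replace("\n", " ").split(". ") if s.strip()]
    summary = ". ".join(sentences[:max_sentences])
    return summary if summary.endswith(".") else summary + "."


def handle_summarize(
    body: bytes, summarize: Callable[[str], str] = naive_summarize
) -> Tuple[int, str]:
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers both malformed JSON and undecodable bytes
        return 400, json.dumps({"error": "body must be valid JSON"})

    # Hypothetical request shape: {"text": "<article text>"}
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 400, json.dumps({"error": "missing 'text' field"})

    # Hypothetical response shape: {"summary": "<summary text>"}
    return 200, json.dumps({"summary": summarize(text)})


if __name__ == "__main__":
    status, response = handle_summarize(b'{"text": "First. Second. Third."}')
    print(status, response)
```

Keeping the summarizer as a parameter means the same handler can be exercised with a cheap stand-in during development and with the real model in production.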

Accomplishments that we're proud of

We were essentially able to get a computer to read an article and write a summary for it, all by itself, which is a pretty amazing feat to achieve. We are also very proud of our design and medium for this application, and how we made it very easy for a user to use the extension.

What we learned

We learned a lot about deploying neural networks with PyTorch and creating browser extensions.

What's next for Quick Summary

This is a relatively simple app, but there is still scope for expansion. First things first, we want to create an offline version that uses a smaller model (the current one takes up around 1-2 GB of RAM) and can run on a user's computer. This would make it very useful for people who lose their connection often, such as people in a moving vehicle. We also want to make the online version faster by using techniques such as caching, so that it does not have to generate a summary for the same article repeatedly. Additionally, we would like to make an extension for non-Chromium browsers such as Firefox.
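As a rough idea of how the caching could work, here is a small sketch of an in-memory cache that sits in front of the summarizer. Nothing here exists in the project yet: the `SummaryCache` class, its `max_entries` limit, and the choice to key entries by a SHA-256 hash of the article text with least-recently-used eviction are all assumptions. Keying by URL would be another option, though the same URL can serve different content over time.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class SummaryCache:
    """Small LRU cache keyed by a hash of the article text (hypothetical design)."""

    def __init__(self, summarize: Callable[[str], str], max_entries: int = 256):
        self._summarize = summarize
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self.misses = 0  # how many times the summarizer actually had to run

    def get(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the model.
            self._entries.move_to_end(key)
            return self._entries[key]

        # Cache miss: run the (expensive) summarizer and remember the result.
        self.misses += 1
        summary = self._summarize(text)
        self._entries[key] = summary
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
        return summary


if __name__ == "__main__":
    # str.upper stands in for the real model call.
    cache = SummaryCache(str.upper, max_entries=2)
    cache.get("same article")
    cache.get("same article")
    print("summarizer runs:", cache.misses)
```

With a cache like this, a second request for the same article returns immediately instead of running the model again.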
