Interactivity is essential on the modern web, but without manageable content we are left with a pile of pretty garbage. Blogs, webcomics, and news sites are static yet essential sources of information, and there is still room for improvement in how we consume them.

These static sites do not scale well with large amounts of content. It is not uncommon to discover a blog with thousands of posts, and it is nearly impossible to read even the highlights of such a blog in a single sitting. We need to break these huge archives into manageable chunks.

It would be nice if we could "rewind" an RSS feed to the beginning and serve the content as if it were being published in the present. Unfortunately, the RSS protocol does not support such an operation. Instead, our code works around this deficiency by mining a site for its archive and then feeding the articles to the user one at a time via an RSS feed. This breaks an unassailable heap of content down into an efficient stream of information.
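To make the idea concrete, here is a minimal sketch (Python standard library only) of what "rewinding" could look like: given articles recovered from an archive, emit an RSS 2.0 feed that re-dates them as if publication started today, one per day. Names such as `build_rewound_feed` are illustrative assumptions, not our actual API.

```python
# Illustrative sketch of the "rewind" idea: replay archived posts as a fresh,
# slowly trickling RSS feed. Standard library only; names are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


@dataclass
class Article:
    title: str
    url: str


def build_rewound_feed(articles, feed_title, feed_url, start=None, per_day=1):
    """Build an RSS 2.0 document that replays `articles` as a new stream."""
    start = start or datetime.now(timezone.utc)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = "Archive replayed from the beginning"

    for i, article in enumerate(articles):
        # Re-date each archived post as if it were being published now,
        # spacing the items out so the reader gets a manageable trickle.
        pub_date = start + timedelta(days=i // per_day)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article.title
        ET.SubElement(item, "link").text = article.url
        ET.SubElement(item, "pubDate").text = format_datetime(pub_date)

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    archive = [Article("First post", "https://example.com/1"),
               Article("Second post", "https://example.com/2")]
    print(build_rewound_feed(archive, "Example, rewound", "https://example.com"))
```

A feed reader pointed at the generated XML would then surface one old post per day, as though the site were publishing it today.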

The most technically challenging and rewarding part of the project was finding the archive links of arbitrary static sites. We started with heuristics, such as gathering page sections that contain both a link and a date, and then moved on to a machine learning classifier to decide which links point to archived posts.
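Below is a rough sketch of the first heuristic, assuming BeautifulSoup is available: an anchor tag is treated as an archive-link candidate if its enclosing element also contains something that looks like a date. The function name and date patterns are illustrative, not our exact implementation.

```python
# Heuristic sketch: an <a> tag whose surrounding markup mentions a date is a
# likely archive link. Requires beautifulsoup4; names here are hypothetical.
import re
from bs4 import BeautifulSoup

# Matches dates such as "2015-09-12", "Sep 12, 2015", or "12 September 2015".
DATE_RE = re.compile(
    r"\b(\d{4}-\d{2}-\d{2}"
    r"|[A-Z][a-z]{2,8}\.? \d{1,2},? \d{4}"
    r"|\d{1,2} [A-Z][a-z]{2,8} \d{4})\b"
)


def find_archive_candidates(html):
    """Return (url, link text) pairs whose surrounding markup mentions a date."""
    soup = BeautifulSoup(html, "html.parser")
    candidates = []
    for a in soup.find_all("a", href=True):
        parent = a.find_parent()
        context = parent.get_text(" ", strip=True) if parent else ""
        if DATE_RE.search(context):
            candidates.append((a["href"], a.get_text(strip=True)))
    return candidates


if __name__ == "__main__":
    sample = '<ul><li>Sep 12, 2015 - <a href="/posts/42">A post</a></li></ul>'
    print(find_archive_candidates(sample))  # [('/posts/42', 'A post')]
```

Heuristic matches like these could also serve as noisy labels when training a classifier over link features, though the exact features and model used are beyond this sketch.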
