Inspiration
As someone who often reads articles online, I find it can take a while to tell whether an article covers what I am interested in. This extension helps the reader get the gist of an article and see which sections are worth reading.
What it does
Finds articles on your current webpage and creates an index (table of contents) above each one. It does this by breaking the article up into sections and using the LLM to summarize each section, providing 3 key ideas per section. The user may jump to any section in the source article, and jump back up to the index. In the options, the user may specify a whitelist of websites that the plugin applies to.
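As a rough illustration of the sectioning step, the sketch below groups an article's text into sections of bounded size. The function name `splitIntoSections`, the blank-line paragraph rule, and the greedy grouping are assumptions made for this example; the 4500-character default simply mirrors the size mentioned for the first load, and the extension's real splitting logic may differ.

```typescript
// Hypothetical helper, not the extension's actual code.
// Greedily groups paragraphs into sections of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole rather than cut mid-text.
function splitIntoSections(text: string, maxChars: number = 4500): string[] {
  // Assumption: paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const sections: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? current + "\n\n" + paragraph : paragraph;
    if (candidate.length > maxChars && current) {
      // Adding this paragraph would overflow, so close the current section.
      sections.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) sections.push(current);
  return sections;
}
```

Each returned string would then be sent to the LLM for its own summary and key ideas.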
How we built it
As a first step, I tested the Nano LLM and concluded that summarization is a key strength. The plugin was tested against various websites, such as news websites with multiple articles (e.g. Slashdot), ones which load dynamic content (e.g. Steam), and various blog-like websites.
Challenges we ran into
The LLM was asked to provide JSON output but often returned invalid JSON, so code was added to fix common errors such as missing commas or double quotes. The LLM may also fail to process the prompt, so retry logic was added. Creating the section links was challenging: the LLM processes the textContent, but the section links have to be created against the HTML content. I did not want to alter the article's HTML; instead, I used the existing tags in the HTML and added ids to them.
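The JSON repair and retry ideas can be sketched as follows. This is a minimal example under stated assumptions, not the plugin's actual code: `repairJson`, `parseLenient`, and `summarizeWithRetry` are hypothetical names, `callModel` stands in for whatever function sends a prompt to the LLM, and the two repairs shown (a missing comma at a line break, curly double quotes) are only guesses at what the "common errors" looked like.

```typescript
// Sketch only: the repairs below are assumptions about typical LLM output errors.
function repairJson(raw: string): string {
  return raw
    // Assumed quote problem: curly double quotes used instead of straight ones.
    .replace(/[\u201C\u201D]/g, '"')
    // Missing comma: a value ends on one line and the next line starts a new key or value.
    .replace(/(["\]}\d])[ \t]*\r?\n\s*(["\[{])/g, "$1,\n$2");
}

// Try a strict parse first; only touch the text if that fails.
function parseLenient(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return JSON.parse(repairJson(raw));
  }
}

// Retry when the model call itself fails or its output cannot be parsed.
// callModel is a hypothetical stand-in for the real LLM call.
async function summarizeWithRetry(
  callModel: (prompt: string) => Promise<string>,
  prompt: string,
  maxAttempts: number = 3,
): Promise<unknown> {
  let lastError: unknown = new Error("no attempts were made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parseLenient(await callModel(prompt));
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Parsing strictly first means well-formed output is never modified, so the repairs can only affect responses that would otherwise have been rejected.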
Accomplishments that we're proud of
I am proud to have overcome the above challenges. Overall, I am happy with how the plugin turned out: useful and as planned.
What we learned
I learned about the difficulties of processing the DOM. I also learned more about how to write robust code that fails gracefully.
What's next for article-idx
Features such as:
- Generate the index on user interaction only, e.g. when the user clicks a link above the article to generate the index
- Faster initial generation - let the first section be smaller; currently the first load takes a while at 4500 chars.
- Allow the user to interact with the article - "Take me to the section which talks about X".
- Caching (viewing the same article instantly loads the index)
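The caching idea from the list above could look roughly like the sketch below. Everything here is hypothetical: the `SectionSummary` shape, the `IndexCache` class, and the choice to key entries on the page URL plus a hash of the article text (so that an edited article is regenerated) are assumptions for illustration. The sketch keeps entries in memory only; a real extension would presumably persist them somewhere.

```typescript
// Hypothetical shape of one index entry.
interface SectionSummary {
  title: string;
  keyIdeas: string[];
}

// 32-bit FNV-1a hash of the article text, used to detect content changes.
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// In-memory cache keyed on URL plus content hash (an assumed design).
class IndexCache {
  private store = new Map<string, SectionSummary[]>();

  private key(url: string, articleText: string): string {
    return url + "#" + hashText(articleText);
  }

  get(url: string, articleText: string): SectionSummary[] | undefined {
    return this.store.get(this.key(url, articleText));
  }

  set(url: string, articleText: string, index: SectionSummary[]): void {
    this.store.set(this.key(url, articleText), index);
  }
}
```

On a cache hit the index can be shown immediately without calling the LLM; on a miss the index is generated as usual and stored with `set`.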