The Oracle of Thabes

Inspiration

Opening a URL in a new tab is an act of hope. "Oh, it's a great article! I will definitely read it later. Now I have this to deal with". But when the time comes ~~hours, weeks~~ or months later, and you want to return to that article in the sea of open tabs, it's nowhere to be found! If only AI could help... 🤔

What it does

The Oracle of Thabes does the following:

Queries the list of your open tabs
Summarizes the contents of each one using Gemini Nano, which is built into recent Chrome versions
Presents the summaries in the sidebar

You can:

Search those summaries to find the tab you are looking for
Ask the AI model a question and get an answer based on each relevant summary

How I built it

To keep the codebase as approachable as possible, the extension is built in vanilla JavaScript, with no build process or external dependencies. I used the following APIs and methods:

chrome.tabs.query to get the list of open tabs
chrome.scripting.executeScript to retrieve the content of each open tab
ai.summarizer.summarizeStreaming to generate a streaming summary of each tab
chrome.storage.sync to cache generated summaries
ai.languageModel.prompt to check relevance and generate answers based on individual summaries
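
The tab-reading and caching calls listed above can be combined roughly as follows. This is a minimal sketch, not the extension's actual source: collectTabTexts, getCachedSummary and cacheSummary are hypothetical names, the chrome object is passed in as a parameter so the functions can be exercised outside the browser, and reading document.body.innerText stands in for the project's selector-based extraction. It assumes a Manifest V3 extension with the "tabs" and "scripting" permissions plus host permissions for the pages being read.

```javascript
// Sketch only: `chromeApi` is the extension's global `chrome` object,
// injected as a parameter (hypothetical design) so it can be stubbed in tests.
async function collectTabTexts(chromeApi) {
  // An empty query object matches every tab in every window.
  const tabs = await chromeApi.tabs.query({});
  const pages = [];
  for (const tab of tabs) {
    try {
      // `func` is serialized and executed inside the page, not in the extension.
      const [injection] = await chromeApi.scripting.executeScript({
        target: { tabId: tab.id },
        func: () => document.body.innerText,
      });
      pages.push({ id: tab.id, url: tab.url, title: tab.title, text: injection.result });
    } catch {
      // Injection is refused on restricted pages (e.g. chrome:// URLs); skip them.
    }
  }
  return pages;
}

// Summaries are cached by URL; `storageArea` would be chrome.storage.sync
// (or chrome.storage.local), whose get/set return promises in Manifest V3.
async function getCachedSummary(storageArea, url) {
  const stored = await storageArea.get(url);
  return stored[url] ?? null;
}

async function cacheSummary(storageArea, url, summary) {
  await storageArea.set({ [url]: summary });
}
```

One design caveat: chrome.storage.sync limits each stored item to 8,192 bytes and the whole area to about 100 KB, so long summaries may fail to save there; chrome.storage.local accepts the same get/set calls with a much larger quota.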

Summarization process

Small language models, including Gemini Nano, have fairly limited context windows, while many articles exceed those windows by a wide margin. To work around that, a chunking strategy splits the article into 4000-character pieces (around 1000 tokens, at roughly four characters per token) and summarizes them one piece at a time, with each piece's summary becoming a new paragraph. The whole process looks like this:

Create a summarizer with a sharedContext containing the URL and title of the tab
Split the content into 4000-character chunks
Call summarizer.summarizeStreaming in a chain of promises, streaming the summary of each chunk in sequence
If there are more than 5 chunks, the combined summary is already quite long, so the chunk summaries are fed back into the summarizer to create a "summary of summaries"
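
A minimal sketch of this pipeline, assuming a summarizer object created with the experimental ai.summarizer.create({ sharedContext }) and exposing a summarize(text) method that resolves to a string. For brevity it uses the non-streaming summarize instead of summarizeStreaming, and a hypothetical onProgress callback stands in for updating the sidebar as each chunk finishes. The 4000-character chunk size and the more-than-5-chunks threshold come from the steps above; the function names are illustrative, and the built-in AI API names may differ in your Chrome version.

```javascript
// Values from the write-up: ~4000 characters is roughly 1000 tokens.
const CHUNK_SIZE = 4000;
const MAX_CHUNKS_WITHOUT_RESUMMARY = 5;

// Naive fixed-size chunking: slices may cut through words or sentences.
function chunkText(text, size = CHUNK_SIZE) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// `summarizer` is assumed to expose summarize(text) -> Promise<string>.
async function summarizeLongText(summarizer, text, onProgress = () => {}) {
  const chunks = chunkText(text);
  const paragraphs = [];
  // Awaiting inside the loop runs the chunk summaries strictly in series,
  // equivalent to chaining the promises one after another.
  for (const chunk of chunks) {
    paragraphs.push(await summarizer.summarize(chunk));
    onProgress(paragraphs.join("\n\n")); // e.g. refresh the sidebar
  }
  const combined = paragraphs.join("\n\n");
  // Many chunks produce a long multi-paragraph summary; condense it once more.
  if (chunks.length > MAX_CHUNKS_WITHOUT_RESUMMARY) {
    return summarizer.summarize(combined);
  }
  return combined;
}
```

Cutting chunks at paragraph or sentence boundaries instead of a fixed offset would keep each piece more coherent, at the cost of slightly uneven chunk sizes.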

Retrieval Augmented Generation process

Implementing a basic search/filter over the summaries is straightforward, but if the keywords don't match exactly, we won't find the tab we want.

So I came up with the following RAG technique:

When the user asks a question, we loop through each of the summaries
We feed the summary into ai.languageModel and ask the model whether it is relevant to the question
If it is, we ask the model to answer the question based on that summary
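
The loop above might look like the following minimal sketch. It assumes a createSession factory such as () => ai.languageModel.create(), whose sessions expose prompt(text) and destroy(); the prompt wording and the helper names are hypothetical. Because a language-model session keeps its conversation history, each prompt here gets a fresh session, so earlier tabs cannot influence the verdict on later ones and the context window does not fill up.

```javascript
// Hypothetical helper: run a single prompt in a fresh session, then release it.
async function promptOnce(createSession, text) {
  const session = await createSession();
  try {
    return await session.prompt(text);
  } finally {
    session.destroy?.(); // free the session's resources if supported
  }
}

// For each tab: ask whether its summary is relevant, and if so, answer from it.
async function askAcrossTabs(createSession, tabs, question) {
  const answers = [];
  for (const tab of tabs) {
    const verdict = await promptOnce(
      createSession,
      `Summary of "${tab.title}":\n${tab.summary}\n\n` +
        `Question: ${question}\n` +
        "Is this summary relevant to the question? Answer only YES or NO."
    );
    // Anything that does not start with "yes" counts as not relevant.
    if (!/^\s*yes\b/i.test(verdict)) continue;
    const answer = await promptOnce(
      createSession,
      "Answer the question using only the summary below.\n\n" +
        `Summary:\n${tab.summary}\n\nQuestion: ${question}`
    );
    answers.push({ tabId: tab.id, title: tab.title, answer });
  }
  return answers;
}
```

Each tab costs one prompt, plus a second one for relevant tabs, so with many open tabs the answers arrive gradually rather than all at once.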

Challenges we ran into

Retrieving content of the tab

A typical website has plenty of navigation elements, ads, and other clutter, so retrieving the actual content is tricky. For now, I used common CSS selectors with fallbacks to find the text content we are interested in. Some libraries can probably do this better, but I was reluctant to introduce external dependencies for this challenge. Listing pages, like the front page of Hacker News or the New York Times, pose an additional problem because they contain a list of articles rather than a single one. Ideally we would loop through all the listed items, but listing pages vary too much for this team to handle for now.
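
Selector-based extraction with fallbacks could be sketched like this. The selector list and the 200-character minimum are hypothetical, not the project's actual values. The function is meant to be passed as the func of chrome.scripting.executeScript, which serializes it into the page, so everything it needs is declared inside its body; the doc parameter defaults to the page's document and exists only so the logic can be tested with a stand-in object.

```javascript
// Usage (assumed): chrome.scripting.executeScript({ target: { tabId }, func: extractMainText })
function extractMainText(doc = document) {
  // Declared inside the function: executeScript serializes `func`,
  // so it cannot reference variables from the extension's scope.
  const selectors = ["article", "main", "[role='main']", "#content", ".post-content"]; // hypothetical
  const minLength = 200; // hypothetical threshold for "looks like real content"
  // Try the most specific content containers first.
  for (const selector of selectors) {
    const text = doc.querySelector(selector)?.innerText?.trim() ?? "";
    if (text.length >= minLength) return text;
  }
  // Fall back to the whole page body.
  return doc.body?.innerText?.trim() ?? "";
}
```

Requiring a minimum length keeps a tiny matching element (an empty `<main>`, say) from shadowing the real content further down the fallback list.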

No local embeddings

There are APIs to summarize and generate text, but Gemini Nano offers no API for creating the embeddings needed for semantic search. We work around this by using ai.languageModel to read each summary and answer YES or NO to indicate whether it is relevant to the current question.
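
A small model may not always reply with a bare "YES" or "NO"; it can add punctuation, markdown, or an explanation. Below is a minimal sketch of a prompt builder and a tolerant parser for that verdict; the prompt text and function names are hypothetical. Returning null for an unclear reply leaves the caller to decide, for example by treating the tab as not relevant.

```javascript
// Hypothetical prompt for the YES/NO relevance check.
function buildRelevancePrompt(summary, question) {
  return [
    "Here is a summary of a web page:",
    summary,
    "",
    `Question: ${question}`,
    "Is this summary relevant to the question? Reply with a single word: YES or NO.",
  ].join("\n");
}

// Returns true for YES, false for NO, and null when the reply is unclear.
function parseVerdict(reply) {
  // Replace markdown emphasis and punctuation with spaces, e.g. "**Yes.**" -> "Yes".
  const cleaned = reply.replace(/[*_`"'.,:;!]/g, " ").trim().toUpperCase();
  // Compare whole first words so that e.g. "Yesterday" is not read as YES.
  const firstWord = cleaned.split(/\s+/)[0];
  if (firstWord === "YES") return true;
  if (firstWord === "NO") return false;
  return null;
}
```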

Model context window

As mentioned above, with longer content ai.summarizer returns less useful summaries. That's why we split the content into chunks and summarize each chunk in isolation.

Parallel work

On some Chrome versions, the models "lock up" if too many summaries are generated at once. That is why we put every piece of work into a single promise chain, so that summaries are generated one after another.
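
A promise chain that serializes work can be as small as the sketch below; createSerialQueue is a hypothetical name. Each call to enqueue appends a task to the tail of the chain, so a task starts only after the previous one has settled, and a rejected task is caught on the chain so it does not block the tasks queued after it. The caller still receives the task's own result or error.

```javascript
// Minimal promise-chain queue: tasks run one at a time, in the order enqueued.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start `task` only after everything queued before it has settled.
    const result = tail.then(() => task());
    // Swallow failures on the internal chain so later tasks still run;
    // the caller sees the rejection through `result`.
    tail = result.catch(() => {});
    return result;
  };
}
```

For example, every summarization request could go through enqueue(() => summarizer.summarize(text)), no matter how many tabs ask for summaries at once.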

What's next for The Oracle of Thabes

The Oracle of Thabes has already been submitted to the Chrome Web Store and is pending review.

Possible improvements include:

Using smarter logic or an external library to retrieve the content of the current tab
Using smarter chunking logic or an external library to split content into better pieces
Using the summaries to filter tabs when the user asks a question, then using the entire contents of the matching tabs to actually answer it
Introducing a background process that flags websites as "potentially interesting" or "potentially not useful" according to a preset interest list

Updates

Artur Piszek started this project — Dec 03, 2024 06:33 AM EST
