Inspiration
Opening a URL in a new tab is an act of hope. "Oh, it's a great article! I will definitely read it later. Now I have this to deal with".
But when the time comes hours, weeks or months later, and you want to return to that article in the sea of open tabs, it's nowhere to be found! If only AI could help... 🤔
What it does
The Oracle of Thabes does the following:
- Queries the list of your open tabs
- Summarizes the contents of each one using Gemini Nano, which is built into recent Chrome versions
- Presents the summaries in the sidebar
You can:
- Search those summaries to find the tab you are looking for
- Ask the AI model a question and get an answer based on each of the summaries
How I built it
To keep the codebase as approachable as possible, the extension is built in vanilla JavaScript, without a build process or external dependencies. I used the following APIs and methods:
- chrome.tabs.query to get the list of open tabs
- chrome.scripting.executeScript to retrieve the content of the open tab
- ai.summarizer.summarizeStreaming to generate a streaming summary of the open tab
- chrome.storage.sync to cache generated summaries
- ai.languageModel.prompt to ask a question and generate the answer based on individual summaries
Summarization process
Small language models, including Gemini Nano, have fairly limited context windows, and many articles/tabs exceed them by a wide margin. To get around that problem, the content is split into 4000-character chunks (around 1000 tokens each) and summarized chunk by chunk, with each chunk's summary in its own paragraph. The whole process looks like this:
- Create a summarizer with sharedContext containing URL and title of the tab
- Split content into 4000 char chunks
- Call summarizer.summarizeStreaming in a chain of promises, streaming the summary of each chunk in series
- If there were more than 5 chunks, the combined summary is already pretty long. In that case, we feed the summaries back in to create a "summary of summaries"
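The steps above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `summarizeChunk` is a hypothetical async function standing in for the built-in summarizer call, the names `splitIntoChunks` and `summarizeLongText` are made up for this example, and an `await` loop is used in place of an explicit promise chain (the calls still run one after another). The "more than 5 chunks" threshold is one reading of the rule described above.

```javascript
// Chunk size from the write-up: 4000 characters, roughly 1000 tokens.
const CHUNK_SIZE = 4000;
// Assumption: more than this many chunks triggers a "summary of summaries".
const MAX_CHUNKS_BEFORE_RESUMMARIZE = 5;

// Split text into fixed-size pieces (no attempt to respect sentence boundaries).
function splitIntoChunks(text, size = CHUNK_SIZE) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// `summarizeChunk` is a hypothetical async function: (text) => summary string.
// It is injected so the sketch can run outside of Chrome.
async function summarizeLongText(text, summarizeChunk) {
  const chunks = splitIntoChunks(text);
  const parts = [];
  for (const chunk of chunks) {
    // Awaiting inside the loop keeps the calls strictly sequential.
    parts.push(await summarizeChunk(chunk));
  }
  // One paragraph per chunk summary.
  const combined = parts.join("\n\n");
  if (chunks.length > MAX_CHUNKS_BEFORE_RESUMMARIZE) {
    // The combined summary is itself long, so summarize it once more.
    return summarizeChunk(combined);
  }
  return combined;
}
```

Fixed-size slicing can cut a sentence in half at a chunk boundary, which is the trade-off for keeping the splitter this simple.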
Retrieval Augmented Generation process
Implementing basic search/filter through summaries is pretty straightforward, but if the keywords don't match exactly, we will not find the exact tab we want.
So I came up with the following RAG technique:
- When the user asks a question, we loop through each of the summaries
- We feed the summary into ai.languageModel and ask the model if the summary is relevant to the question
- If it is, we ask the model to answer the question based on the summary provided
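A rough sketch of that loop is shown below. It is an illustration under stated assumptions, not the extension's code: `prompt` is a hypothetical async function standing in for the built-in language model call, the tab objects are assumed to have `id` and `summary` fields, and the prompt wording is invented for the example.

```javascript
// `prompt` is a hypothetical async function: (text) => model reply string.
// `tabs` is assumed to be an array of { id, summary } objects.
async function answerFromSummaries(question, tabs, prompt) {
  const answers = [];
  for (const tab of tabs) {
    // Step 1: ask the model whether this summary is relevant at all.
    const verdict = await prompt(
      `Summary:\n${tab.summary}\n\nQuestion: ${question}\n` +
      "Is the summary relevant to the question? Answer YES or NO."
    );
    // Accept replies such as "YES", "Yes." or " yes, because ...".
    if (!/^\s*yes\b/i.test(verdict)) continue;

    // Step 2: only for relevant summaries, ask for an actual answer.
    const answer = await prompt(
      "Answer the question using only this summary.\n" +
      `Summary:\n${tab.summary}\n\nQuestion: ${question}`
    );
    answers.push({ tabId: tab.id, answer });
  }
  return answers;
}
```

The relevance check costs one extra model call per tab, but it keeps the model from producing answers out of unrelated summaries.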
Challenges we ran into
Retrieving content of the tab
The average website has quite a few navigational elements, ads, etc., so retrieving the actual content is tricky. For now, I use common CSS selectors with fallbacks to find the text content we are interested in. Some libraries can probably do this better, but I was reluctant to introduce external dependencies into this challenge. An additional problem is listing pages, like the front page of Hacker News or the New York Times, which contain a list of articles. We should probably loop through all the listed elements in those cases, but the diversity of listing pages is too big for this team.
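The selector-with-fallbacks idea might look something like the sketch below. The selector list, the 200-character threshold, and the function name `extractMainText` are all assumptions made for illustration; the extension's real list is not given here. The function takes any object with a `querySelector` method, so inside a page it would be called with `document`.

```javascript
// Candidate selectors, most specific first. Illustrative only.
const CONTENT_SELECTORS = ["article", "main", "[role='main']", "#content", "body"];
// Assumption: anything shorter than this is probably not the real content.
const MIN_LENGTH = 200;

// `doc` is anything exposing querySelector(); pass `document` inside a page.
function extractMainText(doc, selectors = CONTENT_SELECTORS, minLength = MIN_LENGTH) {
  let best = "";
  for (const selector of selectors) {
    const el = doc.querySelector(selector);
    const text = el ? (el.innerText || el.textContent || "").trim() : "";
    // The first selector with enough text wins.
    if (text.length >= minLength) return text;
    // Otherwise remember the longest candidate as a last resort.
    if (text.length > best.length) best = text;
  }
  return best;
}
```

Ending the list with `body` means the function still returns something on pages that match none of the more specific selectors, at the cost of including navigation and ads.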
No local embeddings
There are APIs to summarize and generate, but there are no Gemini Nano APIs for creating embeddings needed for semantic search.
We work around this issue by using ai.languageModel to read each summary and answer YES/NO to indicate whether the summary is relevant to the current question.
Model Context window
As mentioned, ai.summarizer returns less useful summaries for longer content. That's why we split the content into chunks and summarize each chunk in isolation.
Parallel work
On some Chrome versions, the models "lock up" if too many summaries are generated at once. That is why we put every piece of work in a promise chain, so that summaries are generated serially.
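One common way to build such a chain is a small serial queue, sketched below. The name `createSerialQueue` is hypothetical and this is not necessarily how the extension does it; the point is only that each task starts after the previous one has settled.

```javascript
// Returns an `enqueue` function. Each task is a function returning a value
// or a promise, and it starts only after the previous task has settled.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => task());
    // Swallow errors on the internal chain so one failed task
    // does not block everything queued after it.
    tail = result.catch(() => {});
    // The caller still sees the task's own result or rejection.
    return result;
  };
}
```

A caller would create one queue for all model work and wrap each summarization in it, for example `enqueue(() => summarizeTab(tab))`, where `summarizeTab` is again a placeholder name.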
What's next for The Oracle of Thabes
The Oracle of Thabes has already been submitted to the Chrome Web Store and is pending review.
Possible improvements include:
- Using smarter logic or an external library to retrieve the content of the current tab
- Using smarter chunking logic or an external library to better chunk content into pieces
- When the user asks a question, use summaries to filter the tabs, but later use the entire contents of the tab to actually answer the question
- Introduce a background process to flag websites as "potentially interesting" / "potentially not useful" according to a preset interest list