Inspiration

I was inspired by the Perplexity Chrome extension, and I knew that with the tools Google provides I could build a more security-conscious option. I was thinking of elderly people and others who barely know how to use Chrome. For them it can be a powerful tool, especially with text-to-speech functionality. It can act as a universal chatbot for web pages, and the user can run web searches from it.

What it does

It is an agentic Chrome extension that lets you interact with web pages, ask questions about any website, summarize YouTube videos, and translate pages. Think of it as an AI mode integrated into whichever web page it is called on. Main functionalities:

  • Answer questions about any web page
  • Interact with web pages
  • AI-powered search, deep and simple
  • Summarize content and YouTube videos
  • Supports speech-to-text and image upload
  • Page translation and multilingual support

How I built it

AGENT NAN has a fairly complex architecture. It consists mainly of an MCP server and a client, plus two models: one that decides which tool to use and another that answers questions. I implemented it with the Gemini Nano Prompt API, a secure and efficient fit for a local, lightweight AI extension.
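
To make the two-model split above concrete, here is a minimal sketch of the hand-off between the tool-picking model and the answering model. It is not the AGENT NAN source. The names `ToolCall`, `Tool`, `parseToolCall`, and `buildAnswerPrompt` are hypothetical, and it assumes the tool-picking model has been prompted to reply with JSON of the form `{"tool": "...", "args": {...}}`. The model calls themselves are left out so the routing logic can be read on its own.

```typescript
// Hypothetical types for illustration; none of these names come from the project.
type ToolCall = { tool: string; args: Record<string, string> };
type Tool = (args: Record<string, string>) => string;

// Validate the raw text returned by the tool-picking model.
// Returns null when the text is not JSON or names a tool that is not registered.
function parseToolCall(raw: string, tools: Record<string, Tool>): ToolCall | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // the model replied with something that is not JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const candidate = parsed as { tool?: unknown; args?: unknown };
  if (typeof candidate.tool !== "string") return null;
  // Only accept tools that were explicitly registered.
  if (!Object.prototype.hasOwnProperty.call(tools, candidate.tool)) return null;

  // Keep string arguments only, so a tool never receives unexpected shapes.
  const safeArgs: Record<string, string> = {};
  if (typeof candidate.args === "object" && candidate.args !== null) {
    const rawArgs = candidate.args as Record<string, unknown>;
    for (const key of Object.keys(rawArgs)) {
      const value = rawArgs[key];
      if (typeof value === "string") safeArgs[key] = value;
    }
  }
  return { tool: candidate.tool, args: safeArgs };
}

// Build the prompt for the answering model. If the tool-picking model chose a
// valid tool, its result is placed ahead of the question; otherwise the
// question is passed through unchanged.
function buildAnswerPrompt(
  question: string,
  plannerOutput: string,
  tools: Record<string, Tool>
): string {
  const call = parseToolCall(plannerOutput, tools);
  if (call === null) return `Question: ${question}`;
  const result = tools[call.tool](call.args);
  return `Tool ${call.tool} returned: ${result}\nQuestion: ${question}`;
}
```

Rejecting unknown tool names before dispatch keeps a confused or manipulated model reply from triggering anything outside the registered tool set, which fits the security goal of the project.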

Challenges and Breakthroughs

The challenges I ran into while building this project:

  • Limited context window for local LLMs. I handle this by creating a new instance for every tab and URL visit
  • Handling errors and updates from two separate models while running on the same instance
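
The first workaround, one fresh instance per tab and URL visit, can be sketched as a small cache. This is an illustrative assumption, not the extension's actual code: `TabSessionCache`, `create`, and `destroy` are hypothetical names, and the session type `S` is left generic. In the extension, `create` would presumably wrap the Prompt API's session creation, so `S` could be a promise of a session.

```typescript
// Keeps at most one model session per tab, tied to the URL the tab was on
// when the session was created. All names here are hypothetical.
class TabSessionCache<S> {
  private byTab: Record<number, { url: string; session: S }> = {};

  constructor(
    private create: () => S,            // builds a fresh session (empty context)
    private destroy: (session: S) => void // frees a session that is no longer needed
  ) {}

  // Return the session for this tab. A new URL in the same tab discards the
  // old session, so the limited context window starts empty for each page.
  get(tabId: number, url: string): S {
    const entry = this.byTab[tabId];
    if (entry !== undefined && entry.url === url) {
      return entry.session;
    }
    if (entry !== undefined) {
      this.destroy(entry.session);
    }
    const session = this.create();
    this.byTab[tabId] = { url, session };
    return session;
  }

  // Call when a tab closes so its session does not linger.
  release(tabId: number): void {
    const entry = this.byTab[tabId];
    if (entry === undefined) return;
    this.destroy(entry.session);
    delete this.byTab[tabId];
  }
}
```

A caller would request `cache.get(tabId, url)` before each prompt and call `cache.release(tabId)` from a tab-closed handler. Repeated questions on the same page share context, while navigating away resets it.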

Impact and Future

The impact of this project is a local agentic extension that addresses the major security concerns most people have about agentic AI tools, showing that a local approach might be the future of such tools.

Accomplishment that I'm proud of:

Managing to develop an agentic system that runs locally in the browser, and doing it all in one month.

What we learned

I learned a lot about the Chrome APIs and how they can be used to solve a variety of problems.

What's next

I am looking to add more tools to the MCP server, e.g. booking meetings and flights, which will make the Chrome interface feel friendlier to all users. I also plan to add a voice-to-voice mode like Siri, so the user can receive voice output without checking the side panel.
