Inspiration

While AI tools have been integrated into countless specific platforms, browsers remain stuck in the past. We wanted to bring the experience of modern code editors to the browser: smart autocomplete and an inline assistant for any text field.

What it does

It monitors the page's DOM, embeds relevant parts of it into a vector database, and retrieves them in the blink of an eye when you trigger an autocomplete. It also keeps track of your latest actions, so it can detect repetitive tasks and avoid recommending the same thing twice. The inline assistant also supports images.
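The action-tracking idea can be sketched roughly like this (a minimal illustration, not the extension's actual code; `ActionTracker` and its method names are made up for this example):

```javascript
// Illustrative sketch: keep a short history of recent user actions so
// repetition can be detected, and remember suggestions already shown so
// the same recommendation is never surfaced twice.
class ActionTracker {
  constructor(maxActions = 20) {
    this.maxActions = maxActions;
    this.actions = [];          // most recent actions, oldest first
    this.suggested = new Set(); // suggestions already shown to the user
  }

  record(action) {
    this.actions.push(action);
    if (this.actions.length > this.maxActions) this.actions.shift();
  }

  // Crude repetition check: the last `n` actions equal the `n` before them.
  isRepetitive(n = 3) {
    if (this.actions.length < 2 * n) return false;
    const tail = this.actions.slice(-n);
    const prev = this.actions.slice(-2 * n, -n);
    return tail.every((a, i) => a === prev[i]);
  }

  // Only surface a suggestion the user has not already seen.
  shouldSuggest(text) {
    if (this.suggested.has(text)) return false;
    this.suggested.add(text);
    return true;
  }
}
```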

How we built it

A browser extension runs locally in the user’s browser and monitors the content currently being viewed. Through this extension, we can also retrieve the cursor position and the text style of the text field the user is currently interacting with, along with the other information required to position a ghost text suggestion.
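Positioning the ghost text boils down to turning the caret's bounding rectangle and the field's computed font into an overlay style. A minimal sketch of that computation (the function shape and parameter names are our illustration, not the extension's exact code):

```javascript
// Illustrative sketch: given the caret's bounding rectangle and the text
// field's computed font, build the inline style for an absolutely
// positioned "ghost text" overlay so the suggestion visually continues
// the user's own text.
function ghostTextStyle(caretRect, fieldStyle, scroll = { x: 0, y: 0 }) {
  return {
    position: "absolute",
    left: `${caretRect.right + scroll.x}px`, // start right after the caret
    top: `${caretRect.top + scroll.y}px`,
    font: fieldStyle.font,                   // match family, size, and weight
    color: "rgba(128, 128, 128, 0.7)",       // muted so it reads as a hint
    pointerEvents: "none",                   // clicks fall through to the field
    whiteSpace: "pre",                       // preserve spacing in the suggestion
  };
}
```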

The extension intelligently tracks the viewport and only sends portions of the website that have not already been processed. These segments are then received by the backend, where they are chunked and stored in Qdrant using Gemini embeddings.

By leveraging Gemini's different embedding task types, we can perform retrieval queries using the text the user has already written in the field, combined with contextual information about the webpage and the user's most recent actions. The most relevant chunks are then passed to Gemini 2.5 Flash-Lite, which generates a recommendation that is sent back to the browser.
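The ranking step behind that retrieval is, at its core, nearest-neighbor search over embedding vectors. A minimal sketch, assuming chunk embeddings are already available as plain arrays (in the real system this lives in Qdrant with Gemini embeddings; `topChunks` and the chunk shape are illustrative):

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against the query vector and keep the top k; those
// chunks become the context passed to the generation model.
function topChunks(queryVec, chunks, k = 3) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In production the vector store does this search for us; the sketch only shows what "most relevant chunks" means mathematically.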

The inline assistant works in a similar fashion, except that it also retains context from the user’s prompt as well as any images that are provided.

Challenges we ran into

It was quite challenging to build a ghost-text mechanism that works across as many text fields as possible. We spent quite some time trying to build a Google Docs integration, for example; working with its canvas-based rendering was already hard, but the strict script-injection protections are what ultimately sank the attempt.

Accomplishments that we're proud of

We were really excited when we first connected all the components and the system actually started generating recommendations based on context. Seeing the full pipeline work end-to-end was a big milestone for us.

The speed of the retrieval and generation processes especially surprised us. Even though we still had a lot of work ahead of us, particularly in improving the quality of the recommendations, deciding which context to pass, and refining the prompts, it was incredibly motivating to see how fast and responsive the system already was.

What we learned

Most RAG systems we built in past projects used small, locally running embedding models that could only surface similar text, with no way to steer the retrieval itself. This was our first time working with an embedding model this capable, and multimodal on top of that, so it was a new experience.

What's next for Mouse

Right now it's a working prototype, but there are still security and privacy considerations to address. Currently we handle privacy with a keyboard shortcut that lets the user toggle whether the extension can see the page. We would also need to set up a paywall, since we can't hand out free AI usage indefinitely.
