Inspiration
While AI tools have been integrated into many specific platforms, browsers remain stuck in the past. We wanted to bring the experience that modern code editors provide into the browser: smart autocomplete and an inline assistant for any text field.
What it does
It monitors your DOM, embeds relevant parts of it, stores them in a vector database, and retrieves them instantly when you use autocomplete. It also keeps track of your most recent actions so it can detect repetitive tasks and avoid suggesting the same thing twice. The inline assistant also supports images.
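The "most recent actions" tracking can be sketched as a small sliding window over what the user has just done; a candidate suggestion is suppressed if it already appears in the window. This is a minimal sketch in Python (the class and method names are ours, not from the actual extension):

```python
from collections import deque


class ActionTracker:
    """Sliding window of recent user actions, used to suppress
    suggestions the user has already seen or performed."""

    def __init__(self, window: int = 20):
        # deque with maxlen drops the oldest action automatically
        self.recent = deque(maxlen=window)

    def record(self, action: str) -> None:
        self.recent.append(action)

    def is_repetitive(self, candidate: str) -> bool:
        # True if this exact action happened within the window
        return candidate in self.recent


tracker = ActionTracker(window=5)
for action in ["open tab", "copy email", "paste email"]:
    tracker.record(action)

print(tracker.is_repetitive("copy email"))   # True: seen recently
print(tracker.is_repetitive("write reply"))  # False: novel suggestion
```

A bounded window keeps the check O(window) and means old actions stop influencing suggestions on their own, with no explicit expiry logic.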
How we built it
A browser extension runs locally in the user’s browser and monitors the content currently being viewed. Through this extension we can retrieve the cursor position and the text style of the text field the user is interacting with, along with the other information required to position a ghost text suggestion.
The extension intelligently tracks the viewport and only sends portions of the website that have not already been processed. These segments are received by the backend where they are chunked and stored in Qdrant using embeddings generated with Amazon Titan through Amazon Bedrock.
By leveraging the different embedding capabilities available in Titan, we can perform retrieval queries using the text the user has already written in the field combined with contextual information about the webpage and the user’s most recent actions. The most relevant chunks are then passed to Amazon Nova Lite, which generates a recommendation that is sent back to the browser.
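The retrieval step above combines several signals into one query before ranking chunks by similarity. A minimal sketch, assuming the query is a simple concatenation of field text, page context, and recent actions, and using plain cosine similarity in place of Titan embeddings and Qdrant's search (the real system delegates both to those services):

```python
import math


def build_query(field_text: str, page_context: str,
                recent_actions: list[str]) -> str:
    # Merge the signals into a single retrieval query string;
    # the real system embeds this with Titan before searching Qdrant.
    return "\n".join([field_text, page_context, *recent_actions])


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float],
          chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k chunks most similar to the query vector."""
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

The indices returned by `top_k` select the chunks whose text is passed to Amazon Nova Lite as context for the recommendation.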
The inline assistant works in a similar way except that it also retains context from the user’s prompt as well as any images that are provided.
Challenges we ran into
It was quite challenging to build a ghost text mechanism that works across the widest possible range of text fields. For example, we spent considerable time trying to build an integration for Google Docs. Working with canvas-based text systems was difficult, and strict script injection protections ultimately prevented the integration from working.
Accomplishments that we are proud of
We were really excited when we first connected all the components and the system actually started generating recommendations based on context. Seeing the full pipeline work end to end was a big milestone for us.
The speed of the retrieval and generation processes especially surprised us. Even though we still had a lot of work ahead of us, particularly in improving the quality of the recommendations, deciding which context to pass, and refining the prompts, it was incredibly motivating to see how fast and responsive the system already was.
What we learned
Most RAG systems we built in past projects used small, locally running embedding models that only surfaced closely related text without truly guiding retrieval. This was our first time working with such a capable multimodal embedding system, and it was a new experience.
What is next for Mouse
Right now it is a working prototype, but there are still security and privacy considerations to address. Currently we handle privacy by letting the user toggle the extension on or off with a keyboard shortcut, so it stops observing the page instantly. We would also need to introduce a paywall, since we cannot provide unlimited AI usage for free.
Built With
- amazon-web-services
- chrome
- fastapi
- nova-ai
- qdrant