BrowserBuddy
BrowserBuddy is an AI-powered browser extension that understands the webpage you're on and helps you act on it instantly.
Instead of copying content into other tools, users can simply ask the page itself. BrowserBuddy can summarize articles, generate notes or emails from content, and even find cheaper or visually similar products when browsing online stores.
Inspiration
While browsing the web, we constantly switch between tabs and tools to get things done. If we want to compare product prices, summarize an article, or draft an email based on something we’re reading, we usually have to copy content into an AI chatbot or search engine.
We realized that most AI assistants operate without awareness of the webpage you're actually on. This disconnect inspired us to build BrowserBuddy, an assistant that understands the content of the current webpage and can act on it directly.
Our goal was to create an AI layer for the web, where any page becomes interactive through natural language.
What it does
BrowserBuddy is a browser extension that captures the text and images on the current webpage and allows users to interact with them using an AI agent.
Users can ask questions or give commands such as:
- Summarize this article
- Turn this page into notes
- Write an email based on this content
- Find this product cheaper
- Find visually similar items
For example, if a user is browsing a clothing site and asks:
"Find this sweater cheaper"
BrowserBuddy identifies the item and searches other stores for better prices.
Instead of switching between tools and tabs, users can simply ask the page itself.
How we built it
BrowserBuddy consists of three main components:
1. Browser Extension
The extension captures webpage context including:
- page URL
- page title
- visible text
- images on the page
This information is sent to the backend when the user interacts with the assistant.
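As a rough illustration, the captured context could be modeled on the backend like this. This is a minimal sketch: the `PageContext` class, its field names, and the `parse_context` helper are assumptions made for the example, not BrowserBuddy's actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PageContext:
    """Hypothetical shape of the context the extension sends with each request."""
    url: str
    title: str
    visible_text: str
    image_urls: list[str] = field(default_factory=list)
    user_message: str = ""


def parse_context(raw_json: str) -> PageContext:
    """Parse a JSON request body into a PageContext, ignoring unknown keys."""
    data = json.loads(raw_json)
    known = {k: data[k] for k in PageContext.__dataclass_fields__ if k in data}
    return PageContext(**known)
```

Dropping unknown keys keeps the backend tolerant of extension versions that send extra fields.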
2. Backend Agent
The backend is built with FastAPI and manages agent workflows.
Each user request follows this pipeline:
- Intent Detection: Determine what the user wants to do.
- Context Analysis: Analyze the webpage content.
- Tool Selection: Choose the correct tool for the task.
- Response Generation: Return results to the extension.
The system uses tool-calling workflows to dynamically perform tasks.
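The four pipeline stages can be sketched roughly as follows. Everything here is illustrative: the tool functions, the `TOOLS` registry, and the keyword-based `detect_intent` are stand-ins we made up for the example, and a real agent would most likely delegate intent detection and tool selection to an LLM.

```python
from typing import Callable


# Hypothetical tools; real ones would call an LLM or a product search service.
def summarize(page_text: str) -> str:
    return "summary of: " + page_text[:60]


def find_cheaper(page_text: str) -> str:
    return "cheaper offers for: " + page_text[:60]


TOOLS: dict[str, Callable[[str], str]] = {
    "summarize": summarize,
    "find_cheaper": find_cheaper,
}


def detect_intent(message: str) -> str:
    """Keyword stand-in for intent detection (assumption: not the real logic)."""
    text = message.lower()
    if "cheaper" in text or "price" in text:
        return "find_cheaper"
    return "summarize"


def handle_request(message: str, page_text: str) -> dict:
    intent = detect_intent(message)   # 1. intent detection
    context = page_text.strip()       # 2. context analysis (placeholder)
    tool = TOOLS[intent]              # 3. tool selection
    return {"intent": intent, "result": tool(context)}  # 4. response generation
```

Keeping the tools in a plain registry means the set of capabilities can grow without touching the request handler.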
3. Product Search Pipeline
To find similar or cheaper products, we combine:
- text-based product search
- image similarity search using embeddings
Images are converted into vector embeddings and compared using similarity search to find visually related products.
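To make the comparison step concrete, here is a small sketch of ranking catalog items by cosine similarity. It assumes the embeddings already exist as plain lists of floats. The function names and the toy two-dimensional vectors are hypothetical, and a production setup would typically use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: list[float], catalog: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k catalog entries most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, with toy embeddings, a query close to a sweater vector ranks the sweaters ahead of unrelated items:

```python
catalog = {
    "red_sweater": [1.0, 0.0],
    "blue_jeans": [0.0, 1.0],
    "pink_sweater": [0.9, 0.1],
}
print(top_k_similar([1.0, 0.0], catalog, k=2))
```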
Tech Stack
Frontend (Extension)
- TypeScript
- Chrome Extension APIs
Backend
- FastAPI
- Python
AI
- LLM-based agent workflows
- Tool calling
- Image embeddings for similarity search
Infrastructure
- Redis (session context)
- Vector similarity search
Challenges we ran into
Identifying the correct product
Many pages contain multiple products or images. When users say something like:
"find this cheaper"
it can be difficult to determine which item they mean.
We addressed this by analyzing page context and allowing the agent to infer the most relevant product.
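One simple heuristic in this spirit is to score each candidate product by how many words it shares with the page title. The sketch below is only an illustration of that idea, not the inference the agent actually performs. The candidate keys `name` and `image_url` and both function names are assumptions.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_primary_product(page_title: str, candidates: list[dict]) -> Optional[dict]:
    """Pick the candidate whose name overlaps most with the page title.

    Each candidate is a dict with hypothetical keys 'name' and 'image_url'.
    Returns None when no candidate shares any word with the title.
    """
    title_tokens = tokens(page_title)
    best, best_score = None, 0
    for candidate in candidates:
        score = len(title_tokens & tokens(candidate["name"]))
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Returning `None` when nothing matches leaves room for the agent to ask the user which item they mean.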
Extracting useful page context
Websites vary greatly in structure. We had to carefully decide how much page text and how many images to capture so the agent could reason effectively without unnecessary noise.
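A minimal sketch of one way to bound the captured context is shown below. The `trim_context` function and its default limits (`max_chars`, `max_images`) are illustrative assumptions, not the values we settled on.

```python
def trim_context(
    visible_text: str,
    image_urls: list[str],
    max_chars: int = 4000,
    max_images: int = 5,
) -> tuple[str, list[str]]:
    """Cap page text and images so the agent gets a bounded, less noisy input."""
    # Collapse runs of whitespace so the character budget is spent on content.
    text = " ".join(visible_text.split())
    if len(text) > max_chars:
        text = text[:max_chars]
    # Drop duplicate image URLs while preserving their order on the page.
    unique_images = list(dict.fromkeys(image_urls))
    return text, unique_images[:max_images]
```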
Reliable tool usage
We designed our system so the model first determines the user's intent before deciding which tool to call. This significantly improved reliability and reduced incorrect tool usage.
What we learned
Building BrowserBuddy taught us how to design agentic AI systems that interact with real-world environments.
We learned how to:
- structure browser data for AI reasoning
- design tool-based AI workflows
- combine vision and text-based search
- handle ambiguous user queries
Most importantly, we learned that context is the missing layer in many AI assistants.
Future Improvements
- Better product detection using DOM structure
- More advanced visual similarity search
- Support for multi-step AI workflows
- Integration with productivity tools
Our vision is simple:
The web shouldn’t just be something you browse — it should be something you can talk to.
Built With
- fastapi
- mcp
- python
- redis
- typescript