Inspiration
The modern web is an incredible repository of information, but navigating it has become increasingly complex. We’ve all been there: staring at a cluttered university portal, a confusing banking dashboard, or a dense documentation site, simply trying to find the "Log Out" button or a specific settings page.
We realized that while search engines help us find websites, there was no "GPS" to help us navigate inside those websites. We asked ourselves: Why click through five layers of menus when you could just ask for what you need?
We wanted to build a "Smart Sidecar" for the browser—a tool that makes the web accessible, efficient, and voice-controlled, essentially bringing the power of semantic search to every webpage you visit.
How We Built It
We built the AI Website Navigation Assistant using a Hybrid AI Architecture to balance intelligence, speed, and privacy.
1. The Frontend (Chrome Extension)
The interface is a Chrome Extension built with HTML/CSS/JS (Manifest V3). It injects a chat interface directly into the DOM, allowing users to interact without leaving the page. It captures the webpage's interactive elements (links, buttons, inputs) and sends them to our backend for analysis.
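To make that hand-off concrete, here is a minimal Python sketch of the kind of payload the extension might send for the captured elements. The field names (`tag`, `text`, `selector`), the `PageElement` class, and the `build_payload` helper are hypothetical. The write-up only says that links, buttons, and inputs are captured and sent to the backend.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PageElement:
    """One interactive element captured from the page (hypothetical schema)."""
    tag: str       # e.g. "a", "button", "input"
    text: str      # visible text or aria-label
    selector: str  # CSS selector the extension can use to find it again


def build_payload(url: str, elements: list[PageElement]) -> str:
    """Serialize captured elements into compact JSON for the backend."""
    # Drop elements with no usable text: they cannot be matched to a query.
    usable = [e for e in elements if e.text.strip()]
    payload = {"url": url, "elements": [asdict(e) for e in usable]}
    # separators=(",", ":") removes whitespace to keep the request small.
    return json.dumps(payload, separators=(",", ":"))


elements = [
    PageElement("a", "Log Out", "#nav > a.logout"),
    PageElement("button", "  ", "#spacer"),
    PageElement("input", "Search", "input[name=q]"),
]
print(build_payload("https://example.edu/portal", elements))
```

Filtering out empty labels on the client side keeps the request small and avoids sending elements that neither backend mode could match anyway.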
2. The Dual-AI Backend
This is the core innovation of our project. We implemented a toggle system:
- ☁️ Cloud Mode (Node.js + Google Gemini): For complex reasoning (e.g., "Where can I find information about the 2024 curriculum changes?"), we use the Gemini API. We pass a minified structure of the page to Gemini, which returns the most logical navigation path.
- 🔒 Local Mode (Python + SentenceTransformers): For speed and privacy (e.g., "Login," "Settings"), we use a local Python server running all-MiniLM-L6-v2. We map user queries to page elements using Semantic Vector Embeddings.
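The split between the two modes can be sketched as a small dispatcher. This is a minimal illustration that assumes a simple word-count rule. The write-up does not say how queries are classified, or whether the user always flips the toggle manually, so `route_query`, `SHORT_QUERY_WORDS`, and the `force_mode` override are all hypothetical.

```python
from typing import Optional

# Hypothetical cutoff: short commands like "Login" or "Settings" stay local.
SHORT_QUERY_WORDS = 3


def route_query(query: str, force_mode: Optional[str] = None) -> str:
    """Return "local" or "cloud" for a navigation query.

    force_mode models the user-facing toggle; when it is None (or not a
    recognized mode), a simple length/question heuristic decides.
    """
    if force_mode in ("local", "cloud"):
        return force_mode
    words = query.split()
    # Longer requests and questions need reasoning, so send them to Gemini.
    if len(words) > SHORT_QUERY_WORDS or query.rstrip().endswith("?"):
        return "cloud"
    return "local"


print(route_query("Settings"))
print(route_query("Where can I find information about the 2024 curriculum changes?"))
print(route_query("Settings", force_mode="cloud"))
```

A length rule like this is cheap enough to run on every keystroke. A real deployment might instead try the local matcher first and fall back to Gemini only when no element clears the similarity threshold.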
The Math Behind Local Navigation
To make the local model work without an internet connection, we embed both the user's query and each link's text, producing a query vector $\mathbf{Q}$ and link text vectors $\mathbf{L}_i$ in a high-dimensional space. We then calculate the Cosine Similarity to find the best match:
$$\text{Similarity}(\mathbf{Q}, \mathbf{L}_i) = \frac{\mathbf{Q} \cdot \mathbf{L}_i}{\|\mathbf{Q}\| \, \|\mathbf{L}_i\|} = \frac{\sum_{j=1}^{n} Q_j L_{ij}}{\sqrt{\sum_{j=1}^{n} Q_j^2} \, \sqrt{\sum_{j=1}^{n} L_{ij}^2}}$$
If the highest similarity score exceeds our threshold, the system automatically highlights the best-matching element and navigates to it.
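The formula and threshold test translate directly into code. This pure-Python sketch stands in for the embedding step with toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions). The `THRESHOLD` value and the `best_match` helper are illustrative assumptions, not the project's actual settings.

```python
import math
from typing import Dict, Optional, Sequence

# Illustrative value only; the project's real threshold is not stated here.
THRESHOLD = 0.8


def cosine_similarity(q: Sequence[float], link: Sequence[float]) -> float:
    """Q . L_i / (||Q|| * ||L_i||), the formula above written out term by term."""
    dot = sum(qj * lj for qj, lj in zip(q, link))
    norm_q = math.sqrt(sum(qj * qj for qj in q))
    norm_l = math.sqrt(sum(lj * lj for lj in link))
    if norm_q == 0 or norm_l == 0:
        return 0.0  # an all-zero vector has no direction; treat it as no match
    return dot / (norm_q * norm_l)


def best_match(query_vec: Sequence[float],
               link_vecs: Dict[str, Sequence[float]]) -> Optional[str]:
    """Return the link text whose embedding is closest to the query,
    or None when even the best score does not exceed THRESHOLD."""
    if not link_vecs:
        return None
    scores = {text: cosine_similarity(query_vec, vec) for text, vec in link_vecs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > THRESHOLD else None


# Toy 3-D vectors standing in for real sentence embeddings.
links = {"Log Out": [0.9, 0.1, 0.8], "Settings": [0.0, 1.0, 0.0]}
print(best_match([1.0, 0.0, 1.0], links))  # prints: Log Out  (score about 0.995)
print(best_match([0.0, 0.0, 1.0], links))  # prints: None  (best score about 0.66)
```

In the local server the vectors would come from the all-MiniLM-L6-v2 model rather than being typed in by hand; the matching and threshold logic stay the same.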
Challenges We Faced
- Latency vs. Accuracy: Running a full LLM for every click is too slow.
Solution: We implemented the Hybrid System. Simple navigational queries are answered by the local vector store, while complex queries are routed to Gemini.
- Privacy & Tracking: Injecting scripts into sensitive pages (like banking) is risky.
Solution: We built a Privacy Shield that acts as a firewall. It identifies and blocks requests to known tracking domains (e.g., `doubleclick`, `google-analytics`) before the AI processes the page, ensuring user data isn't leaked to third-party ad networks.
- DOM Complexity: Websites are built differently. Some use `<a>` tags; others use `<div>` elements with `onClick` listeners.
Solution: We developed a heuristic scraper that looks for `role="button"`, `aria-label`, and CSS cursor properties to identify "clickable" elements, not just standard links.
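The Privacy Shield's domain check can be sketched as a suffix match on each request's hostname. This is a minimal illustration: the blocklist holds only the two trackers named in the write-up, written out as `doubleclick.net` and `google-analytics.com` (an assumption about which domains were meant), and `is_tracker` is a hypothetical helper. One common way to block requests in a Manifest V3 extension is Chrome's declarativeNetRequest API; the write-up does not say which mechanism the Privacy Shield uses.

```python
from urllib.parse import urlparse

# Example blocklist built from the two trackers named in the write-up.
BLOCKED_DOMAINS = {"doubleclick.net", "google-analytics.com"}


def is_tracker(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Match "doubleclick.net" and "ad.doubleclick.net", but not
    # look-alikes such as "notdoubleclick.net".
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


requests = [
    "https://ad.doubleclick.net/pixel?id=1",
    "https://www.google-analytics.com/collect",
    "https://bank.example.com/api/balance",
]
allowed = [u for u in requests if not is_tracker(u)]
print(allowed)
```

Matching on the parsed hostname, rather than searching the whole URL string, avoids both false positives (a tracker name in a query parameter) and look-alike domains slipping through.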
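The clickable-element heuristic can likewise be sketched as a rule over an element's attributes. This Python version works on plain dictionaries standing in for DOM nodes (in the extension the equivalent logic would run in JavaScript against real elements). The attribute keys, the role list, the "two weak signals" rule, and the `looks_clickable` and `label_for` helpers are illustrative assumptions.

```python
from typing import Mapping

# Tags that are interactive by default in HTML.
NATIVE_INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# ARIA roles that mark custom widgets as interactive (illustrative subset).
CLICKABLE_ROLES = {"button", "link", "menuitem", "tab"}


def looks_clickable(el: Mapping[str, str]) -> bool:
    """Heuristically decide whether a DOM-like element can be clicked."""
    # Native interactive elements are always included.
    if el.get("tag", "").lower() in NATIVE_INTERACTIVE_TAGS:
        return True
    # Strong signals: an explicit ARIA role or a click handler.
    if el.get("role", "").lower() in CLICKABLE_ROLES or el.get("onclick"):
        return True
    # Weak signals: a pointer cursor plus an aria-label. Require both so that
    # merely styled or merely labelled text is not treated as a button.
    return el.get("cursor") == "pointer" and bool(el.get("aria-label"))


def label_for(el: Mapping[str, str]) -> str:
    """Prefer aria-label (icon-only buttons), else the visible text."""
    return (el.get("aria-label") or el.get("text") or "").strip()


page = [
    {"tag": "a", "text": "Home"},
    {"tag": "div", "role": "button", "text": "Settings"},
    {"tag": "div", "cursor": "pointer", "aria-label": "Log Out"},
    {"tag": "div", "text": "Welcome back!"},
]
targets = [label_for(el) for el in page if looks_clickable(el)]
print(targets)
```

Requiring two weak signals trades some recall for fewer false positives; the exact rules would need tuning against real sites.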
What We Learned
- The Power of Edge AI: We learned that you don't always need a massive LLM. For specific tasks like semantic matching, a small, locally hosted transformer is faster and more private.
- Browser Security: Working with Chrome's Manifest V3 taught us a lot about content security policies, CORS, and how to safely inject scripts without breaking website functionality.
- Accessibility First: By adding voice commands, we realized this tool isn't just a productivity hack; it's a potent accessibility tool for users with motor impairments who struggle with precise mouse movements.
What's Next?
We plan to implement Automated Form Filling using Gemini to not just navigate to a page, but also help users perform actions like "Book a ticket to London" by autonomously interacting with the inputs.