Inspiration

We noticed that even tech-savvy users struggle when navigating unfamiliar websites, whether it's filing a tax return, booking a complex itinerary, or configuring enterprise software. Existing help tools are static, site-specific, and go stale fast. We wanted to build something universal: a single tool that could guide anyone through any website in real time, powered by AI that reads what's actually on the page.

What it does

PagePilot is a Chrome extension that generates interactive, step-by-step walkthroughs for any webpage using AI. You open the side panel and type what you want to accomplish ("How do I refund a payment?"). PagePilot then scans the live page, identifies its interactive elements, and uses Anthropic's Claude to generate one precise instruction at a time. A spotlight overlay highlights exactly where to click or type, and a tooltip shows the instruction. Steps advance automatically as you interact, and an autonomous mode can even click through steps for you. Guides persist across page navigations, and the DOM is re-scanned before each step so every instruction matches the page as it currently is.
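Advancing automatically comes down to waiting for the user to interact with the highlighted element before asking for the next step. The sketch below is minimal and rests on a few assumptions. It uses a hypothetical step shape `{ action: "click" | "type" }`, it treats a `change` event as the end of a type step, and `waitForInteraction` and `completeStep` are illustrative names only. In autonomous mode it clicks after a configurable delay. Type steps still wait for the user, because the text has to come from them.

```javascript
// Hypothetical step shape: { action: "click" | "type", instruction: string }.
// Resolves with the event name once the user interacts with the target element.
function waitForInteraction(el, action) {
  // Assumption: a "type" step counts as done when the field's value is committed.
  const eventName = action === "type" ? "change" : "click";
  return new Promise((resolve) => {
    el.addEventListener(eventName, () => resolve(eventName), { once: true });
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Autonomous mode: click steps are performed after a delay; type steps
// still wait for the user, since only they can supply the text.
async function completeStep(step, el, { autonomous = false, delayMs = 1000 } = {}) {
  const done = waitForInteraction(el, step.action); // listener attached first
  if (autonomous && step.action === "click") {
    await sleep(delayMs);
    el.click(); // dispatches a click event, which resolves `done`
  }
  return done;
}
```

Listening for `input` instead of `change` would advance on the first keystroke, which is usually too early for a type step.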

How we built it

PagePilot is a Chrome extension (Manifest V3) built entirely in vanilla JavaScript, with no build tools or frameworks. The architecture has three main components: a side panel for the chat UI, a background service worker that orchestrates API calls and manages guide state, and a content script that extracts DOM elements and renders the spotlight overlay.

The content script scans for up to 150 interactive elements (buttons, inputs, links, and ARIA roles), using visibility checks and accessibility-first labeling (aria-label, innerText, title). The background worker sends this element list, along with the user's goal, to the Anthropic Claude API, which returns a single step as structured JSON.

The spotlight uses a box-shadow technique to dim the entire page except the target element, with a pulsing ring animation and a floating tooltip. A generation counter prevents stale async callbacks from rendering after a guide is reset, and guide state is persisted in chrome.storage.session so walkthroughs survive page navigations.

The landing page is deployed on Netlify, and the presentation deck is generated programmatically in Python (python-pptx and ReportLab) with custom icons made in Pillow.
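The content script's element extraction can be sketched as follows. Only two details come from the description: the 150-element cap and the aria-label / innerText / title labeling sources. Everything else is an assumption for illustration: the selector list, the visibility rules, the 80-character label cut, and the `{ index, tag, label }` descriptor shape. Descriptors are serializable and can be sent to the background worker. The matching DOM nodes stay in the content script, so the index the model picks can be mapped back to an element.

```javascript
// Hypothetical selector: buttons, inputs, links and common ARIA widget roles.
const INTERACTIVE_SELECTOR = [
  "button", "a[href]", "input:not([type=hidden])", "select", "textarea",
  "[role=button]", "[role=link]", "[role=tab]", "[role=menuitem]", "[role=checkbox]",
].join(",");

const MAX_ELEMENTS = 150;

// Visible = non-zero box and not hidden via CSS.
function isVisible(el, getStyle) {
  const rect = el.getBoundingClientRect();
  if (rect.width === 0 || rect.height === 0) return false;
  const style = getStyle(el);
  return style.display !== "none" && style.visibility !== "hidden" && style.opacity !== "0";
}

// Accessibility-first label: aria-label, then visible text, then title.
function labelFor(el) {
  const candidates = [el.getAttribute("aria-label"), el.innerText, el.getAttribute("title")];
  for (const c of candidates) {
    const text = (c || "").trim().replace(/\s+/g, " ");
    if (text) return text.slice(0, 80); // assumption: truncate to keep the prompt compact
  }
  return "";
}

// Returns serializable descriptors plus the matching nodes (same index).
function extractInteractiveElements(root = document, getStyle = (el) => getComputedStyle(el)) {
  const nodes = [];
  const descriptors = [];
  for (const el of root.querySelectorAll(INTERACTIVE_SELECTOR)) {
    if (nodes.length >= MAX_ELEMENTS) break;
    if (!isVisible(el, getStyle)) continue;
    descriptors.push({ index: nodes.length, tag: el.tagName.toLowerCase(), label: labelFor(el) });
    nodes.push(el);
  }
  return { nodes, descriptors };
}
```

The optional `getStyle` parameter exists so the logic can be exercised outside a browser. In a content script, calling `extractInteractiveElements()` with no arguments uses `document` and `getComputedStyle`.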

Challenges we ran into

Getting the spotlight overlay to work reliably across different websites was tricky. Stacking contexts vary wildly between sites, so we used near-maximum z-index values (2147483645, just below the 32-bit ceiling of 2147483647) and careful layering to keep the target element clickable above the dimmed overlay. Tooltip positioning also required clamping logic so the tooltip never renders off-screen.

Another challenge was preventing stale steps from appearing. Because DOM extraction and API calls are asynchronous, a user could reset or start a new guide while a previous step was still loading. We solved this with a generation counter: resetting or starting a guide increments the counter, each run remembers the value it started with, and any async result from a run whose value no longer matches is discarded instead of rendered.

Making autonomous mode work correctly also required care. Type actions can't be fully automated, because the user still needs to enter the text manually, and each action type needs its own execution strategy and timing.
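The spotlight and tooltip geometry from the first challenge reduce to some rectangle arithmetic. In this minimal sketch the overlay is a single fixed-position "hole" element, and its large spread shadow dims everything around it. `pointer-events: none` lets clicks reach the target underneath. Only the z-index value comes from the text above. The padding, the 9999px spread, the 0.6 opacity and the "below, else flip above" placement rule are illustrative choices, not PagePilot's actual values.

```javascript
const Z_OVERLAY = 2147483645; // near the 32-bit maximum of 2147483647

// Style for a fixed "hole" over the target; its spread shadow dims the rest of the page.
function spotlightStyle(rect, padding = 6) {
  return {
    position: "fixed",
    top: `${rect.top - padding}px`,
    left: `${rect.left - padding}px`,
    width: `${rect.width + 2 * padding}px`,
    height: `${rect.height + 2 * padding}px`,
    borderRadius: "6px",
    boxShadow: "0 0 0 9999px rgba(0, 0, 0, 0.6)",
    pointerEvents: "none", // clicks pass through to the real target
    zIndex: String(Z_OVERLAY),
  };
}

const clamp = (v, min, max) => Math.min(Math.max(v, min), max);

// Place the tooltip below the target and flip it above if it would overflow.
// Then clamp both axes so it stays inside the viewport.
function tooltipPosition(target, tip, viewport, gap = 12, margin = 8) {
  let top = target.bottom + gap;
  if (top + tip.height > viewport.height - margin) {
    top = target.top - gap - tip.height;
  }
  const centeredLeft = target.left + target.width / 2 - tip.width / 2;
  return {
    top: clamp(top, margin, viewport.height - margin - tip.height),
    left: clamp(centeredLeft, margin, viewport.width - margin - tip.width),
  };
}
```

In the page, `Object.assign(hole.style, spotlightStyle(target.getBoundingClientRect()))` applies the style. One trade-off of a pass-through hole is that the dimmed area stays clickable too. Blocking those clicks would need separate overlay panels around the hole.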

Accomplishments that we're proud of

We're proud of the one-step-at-a-time architecture. Generating all steps upfront would force the AI to guess, and often hallucinate, about pages it hasn't seen yet. PagePilot instead re-extracts the live DOM before every single step, so instructions stay grounded in the current page even after it changes through user interaction or navigation. We're also proud of how polished the UX feels. The spotlight overlay with its pulse animation, the smart tooltip positioning, the autonomous mode with configurable delay, and seamless cross-page guide persistence come together into something that feels like a genuinely helpful companion rather than a prototype.
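The re-extract-before-every-step loop can be sketched as below. It is combined with the generation counter described under How we built it. `extract`, `nextStep`, `render` and `waitForUser` are hypothetical injected functions that stand in for the content-script messaging and the API call. The `{ action: "done" }` terminal step and the `maxSteps` cap are also assumptions.

```javascript
// One-step-at-a-time guide loop; a generation counter drops stale work.
class GuideSession {
  constructor(deps) {
    this.deps = deps; // hypothetical: { extract, nextStep, render, waitForUser }
    this.generation = 0;
  }

  // Invalidate any in-flight run; its next check will see a newer generation.
  reset() {
    this.generation += 1;
  }

  async run(goal, maxSteps = 25) {
    this.reset();
    const gen = this.generation;
    const isStale = () => gen !== this.generation;
    const history = [];
    for (let i = 0; i < maxSteps; i++) {
      const elements = await this.deps.extract(); // fresh DOM snapshot every step
      if (isStale()) return { status: "stale", history };
      const step = await this.deps.nextStep(goal, elements, history);
      if (isStale()) return { status: "stale", history }; // never render a stale step
      if (step.action === "done") return { status: "done", history };
      this.deps.render(step);
      await this.deps.waitForUser(step);
      if (isStale()) return { status: "stale", history };
      history.push(step);
    }
    return { status: "max_steps", history };
  }
}
```

The check after every `await` is the important part. Any of those awaits can outlive a reset, and comparing the captured `gen` with the current counter is cheaper than cancelling requests that are already in flight.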

What we learned

We learned a lot about Chrome Extension Manifest V3 architecture, particularly the nuances of communication between service workers, content scripts, and side panels. We also gained experience with prompt engineering for structured JSON output, keeping Claude's responses consistent with a low temperature setting (0.1) and strict system prompts. On the frontend, we learned creative CSS techniques like the box-shadow spotlight hole, and how to handle z-index wars across arbitrary websites. Finally, we learned the importance of defensive async patterns, such as generation counters, when users change state while API calls are still pending.
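Asking for structured JSON at low temperature works best with strict validation on the way back. A temperature of 0.1 makes output more consistent, but it does not guarantee well-formed JSON. This minimal sketch assumes the Anthropic Messages API request shape (`model`, `max_tokens`, `system`, `temperature`, `messages`) and a hypothetical step schema `{ action, elementIndex, instruction }`. The model name, the `max_tokens` value and the prompt wording are placeholders, not PagePilot's actual prompt.

```javascript
// Hypothetical system prompt; the real prompt is not part of this writeup.
const SYSTEM_PROMPT =
  "You guide a user through a web page one step at a time. " +
  'Reply with ONLY a JSON object: {"action": "click" | "type" | "done", ' +
  '"elementIndex": number | null, "instruction": string}.';

// Request body in the shape of the Anthropic Messages API.
function buildStepRequest(goal, elements, history, model = "MODEL_NAME_PLACEHOLDER") {
  return {
    model,
    max_tokens: 300,
    temperature: 0.1,
    system: SYSTEM_PROMPT,
    messages: [{ role: "user", content: JSON.stringify({ goal, completedSteps: history, elements }) }],
  };
}

const ACTIONS = new Set(["click", "type", "done"]);

// Take the text from the first "{" to the last "}" (tolerates prose or code
// fences around the JSON), parse it, and validate it against the schema.
function parseStep(text, elementCount) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object in reply");
  const step = JSON.parse(text.slice(start, end + 1));
  if (!ACTIONS.has(step.action)) throw new Error(`unknown action: ${step.action}`);
  if (typeof step.instruction !== "string" || !step.instruction) throw new Error("missing instruction");
  if (step.action !== "done") {
    const i = step.elementIndex;
    if (!Number.isInteger(i) || i < 0 || i >= elementCount) throw new Error(`bad elementIndex: ${i}`);
  }
  return step;
}

// Background-worker call (not exercised offline); reply text lives in content blocks.
async function fetchNextStep(apiKey, goal, elements, history) {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: { "x-api-key": apiKey, "anthropic-version": "2023-06-01", "content-type": "application/json" },
    body: JSON.stringify(buildStepRequest(goal, elements, history)),
  });
  if (!res.ok) throw new Error(`API error ${res.status}`);
  const data = await res.json();
  const text = data.content.filter((b) => b.type === "text").map((b) => b.text).join("");
  return parseStep(text, elements.length);
}
```

Rejecting an out-of-range `elementIndex` matters as much as rejecting malformed JSON. Without that check, a confident but wrong index would spotlight an element that does not exist.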

What's next for PagePilot

The next major feature is bring-your-own-API-key support, allowing users to enter their own API key and choose from multiple model providers (OpenAI, Anthropic, Google, and more). This means PagePilot will work with any compatible API, giving users flexibility in cost, speed, and model preference. Beyond that, we want to add guide sharing (export a walkthrough as a shareable link), multi-language support for instructions, voice narration for accessibility, and analytics to help website owners understand where users get stuck. We're also exploring the idea of community-contributed guide templates for popular workflows.
