Inspiration

My old professor could teach a room of 50 people but could not find the submit button on the course portal. Every website looks different, every dashboard moves things around, and there is no consistent guide to help you navigate. Watching someone smart and capable feel helpless in front of a browser convinced me this was a real problem worth solving.

What it does

Staple puts an AI cat directly on any webpage. Ask it in plain English where something is, and it walks across the screen to the exact element, highlights it, and guides you through each step. When a task spans multiple pages, it tracks your progress, detects navigation, re-evaluates the remaining steps against the new UI, and picks up where it left off.

How we built it

Staple is a Chrome and Firefox browser extension built from a shared codebase, using Manifest V3 on Chrome and Manifest V2 on Firefox. DOM scraping maps every interactive element and its coordinates on the page without screenshots or vision models. DeepSeek handles natural-language-to-element matching at a fraction of a cent per query. OpenUI renders every AI response as a dynamic generative UI component in the popup chat. Langfuse gives full observability into every agent decision in real time.
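To make the DOM scraping step concrete, here is a minimal sketch of how interactive elements and their coordinates could be collected. It is not Staple's actual code: the names `mapInteractiveElements`, `ElementEntry`, and `INTERACTIVE_SELECTOR`, the choice of selector, and the 80-character label cap are all assumptions made for illustration. The small structural interfaces exist only so the sketch also runs outside a browser.

```typescript
// Minimal structural types (assumed for this sketch) so it can run without a real DOM.
interface RectLike { left: number; top: number; width: number; height: number; }
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
  getBoundingClientRect(): RectLike;
}
interface RootLike {
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// One entry per interactive element: a short label plus the center of its box.
interface ElementEntry { index: number; tag: string; label: string; x: number; y: number; }

// Hypothetical selector; a real extension would likely tune this list.
const INTERACTIVE_SELECTOR =
  'a[href], button, input, select, textarea, [role="button"], [tabindex]';

function mapInteractiveElements(root: RootLike): ElementEntry[] {
  const entries: ElementEntry[] = [];
  const nodes = Array.from(root.querySelectorAll(INTERACTIVE_SELECTOR));
  for (const el of nodes) {
    const rect = el.getBoundingClientRect();
    // Skip elements with no rendered box (hidden or collapsed).
    if (rect.width === 0 || rect.height === 0) continue;
    // Prefer aria-label, fall back to visible text; collapse whitespace and cap length.
    const raw = el.getAttribute("aria-label") || el.textContent || "";
    const label = raw.trim().replace(/\s+/g, " ").slice(0, 80);
    entries.push({
      index: entries.length,
      tag: el.tagName.toLowerCase(),
      label,
      x: Math.round(rect.left + rect.width / 2),
      y: Math.round(rect.top + rect.height / 2),
    });
  }
  return entries;
}
```

In a content script, `document` could be passed as the root. The resulting list is compact text, which is what lets a language model match a request to an element without a screenshot.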

Challenges we ran into

Getting the step walker to survive page navigation was the hardest problem. SPAs change the URL without reloading, elements shift position after scrolling, and message passing between the content script and popup is unreliable when the popup is closed. We solved this by making a fresh DeepSeek call on every page change to re-evaluate the remaining steps against the new DOM, rather than trying to match old coordinates to new elements.
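The idea of re-planning on every page change can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `UrlWatcher`, `replanRemainingSteps`, and the `Matcher` type are hypothetical names, and the matcher is a synchronous stand-in for the model call, which would be asynchronous in practice.

```typescript
interface Step { description: string; done: boolean; }
interface PageElement { index: number; label: string; }
interface ResolvedStep { description: string; elementIndex: number | null; }

// Hypothetical stand-in for the model call that maps a step to an element index.
type Matcher = (description: string, elements: PageElement[]) => number | null;

// Re-resolve every unfinished step against the fresh element list,
// instead of reusing coordinates captured on the previous page.
function replanRemainingSteps(
  steps: Step[],
  elements: PageElement[],
  match: Matcher,
): ResolvedStep[] {
  return steps
    .filter((step) => !step.done)
    .map((step) => ({
      description: step.description,
      elementIndex: match(step.description, elements),
    }));
}

// Detects URL changes that happen without a full reload (as in SPAs)
// by comparing the current URL with the last one seen.
class UrlWatcher {
  private lastUrl: string;
  private getUrl: () => string;
  private onChange: (url: string) => void;

  constructor(getUrl: () => string, onChange: (url: string) => void) {
    this.getUrl = getUrl;
    this.onChange = onChange;
    this.lastUrl = getUrl();
  }

  // Returns true and fires the callback only when the URL has changed.
  check(): boolean {
    const url = this.getUrl();
    if (url === this.lastUrl) return false;
    this.lastUrl = url;
    this.onChange(url);
    return true;
  }
}
```

A content script could call `check()` on a short timer with `() => location.href` as the URL source, then rescan the DOM and call `replanRemainingSteps` inside the callback. Because completed steps are filtered out before matching, the walker resumes from the first unfinished step.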

Accomplishments that we're proud of

We built a fully autonomous navigation agent that works on any website without screenshots, OS-level access, or any setup beyond installing the extension. The DOM-based approach makes it faster and cheaper than approaches that rely on vision models. And the cat is very cute.

What we learned

Browser extensions are deceptively complex. The boundaries between content scripts, popup scripts, service workers, and storage APIs create subtle timing bugs that only appear in production. Real autonomy means handling failure gracefully, not just the happy path.

What's next for Staple

  • Voice input so you can just say what you need
  • A mode that proactively highlights the most important next action on any page without being asked
  • An enterprise version for onboarding new employees to internal tools where every dashboard looks different and documentation is always out of date

