Inspiration

Most help on the web is generic and detached from the page the user is actually on. WayFinder was built to remove guesswork and give precise, step-by-step guidance grounded in the real UI users see, not abstract tutorials.

What it does

  • Scans the current webpage for visible, interactive elements such as buttons, inputs, and links.
  • Generates a compact, structured representation of the page and sends it to Gemini.
  • Receives grounded, ID-referenced instructions and converts them into clear, sequential steps.
  • Allows users to execute steps directly using a “Do It” action that can highlight, scroll to, or click the target element.
  • Supports both text and voice input for natural interaction.
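The scan step above can be sketched as a pure filter over element descriptors. This is a minimal illustration, not the extension's actual code: in the real content script the data comes from the live DOM, whereas here plain objects stand in, and the function and field names are hypothetical.

```javascript
// Hypothetical sketch of the scan: keep only visible, interactive elements.
const INTERACTIVE_TAGS = new Set(["button", "input", "a", "select", "textarea"]);

function scanElements(elements) {
  // In the real extension, visibility would be derived from computed
  // styles and bounding rects; here it is a precomputed flag.
  return elements.filter((el) => el.visible && INTERACTIVE_TAGS.has(el.tag));
}

const page = [
  { tag: "button", text: "Submit", visible: true },
  { tag: "div", text: "Decorative", visible: true },
  { tag: "a", text: "Help", visible: true },
  { tag: "input", text: "", visible: false },
];
console.log(scanElements(page).map((el) => el.tag)); // → ["button", "a"]
```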

How we built it

  • Chrome Extension built with Manifest V3 and vanilla JavaScript.
  • Content scripts scan and filter the DOM to extract only visible, actionable elements.
  • Each element is tagged with a temporary identifier to enable grounded model responses.
  • A lightweight action engine maps Gemini’s output back to the page to perform safe UI actions.
  • Chrome storage is used for state and configuration, with browser speech recognition for voice input.
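The action engine can be pictured as a small dispatcher that maps a model-emitted step, keyed by the element's temporary ID, onto a whitelisted handler. This is a sketch with hypothetical names; handlers are stubbed as strings so the mapping logic is testable outside the browser, where the real versions would manipulate the DOM.

```javascript
// Hypothetical sketch of the action engine: only known actions on known
// IDs are ever executed, which keeps UI actions safe and deterministic.
const ACTIONS = {
  highlight: (el) => `highlighted ${el.wfId}`,
  scroll: (el) => `scrolled to ${el.wfId}`,
  click: (el) => `clicked ${el.wfId}`,
};

function executeStep(step, elementsById) {
  const el = elementsById[step.id];
  if (!el) throw new Error(`Unknown element ID: ${step.id}`);
  const handler = ACTIONS[step.action];
  if (!handler) throw new Error(`Unsupported action: ${step.action}`);
  return handler(el);
}

const elements = { "wf-1": { wfId: "wf-1", tag: "button" } };
console.log(executeStep({ action: "click", id: "wf-1" }, elements)); // → "clicked wf-1"
```

Keeping the handler table explicit means a malformed or unexpected model response fails loudly instead of performing an unintended action.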

Challenges we ran into

  • Large and dynamic DOMs quickly exceeded LLM context limits, requiring aggressive filtering and simplification.
  • Early model outputs hallucinated UI elements; we solved this by enforcing strict ID-based grounding.
  • Single-page apps frequently changed state after scanning, which required reliable re-scan and refresh handling.
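One way to picture the ID-based grounding check is a validation pass over the model's proposed steps against the IDs produced by the latest scan. This is a simplified sketch, not the extension's actual response handling; the step format and names are assumptions.

```javascript
// Hypothetical sketch: partition model steps into grounded ones (the ID
// exists in the current scan) and rejected ones (likely hallucinated,
// or stale after a single-page-app state change — trigger a re-scan).
function groundSteps(steps, knownIds) {
  const known = new Set(knownIds);
  const grounded = steps.filter((s) => known.has(s.id));
  const rejected = steps.filter((s) => !known.has(s.id));
  return { grounded, rejected };
}

const result = groundSteps(
  [{ id: "wf-1", action: "click" }, { id: "wf-99", action: "click" }],
  ["wf-1", "wf-2"]
);
console.log(result.grounded.length, result.rejected.length); // → 1 1
```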

Accomplishments that we're proud of

  • Built a DOM summarization approach that preserves actionability while staying within model limits.
  • Shifted from passive instructions to an active assistant through the “Do It” execution feature.
  • Designed robust prompting and response handling that produces executable, deterministic outputs.
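The summarization idea can be illustrated with a simple compaction pass: cap the element count and truncate label text so the serialized page fits a budget, while preserving the IDs the model needs to stay actionable. The budgets and line format here are illustrative assumptions, not the real implementation.

```javascript
// Hypothetical sketch of DOM summarization: bound both the number of
// elements and the length of each label, emitting one compact line per
// element ("id:tag:\"text\"") for the model prompt.
function summarize(elements, { maxElements = 50, maxText = 40 } = {}) {
  return elements
    .slice(0, maxElements)
    .map((el) => `${el.wfId}:${el.tag}:"${el.text.slice(0, maxText)}"`)
    .join("\n");
}

const summary = summarize(
  [{ wfId: "wf-1", tag: "button", text: "Save your changes now" }],
  { maxText: 10 }
);
console.log(summary); // → wf-1:button:"Save your "
```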

What we learned

  • UI-level agents demand strict structure and grounding; free-form text is not reliable for action.
  • Treating the webpage as the model’s primary context fundamentally changes assistant design.
  • Small interaction details like element highlighting dramatically improve user trust and clarity.

What's next for WayFinder

  • Integrate visual understanding to handle canvas-based and non-standard UI components.
  • Enable cross-page workflows that persist context across navigation.
  • Expand toward enterprise onboarding and internal tool guidance.
  • Add stronger safety checks to prevent unintended or destructive actions.
