Inspiration

Most help on the web is generic and detached from the page the user is actually on. WayFinder was built to remove guesswork and give precise, step-by-step guidance grounded in the real UI users see, not abstract tutorials.

What it does

  • Scans the current webpage for visible, interactive elements such as buttons, inputs, and links.
  • Generates a compact, structured representation of the page and sends it to Gemini.
  • Receives grounded, ID-referenced instructions and converts them into clear, sequential steps.
  • Allows users to execute steps directly using a “Do It” action that can highlight, scroll to, or click the target element.
  • Supports both text and voice input for natural interaction.
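The scan step above can be sketched as a pure filter over element descriptors. This is a minimal illustration, not the extension's actual code: in the real content script the data comes from the live DOM, whereas here plain objects stand in, and the function and field names are hypothetical.

```javascript
// Hypothetical sketch of the scan: keep only visible, interactive elements.
const INTERACTIVE_TAGS = new Set(["button", "input", "a", "select", "textarea"]);

function scanElements(elements) {
  // In the real extension, visibility would be derived from computed
  // styles and bounding rects; here it is a precomputed flag.
  return elements.filter((el) => el.visible && INTERACTIVE_TAGS.has(el.tag));
}

const page = [
  { tag: "button", text: "Submit", visible: true },
  { tag: "div", text: "Decorative", visible: true },
  { tag: "a", text: "Help", visible: true },
  { tag: "input", text: "", visible: false },
];
console.log(scanElements(page).map((el) => el.tag)); // → ["button", "a"]
```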

How we built it

  • Chrome Extension built with Manifest V3 and vanilla JavaScript.
  • Content scripts scan and filter the DOM to extract only visible, actionable elements.
  • Each element is tagged with a temporary identifier to enable grounded model responses.
  • A lightweight action engine maps Gemini’s output back to the page to perform safe UI actions.
  • Chrome storage is used for state and configuration, with browser speech recognition for voice input.
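The action engine can be pictured as a small dispatcher that maps a model-emitted step, keyed by the element's temporary ID, onto a whitelisted handler. This is a sketch with hypothetical names; handlers are stubbed as strings so the mapping logic is testable outside the browser, where the real versions would manipulate the DOM.

```javascript
// Hypothetical sketch of the action engine: only known actions on known
// IDs are ever executed, which keeps UI actions safe and deterministic.
const ACTIONS = {
  highlight: (el) => `highlighted ${el.wfId}`,
  scroll: (el) => `scrolled to ${el.wfId}`,
  click: (el) => `clicked ${el.wfId}`,
};

function executeStep(step, elementsById) {
  const el = elementsById[step.id];
  if (!el) throw new Error(`Unknown element ID: ${step.id}`);
  const handler = ACTIONS[step.action];
  if (!handler) throw new Error(`Unsupported action: ${step.action}`);
  return handler(el);
}

const elements = { "wf-1": { wfId: "wf-1", tag: "button" } };
console.log(executeStep({ action: "click", id: "wf-1" }, elements)); // → "clicked wf-1"
```

Keeping the handler table explicit means a malformed or unexpected model response fails loudly instead of performing an unintended action.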

Challenges we ran into

  • Large and dynamic DOMs quickly exceeded LLM context limits, requiring aggressive filtering and simplification.
  • Early model outputs hallucinated UI elements; we solved this by enforcing strict ID-based grounding.
  • Single-page apps frequently changed state after scanning, which required reliable re-scan and refresh handling.
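One way to picture the ID-based grounding check is a validation pass over the model's proposed steps against the IDs produced by the latest scan. This is a simplified sketch, not the extension's actual response handling; the step format and names are assumptions.

```javascript
// Hypothetical sketch: partition model steps into grounded ones (the ID
// exists in the current scan) and rejected ones (likely hallucinated,
// or stale after a single-page-app state change — trigger a re-scan).
function groundSteps(steps, knownIds) {
  const known = new Set(knownIds);
  const grounded = steps.filter((s) => known.has(s.id));
  const rejected = steps.filter((s) => !known.has(s.id));
  return { grounded, rejected };
}

const result = groundSteps(
  [{ id: "wf-1", action: "click" }, { id: "wf-99", action: "click" }],
  ["wf-1", "wf-2"]
);
console.log(result.grounded.length, result.rejected.length); // → 1 1
```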

Accomplishments that we're proud of

  • Built a DOM summarization approach that preserves actionability while staying within model limits.
  • Shifted from passive instructions to an active assistant through the “Do It” execution feature.
  • Designed robust prompting and response handling that produces executable, deterministic outputs.
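The summarization idea can be illustrated with a simple compaction pass: cap the element count and truncate label text so the serialized page fits a budget, while preserving the IDs the model needs to stay actionable. The budgets and line format here are illustrative assumptions, not the real implementation.

```javascript
// Hypothetical sketch of DOM summarization: bound both the number of
// elements and the length of each label, emitting one compact line per
// element ("id:tag:\"text\"") for the model prompt.
function summarize(elements, { maxElements = 50, maxText = 40 } = {}) {
  return elements
    .slice(0, maxElements)
    .map((el) => `${el.wfId}:${el.tag}:"${el.text.slice(0, maxText)}"`)
    .join("\n");
}

const summary = summarize(
  [{ wfId: "wf-1", tag: "button", text: "Save your changes now" }],
  { maxText: 10 }
);
console.log(summary); // → wf-1:button:"Save your "
```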

What we learned

  • UI-level agents demand strict structure and grounding; free-form text is not reliable for action.
  • Treating the webpage as the model’s primary context fundamentally changes assistant design.
  • Small interaction details like element highlighting dramatically improve user trust and clarity.

What's next for WayFinder

  • Integrate visual understanding to handle canvas-based and non-standard UI components.
  • Enable cross-page workflows that persist context across navigation.
  • Expand toward enterprise onboarding and internal tool guidance.
  • Add stronger safety checks to prevent unintended or destructive actions.
