Inspiration
Most help on the web is generic and detached from the page the user is actually on. WayFinder was built to remove guesswork and give precise, step-by-step guidance grounded in the real UI users see, not abstract tutorials.
What it does
- Scans the current webpage for visible, interactive elements such as buttons, inputs, and links.
- Generates a compact, structured representation of the page and sends it to Gemini.
- Receives grounded, ID-referenced instructions and converts them into clear, sequential steps.
- Allows users to execute steps directly using a “Do It” action that can highlight, scroll to, or click the target element.
- Supports both text and voice input for natural interaction.
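The scan-and-summarize step above can be sketched as a small pure function. This is a minimal, hypothetical illustration, not WayFinder's actual code: the element descriptors, the `e0`-style temporary IDs, and the one-line-per-element format are assumptions about what a compact, ID-referenced page representation might look like.

```javascript
// Hypothetical sketch: turn scanned element descriptors into the compact,
// ID-referenced summary sent to the model. All names are illustrative.
function summarizeElements(elements) {
  return elements
    .filter((el) => el.visible)                        // keep only visible elements
    .map((el, i) => ({ id: `e${i}`, ...el }))          // tag each with a temporary ID
    .map((el) => `[${el.id}] ${el.tag} "${el.label}"`) // one compact line per element
    .join("\n");
}

// Example scan result (what a content script might collect from the DOM):
const scanned = [
  { tag: "button", label: "Sign in", visible: true },
  { tag: "input",  label: "Email",   visible: true },
  { tag: "a",      label: "Footer",  visible: false },
];

console.log(summarizeElements(scanned));
// [e0] button "Sign in"
// [e1] input "Email"
```

Because the model only ever sees these tagged lines, its instructions can reference elements by ID instead of by guessed descriptions.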
How we built it
- Chrome Extension built with Manifest V3 and vanilla JavaScript.
- Content scripts scan and filter the DOM to extract only visible, actionable elements.
- Each element is tagged with a temporary identifier to enable grounded model responses.
- A lightweight action engine maps Gemini’s output back to the page to perform safe UI actions.
- Chrome storage is used for state and configuration, with browser speech recognition for voice input.
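The "lightweight action engine" could take a shape like the following. This is a sketch under assumptions: the step format (`action` plus `targetId`), the action whitelist, and the handler registry are all hypothetical, and in the real extension the handlers would operate on live DOM nodes.

```javascript
// Hypothetical action engine: map a grounded model instruction back to its
// tagged element and run only a whitelisted, safe action on it.
const ALLOWED_ACTIONS = new Set(["highlight", "scroll", "click"]);

function executeStep(step, elementsById, handlers) {
  if (!ALLOWED_ACTIONS.has(step.action)) {
    throw new Error(`Unsupported action: ${step.action}`);
  }
  const target = elementsById[step.targetId];
  if (!target) {
    // Grounding check: the model may only act on elements that were scanned.
    throw new Error(`Unknown element ID: ${step.targetId}`);
  }
  return handlers[step.action](target);
}

// Illustrative use with stub handlers (the extension would instead
// highlight, scroll to, or click real page elements):
const elements = { e0: { label: "Sign in" } };
const log = [];
const handlers = {
  highlight: (el) => log.push(`highlight ${el.label}`),
  scroll:    (el) => log.push(`scroll ${el.label}`),
  click:     (el) => log.push(`click ${el.label}`),
};

executeStep({ action: "click", targetId: "e0" }, elements, handlers);
// log is now ["click Sign in"]
```

Keeping the action set closed is one way to make "Do It" safe by construction: the model can propose steps, but only pre-approved operations ever run.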
Challenges we ran into
- Large and dynamic DOMs quickly exceeded LLM context limits, requiring aggressive filtering and simplification.
- Early model outputs hallucinated UI elements; we solved this by enforcing strict ID-based grounding.
- Single-page apps frequently changed state after scanning, which required reliable re-scan and refresh handling.
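The strict ID-based grounding mentioned above amounts to rejecting any step whose target was not part of the scan. A minimal sketch of that filter, with hypothetical step and ID shapes:

```javascript
// Hypothetical grounding filter: drop any model-proposed step whose target ID
// was not produced by the page scan, so hallucinated elements never reach the page.
function groundSteps(steps, knownIds) {
  const ids = new Set(knownIds);
  return steps.filter((step) => ids.has(step.targetId));
}

const steps = [
  { targetId: "e1", action: "click" }, // real element → kept
  { targetId: "e9", action: "click" }, // hallucinated → dropped
];
const grounded = groundSteps(steps, ["e0", "e1"]);
// grounded contains only the step targeting "e1"
```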
Accomplishments that we're proud of
- Built a DOM summarization approach that preserves actionability while staying within model limits.
- Shifted from passive instructions to an active assistant through the “Do It” execution feature.
- Designed robust prompting and response handling that produces executable, deterministic outputs.
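One plausible way to get executable, deterministic outputs is to require the model to answer in strict JSON and validate the shape before acting. This sketch assumes a `{ steps: [{ targetId, action }] }` response format, which is an illustrative choice, not the project's documented schema:

```javascript
// Hypothetical response parser: accept only strict JSON with well-formed
// steps; anything else (free text, wrong shape) is treated as a failed generation.
function parseModelResponse(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // free-form text is not reliable for action
  }
  if (!Array.isArray(data.steps)) return null;
  const wellFormed = data.steps.every(
    (s) => typeof s.targetId === "string" && typeof s.action === "string"
  );
  return wellFormed ? data.steps : null;
}

parseModelResponse('{"steps":[{"targetId":"e0","action":"click"}]}'); // → one valid step
parseModelResponse("Click the blue button near the top");             // → null
```

Returning `null` rather than a best-effort guess keeps the pipeline deterministic: a bad generation triggers a retry instead of an unpredictable page action.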
What we learned
- UI-level agents demand strict structure and grounding; free-form text is not reliable for action.
- Treating the webpage as the model’s primary context fundamentally changes assistant design.
- Small interaction details like element highlighting dramatically improve user trust and clarity.
What's next for WayFinder
- Integrate visual understanding to handle canvas-based and non-standard UI components.
- Enable cross-page workflows that persist context across navigation.
- Expand toward enterprise onboarding and internal tool guidance.
- Add stronger safety checks to prevent unintended or destructive actions.
Built With
- chrome-extension-api
- css
- google-gemini-api
- html
- javascript
- json
- speech-api
- web