Agent UX

Inspiration

Not long ago, we were working with a sports nonprofit, a club with over 800 members, on designing their website from scratch. We had a real project, a real user base, and a real opportunity to do this right. Once we finished the first prototype of the website, we did what you're supposed to do: we reached out to their members for usability testing.

Nobody responded.

We had already spent days preparing. Interview questions carefully worded with stakeholder feedback. Surveys written, revised, and sent out. Hours of work before a single user had seen a single screen. And when the responses never came, we did what every design student eventually does: we grabbed the people closest to us. Other college students. Friends. Roommates. The feedback came back fine. Positive, even. But we knew the truth: we had just designed a website for 800 athletes and sports community members and tested it on people who looked nothing like them.

That experience stuck with us. Not just the sampling problem, but the cost of the process itself. The time spent writing questions instead of improving the product. The energy poured into recruitment instead of design. Usability testing is supposed to make your product better, but the overhead of doing it can slow everything else down.

And that's for a volunteer student team with time to spare. For a small business owner launching their first website, a nonprofit coordinator managing everything alone, or a developer trying to ship fast, formal usability research isn't just difficult. It can be out of reach.

When we started exploring Browser Use, we saw a real opportunity: AI agents that navigate websites the way real people do (clicking, scrolling, getting confused in all the right places) could stand in for the audiences that are too hard to reach.

AgentUX is built on the belief that understanding your users shouldn't require a research budget, a recruitment pipeline, or weeks of preparation. Every team, from a student group, nonprofit, or small business to a developer shipping something new, deserves to know whether their product actually works. That knowledge should be available to everyone, not just the teams with the resources to go get it.

What it does

AgentUX is a Chrome extension that brings usability testing to any team, on any website, in under a minute. Open the extension and you're greeted with a simple starting point. Choose your user personas, such as a first-time visitor or a Gen X user, or create your own to match your real audience. Set how many tasks you want the agents to run, review the generated task list, edit anything that doesn't fit, and hit Evaluate.

AgentUX spawns Browser Use agents that navigate your website simultaneously, one per persona, clicking through pages, filling out forms, and hunting for information exactly as a real user would. A live log streams every action in real time: each click, each navigation decision, each moment of hesitation, annotated with icons so you can follow exactly what the agent is experiencing. Alongside it, you can watch the actual Browser Use session running. When the session completes, AgentUX surfaces every usability issue it found, ranked by severity, tied to a specific element, and paired with a clear explanation of why it matters. Each finding also comes with a concrete improvement suggestion.
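At its core, a severity-ranked findings list is a sorted collection of records. The sketch below shows one minimal way to model it. It assumes a hypothetical `Finding` record and a four-level `Severity` scale; neither is AgentUX's documented schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical four-level scale; higher value means more urgent.
    # IntEnum members compare as integers, so findings sort directly.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    element: str        # selector or visible label of the affected element
    severity: Severity
    explanation: str    # why the issue matters to users
    suggestion: str     # the concrete improvement proposed


def rank_findings(findings: list[Finding]) -> list[Finding]:
    """Return findings from most to least severe.

    sorted() stays stable with reverse=True, so findings with equal
    severity keep the order in which the agents reported them.
    """
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```

Because the sort is stable, ties keep their discovery order. That keeps the list predictable when several issues share a severity.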

From there, you stay in control. Review each fix, apply the ones that make sense, and skip the ones that don't. For every accepted fix, AgentUX injects the change directly into the page as HTML and CSS, with a before-and-after view so you can see exactly what changed. Accepted changes are applied locally, so you can review the full picture before making anything permanent.

When you're ready to implement, copy a prompt with all the accepted changes and paste it into Claude, ChatGPT, or whichever AI code editor your team uses to ship the final version.
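The hand-off step mostly consists of turning the accepted fixes into plain text. Here is a minimal sketch, assuming a hypothetical `AcceptedFix` record and `build_handoff_prompt` helper. The prompt wording is illustrative only; it is not the text AgentUX actually produces.

```python
from dataclasses import dataclass


@dataclass
class AcceptedFix:
    selector: str   # CSS selector the fix targets (hypothetical field)
    issue: str      # the usability problem it addresses
    css: str        # the CSS that was previewed and accepted


def build_handoff_prompt(page_url: str, fixes: list[AcceptedFix]) -> str:
    """Bundle accepted fixes into one prompt for an AI code editor."""
    lines = [
        f"Apply the following usability fixes to the page at {page_url}.",
        "Each fix was previewed on the live page and accepted by a reviewer.",
        "",
    ]
    # Number the fixes so the code editor can refer back to them individually.
    for i, fix in enumerate(fixes, start=1):
        lines.append(f"{i}. {fix.selector}: {fix.issue}")
        lines.append(f"   CSS: {fix.css}")
    return "\n".join(lines)
```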

How we built it

AgentUX is built as a Chrome Extension paired with a FastAPI backend. The extension provides a side panel UI where users can select personas, generate tasks, and view results, all without leaving the page they're testing.

On the backend, a 7-step pipeline is orchestrated with async Python. Google Gemini summarizes each website and generates usability tasks tailored to the page content. The core innovation is using Browser Use cloud sessions to run real AI agents that navigate websites autonomously and in parallel, each embodying a different user persona (elderly user, first-time visitor, or a custom persona). These aren't simulations: the agents actually click buttons, scroll pages, fill out forms, and express confusion in real time.
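The parallel fan-out can be expressed with plain `asyncio`. The sketch below is an assumption about the pattern, not the project's code. `run_persona_agent` is a hypothetical stub standing in for one Browser Use cloud session, because the Browser Use client API is not reproduced here. The `max_parallel` cap is also hypothetical; it limits how many sessions run at once.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentResult:
    persona: str
    task: str
    steps: list[str]


async def run_persona_agent(persona: str, task: str) -> AgentResult:
    """Hypothetical stand-in for one Browser Use cloud session.

    A real version would start a session, send the task together with the
    persona's system prompt, and poll until the agent finishes. This stub
    only sleeps briefly so the orchestration logic can run on its own.
    """
    await asyncio.sleep(0.01)
    return AgentResult(persona, task, [f"{persona} started: {task}"])


async def run_all(
    personas: list[str], tasks: list[str], max_parallel: int = 4
) -> list[AgentResult]:
    # Assumed cap on concurrent sessions (e.g. to respect provider limits).
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(persona: str, task: str) -> AgentResult:
        async with sem:
            return await run_persona_agent(persona, task)

    # gather runs every (persona, task) pair concurrently and returns the
    # results in the order the coroutines were passed in, not completion order.
    return await asyncio.gather(*(guarded(p, t) for p in personas for t in tasks))


if __name__ == "__main__":
    results = asyncio.run(
        run_all(["elderly user", "first-time visitor"], ["Find the membership fee"])
    )
    for r in results:
        print(r.persona, "->", r.steps[0])
```

Because `gather` keeps input order, each result lines up with its persona and task without any extra bookkeeping.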

Agent trajectories are analyzed by a scoring engine that extracts confusion signals (hesitation, backtracking, retries, misclicks) and builds heatmaps showing where users struggled. A second LLM pass generates concrete fix suggestions with injectable CSS and JavaScript, which are visually validated using Playwright before being presented to the user. Fixes can be applied and previewed live on the page with a single click.
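The scoring engine's design is not spelled out beyond the four signal types. The following is a minimal sketch of how those signals could be detected in one trajectory. Everything in it is an illustrative assumption: the `Step` fields, the 8-second hesitation threshold, and the `WEIGHTS` values. The "heatmap" here is only per-element signal counts, not a spatial overlay.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    action: str              # e.g. "click", "scroll", "navigate"
    target: str              # human-readable element label or URL
    url: str                 # page URL after the step
    wait_s: float            # seconds since the previous step
    had_effect: bool = True  # did the page visibly respond? (hypothetical field)


# Illustrative weights only; not AgentUX's actual values.
WEIGHTS = {"hesitation": 1.0, "backtrack": 2.0, "retry": 1.5, "misclick": 1.0}


def confusion_signals(steps: list[Step], hesitation_s: float = 8.0) -> list[tuple[str, str]]:
    """Return (signal, element) pairs detected in one agent trajectory."""
    signals: list[tuple[str, str]] = []
    visited: list[str] = []
    prev = None
    for step in steps:
        # Hesitation: a long pause before acting.
        if step.wait_s > hesitation_s:
            signals.append(("hesitation", step.target))
        # Backtracking: returning to a page seen earlier, other than the current one.
        if visited and step.url != visited[-1] and step.url in visited:
            signals.append(("backtrack", step.target))
        # Retry: repeating the exact same action on the same element.
        if prev is not None and (step.action, step.target) == (prev.action, prev.target):
            signals.append(("retry", step.target))
        # Misclick: a click the page did not respond to.
        if step.action == "click" and not step.had_effect:
            signals.append(("misclick", step.target))
        visited.append(step.url)
        prev = step
    return signals


def score_and_heatmap(steps: list[Step]) -> tuple[float, Counter]:
    """Weighted confusion score plus per-element signal counts."""
    signals = confusion_signals(steps)
    score = sum(WEIGHTS[name] for name, _ in signals)
    heat = Counter(element for _, element in signals)
    return score, heat
```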

The frontend uses a tab-based flow (Setup → Progress → Results) with live agent feeds showing persona thinking and actions in real time, and live browser previews via iframes into the Browser Use cloud sessions.

Challenges we ran into

Browser Use cloud API integration was the biggest hurdle. The API's lastStepSummary field didn't include agent reasoning, only raw actions, and those actions referenced elements by internal numbers like "Clicked on 11" instead of human-readable descriptions. We had to build an entire parsing and translation layer (_parse_step_summary) to convert raw agent output into meaningful log entries, plus a template-based thinking commentary system to surface what each persona was "thinking" during their actions.
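A translation layer like the one described can be built from regular expressions plus a lookup from element index to a readable label. The sketch below is a hypothetical illustration, not the actual `_parse_step_summary`. Only "Clicked on 11" comes from the text; the "Typed" and "Scrolled" formats are assumptions. The `elements` map is assumed to come from a page snapshot, and the per-persona `THOUGHT_TEMPLATES` are invented examples of template-based commentary.

```python
import re
from typing import Optional

# Hypothetical raw action formats; the real Browser Use output may differ.
ACTION_PATTERNS = [
    (re.compile(r"^Clicked on (?P<idx>\d+)$"), "Clicked {label}"),
    (re.compile(r"^Typed '(?P<text>.*)' into (?P<idx>\d+)$"), 'Typed "{text}" into {label}'),
    (re.compile(r"^Scrolled (?P<direction>up|down)$"), "Scrolled {direction}"),
]

# Template-based "thinking" lines per persona (illustrative wording).
THOUGHT_TEMPLATES = {
    "elderly user": "Let me read this carefully before I do anything.",
    "first-time visitor": "I'm not sure where this goes, but let's try it.",
}


def parse_step(raw: str, elements: dict[int, str]) -> Optional[str]:
    """Translate one raw action into a readable log line, or None if unknown."""
    for pattern, template in ACTION_PATTERNS:
        m = pattern.match(raw.strip())
        if m:
            fields = m.groupdict()
            if "idx" in fields:
                # Swap the internal element number for a human-readable label.
                idx = int(fields.pop("idx"))
                fields["label"] = elements.get(idx, f"element #{idx}")
            return template.format(**fields)
    return None


def log_entry(persona: str, raw: str, elements: dict[int, str]) -> Optional[str]:
    """Combine the translated action with the persona's templated commentary."""
    action = parse_step(raw, elements)
    if action is None:
        return None
    thought = THOUGHT_TEMPLATES.get(persona, "Let me try this.")
    return f'[{persona}] {action}. Thinking: "{thought}"'
```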

Noise filtering was an ongoing battle. Internal agent commands, such as "Python: html = await browser.get_html()" or "Running JavaScript", leaked into the live feed, and we had to keep expanding our noise filter as new patterns appeared during testing.
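A filter that grows over time is easiest to maintain as a list of patterns. In the minimal sketch below, the first two patterns mirror the two examples quoted in the text. The third is a hypothetical addition, included to show how new noise gets covered.

```python
import re

# Internal agent chatter that should never reach the live feed.
NOISE_PATTERNS = [
    re.compile(r"^Python:", re.IGNORECASE),              # e.g. "Python: html = await ..."
    re.compile(r"^Running JavaScript", re.IGNORECASE),   # e.g. "Running JavaScript"
    re.compile(r"await\s+browser\.", re.IGNORECASE),     # hypothetical extra pattern
]


def is_noise(line: str) -> bool:
    """True if a feed line matches any known noise pattern."""
    return any(p.search(line.strip()) for p in NOISE_PATTERNS)


def filter_feed(lines: list[str]) -> list[str]:
    """Keep only the lines that are meaningful to a human watching the feed."""
    return [line for line in lines if not is_noise(line)]
```

When a new kind of leak shows up, you add a single pattern to the list and leave the rest of the feed logic alone.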

Polling and missed steps were tricky. With a 5-second polling interval, we were missing agent actions entirely. We dropped the interval to 2 seconds and switched from reading only lastStepSummary to consuming the full steps array from the API, so no step could slip between polls.
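Reading the full steps array means the poller only needs to remember how many steps it has already shown. The sketch below is built on that assumption. `fetch_steps` is a hypothetical callable that returns the session's ordered step list plus a "finished" flag; the real API response shape may differ.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical: returns (all steps so far, in order; whether the session is finished).
FetchSteps = Callable[[], Awaitable[tuple[list[dict], bool]]]


async def poll_steps(
    fetch_steps: FetchSteps,
    on_step: Callable[[dict], Any],
    interval_s: float = 2.0,
) -> int:
    """Deliver every step exactly once, in order; return the total delivered."""
    seen = 0  # number of steps already delivered to the live feed
    while True:
        steps, done = await fetch_steps()
        # Consuming the whole array (not just the latest summary) means steps
        # that completed between two polls are still delivered, in order.
        for step in steps[seen:]:
            on_step(step)
        seen = max(seen, len(steps))
        if done:
            return seen
        await asyncio.sleep(interval_s)
```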

Live preview coordination across the Chrome extension, content scripts, and background workers required careful state management, especially handling tab switches, fullscreen mode, and persisting fixes across page reloads.

Accomplishments that we're proud of

Real agents, not simulations. Our personas actually browse websites autonomously through real browsers in the cloud. The elderly persona zooms in 3x before reading anything. The first-time user explores the page with no prior knowledge. You can watch them navigate in real time.
One-click fix application. The AI doesn't just identify problems; it generates CSS and JavaScript fixes that can be applied and previewed instantly on the live page, with before/after screenshots. Fixes persist across page reloads.
The live agent feed. Watching personas think out loud ("Oh, I think this might be what I need. Let me click it carefully.") while performing actions makes the testing feel alive and gives genuine insight into usability issues.
Visual edit validation. We built a Playwright-based validation step that actually injects each suggested fix and takes before/after screenshots to verify it produces a visible change, dropping fixes that do nothing.
The full loop. From "paste a URL" to "here are your issues with one-click fixes and a deploy prompt for your IDE", the entire usability testing workflow runs in under 5 minutes with zero manual testing effort.
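The visual edit validation step can be sketched with Playwright's Python API: load the page, take a screenshot, inject the fix, take another, and keep the fix only if the two differ. This is a minimal sketch of that idea, not the project's implementation. It handles CSS fixes only (JavaScript fixes could go through `page.add_script_tag`). The byte-for-byte PNG comparison is also an assumption: it works for static pages, but animated content would need a pixel diff with a tolerance.

```python
def produces_visible_change(before_png: bytes, after_png: bytes) -> bool:
    # Coarse proxy: two renders of an unchanged static page normally
    # encode to identical PNG bytes, so any difference counts as a change.
    return before_png != after_png


async def validate_css_fixes(url: str, css_fixes: list[str]) -> list[str]:
    """Inject each CSS fix into a fresh page; drop fixes that change nothing."""
    # Imported lazily so the pure helper above works without Playwright installed.
    from playwright.async_api import async_playwright

    kept: list[str] = []
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        for css in css_fixes:
            page = await browser.new_page()  # fresh page so fixes don't stack
            await page.goto(url)
            before = await page.screenshot(full_page=True)
            await page.add_style_tag(content=css)
            after = await page.screenshot(full_page=True)
            await page.close()
            if produces_visible_change(before, after):
                kept.append(css)
        await browser.close()
    return kept
```

Giving each fix its own fresh page keeps one fix from masking or enabling another. The trade-off is one extra page load per fix.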

What we learned

LLM agents are surprisingly good at embodying personas, but they need extremely specific system prompts. Vague instructions like "act confused" produce generic output, while specific instructions like "zoom in 3 times before reading anything" and "describe buttons by their visible text, not element numbers" produce realistic, useful behavior.
The gap between raw AI output and user-facing UI is massive. Most of our engineering effort went into translating agent internals into something humans can actually understand and act on.
Browser automation at scale is hard. Parallel cloud sessions, polling, state management, and error handling for agents that can do literally anything on a webpage required robust architecture and lots of edge case handling.
Chrome extension development has unique constraints: side panels, content script isolation, permission models, and cross-context messaging all add complexity that's easy to underestimate.
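The lesson about specific prompts suggests storing persona behaviors as explicit, checkable rules rather than adjectives. The sketch below uses a hypothetical `Persona` class to show this. The two rules it includes ("zoom in 3 times before reading anything" and "describe buttons by their visible text, not element numbers") are quoted from the text. The prompt framing around them is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    name: str
    description: str
    # Concrete, observable behaviors instead of vague traits like "act confused".
    behaviors: list[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        rules = "\n".join(f"- {b}" for b in self.behaviors)
        return (
            f"You are {self.name}: {self.description}\n"
            "Follow these rules while completing the task:\n"
            f"{rules}\n"
            # Shared rule so log lines stay human-readable for every persona.
            "- Describe buttons by their visible text, not element numbers."
        )


ELDERLY = Persona(
    name="an elderly user",
    description="someone who reads slowly and is unfamiliar with modern web conventions.",
    behaviors=["Zoom in 3 times before reading anything."],
)
```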

What's next for Agent UX

More personas out of the box: visually impaired users (screen reader simulation), non-native English speakers, power users, and mobile-first users
Accessibility auditing: WCAG compliance scoring tied to real agent experiences, not just static analysis
Regression testing: automatically re-run the same tests after fixes are applied to verify improvements and catch regressions
Team collaboration: shared test results, fix approvals, and historical tracking across team members
CI/CD integration: run AgentUX as part of your deploy pipeline so usability regressions are caught before they ship
Heatmap visualization: overlay confusion signals directly on the page as a visual heatmap showing exactly where users struggled