Inspiration

Most usability problems reach production not because teams lack care, but because traditional UX testing remains slow, expensive, and operationally demanding. Recruiting participants, building test environments, and analyzing results require resources that early-stage teams rarely possess.

Every product decision represents a bet on human experience. Will a user with low vision locate the critical button? Will a power user in a hurry notice what matters? These questions remain unanswered until costly problems emerge in production.

We asked a straightforward question: why can't design tools think like diverse users? VisualForensics exists to answer that question, enabling teams to validate design decisions against simulated human experience before writing a single line of code or recruiting a single participant.

What It Does

VisualForensics analyzes usability directly from a UI image.

A user uploads a screenshot, mockup, or Figma frame. They may optionally specify what they want to test (for example, accessibility or clarity). If no test is provided, the system performs an internal A/B-style evaluation by comparing alternative interpretations of the same interface.

The system simulates 50 distinct user personas, each with different constraints and goals, and evaluates how they would perceive and interact with the UI.

The results include:

  • A tagged version of the image highlighting key focus points
  • Findings anchored to exact pixel locations
  • Persona-specific interaction paths
  • Identified usability and accessibility issues
  • Concrete recommendations for improvement

All analysis is performed from a single image, with no code, no live site, and no configuration.
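To make the persona-driven setup concrete, here is a minimal sketch of how a persona could be turned into an instruction for a multimodal model. The schema and field names are illustrative assumptions, not the actual VisualForensics configuration format:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical persona schema -- fields are illustrative, not the
# actual VisualForensics configuration format.
@dataclass
class Persona:
    name: str
    vision: str      # e.g. "low" or "typical"
    goal: str        # the task this persona tries to complete
    patience: str    # e.g. "hurried" or "thorough"

def build_persona_prompt(persona: Persona, test_focus: Optional[str] = None) -> str:
    """Render a persona as an instruction to accompany the UI image."""
    # When no explicit test is given, fall back to the internal
    # A/B-style comparison of interpretations described above.
    focus = test_focus or "compare plausible interpretations of this interface"
    return (
        f"You are {persona.name}, a user with {persona.vision} vision "
        f"who is {persona.patience}. Your goal: {persona.goal}. "
        f"Looking only at the attached UI image, describe what you notice "
        f"first, the path you would take, and any points of confusion. "
        f"Evaluation focus: {focus}."
    )

prompt = build_persona_prompt(
    Persona("Ana", "low", "find the checkout button", "hurried"),
    test_focus="accessibility",
)
```

In practice each of the 50 personas would get its own rendered prompt, sent alongside the same image.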

How We Built It

VisualForensics was built in Google AI Studio using the Gemini 3 API.

The input to the system is an image of a UI. Gemini 3’s multimodal capabilities are used to interpret layout, visual hierarchy, text, icons, spacing, and contrast directly from pixels.

We implemented an agent-based evaluation loop inspired by prior work on LLM-driven UX simulation, most notably UXAgent.

Our implementation differs from that foundation in a critical respect: while UXAgent requires live website access and DOM parsing, VisualForensics operates on static images through Gemini 3's advanced vision capabilities. This eliminates setup friction while maintaining analytical depth.

Each agent in the 50-agent swarm receives a distinct persona configuration, represents a different user, and independently analyzes the interface. Their observations are aggregated into structured findings and mapped back onto the image with precise spatial references.
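The aggregation step could be sketched as follows. This is an illustrative approximation, not the actual implementation: each agent emits observations tied to a pixel region, and findings reported independently by more personas earn higher confidence.

```python
from collections import defaultdict

def aggregate(observations, n_agents):
    """Merge per-agent observations into pixel-anchored findings.

    observations: list of (agent_id, box, issue) tuples, where box is
    an (x, y, w, h) region in image pixels. Confidence is the fraction
    of agents that independently reported the same (box, issue) pair.
    """
    grouped = defaultdict(set)
    for agent_id, box, issue in observations:
        grouped[(box, issue)].add(agent_id)
    return [
        {"box": box, "issue": issue, "confidence": len(agents) / n_agents}
        for (box, issue), agents in grouped.items()
    ]

# Toy input: two of three agents flag the same low-contrast label.
obs = [
    (0, (120, 40, 80, 24), "low-contrast label"),
    (1, (120, 40, 80, 24), "low-contrast label"),
    (2, (300, 200, 60, 60), "ambiguous icon"),
]
findings = aggregate(obs, n_agents=3)
```

A real pipeline would also need to merge near-identical boxes rather than require exact matches; exact grouping keeps the sketch short.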

If no explicit test is provided, the system performs an internal comparison across agents to surface implicit A/B-style differences in perception and interaction.
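One simple way to surface such implicit A/B-style differences is to flag elements whose interpretation is not unanimous across personas. The scoring below is a hypothetical sketch of that idea, not the system's actual metric:

```python
from collections import Counter

def divergent_elements(interpretations):
    """Flag UI elements that personas interpret differently.

    interpretations: {element_id: [one interpretation per persona]}
    Returns {element_id: disagreement score}, where the score is the
    share of personas who dissent from the majority reading.
    """
    flagged = {}
    for element, views in interpretations.items():
        counts = Counter(views)
        top_share = counts.most_common(1)[0][1] / len(views)
        if top_share < 1.0:  # not unanimous -> ambiguity hotspot
            flagged[element] = round(1 - top_share, 2)
    return flagged

# Toy input: an icon read two different ways, a button read uniformly.
views = {
    "icon_42": ["settings", "settings", "filter", "filter"],
    "btn_buy": ["purchase", "purchase", "purchase", "purchase"],
}
flagged = divergent_elements(views)
```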

Challenges We Ran Into

  • High-precision visual tagging: Identifying and labeling every UI element at pixel-level accuracy from static images, across diverse layouts and design systems, without access to code or runtime data.
  • Actionable aggregation: Transforming raw pixel-level observations into structured, persona-specific insights that are immediately understandable and usable by designers and engineers.
  • Simultaneous multi-persona reasoning: Maintaining coherent attention, interaction paths, and confidence estimates for 50 concurrent simulated users on a single interface, producing reproducible, engineering-ready findings.

Accomplishments That We're Proud Of

  • A working usability analysis tool that operates from a single static image
  • Agent-based evaluation without requiring explicit variants or live deployments
  • Pixel-anchored findings instead of abstract UX advice
  • No installation, no setup, and no user recruitment
  • A public demo that judges can interact with directly via AI Studio
  • Early identification of critical usability issues, reducing late-stage redesigns and preventing broken UX from reaching production

What We Learned

  • We learned that Gemini 3’s multimodal capabilities significantly lower the barrier to meaningful product analysis. Being able to reason directly from images allowed us to move from idea to usable insight without waiting for implementation, instrumentation, or live data. From a product perspective, this shifts usability evaluation much earlier in the lifecycle, when changes are cheapest and most impactful.
  • When properly structured, multimodal models can function as credible approximations of human behavior by jointly reasoning about visual layout, semantic content, and psychological factors—enabling analyses that previously required multiple specialized systems.

Gemini 3 Integration

  • Vision-to-behavior pipeline: Gemini 3 interprets raw UI images and transforms them into actionable interface representations, detecting elements, their functional role, and perceptual salience—all without any DOM or runtime data. The image becomes the “world” for the 50 simulated agents.

  • Concurrent persona simulation: Fifty independent personas, each with distinct abilities and goals, interact with the same interface in parallel. Gemini maintains long-term coherence across these simulations, producing multi-turn behavioral traces that reveal how different users see, click, or abandon elements.

  • Pixel-anchored actionable insights: Findings are not abstract—they are spatially grounded and structured with confidence metrics, making it possible to map every observation back to the exact pixel, element, or interaction path. This converts subjective usability intuition into reproducible engineering signals.
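Gemini's vision responses conventionally express bounding boxes as [ymin, xmin, ymax, xmax] normalized to a 0–1000 range (per the Gemini image-understanding documentation); anchoring a finding to exact pixels then reduces to a coordinate conversion like this sketch:

```python
def to_pixels(box_1000, width, height):
    """Convert a [ymin, xmin, ymax, xmax] box normalized to 0-1000
    (the convention Gemini vision responses use for box_2d) into
    (left, top, right, bottom) pixel coordinates of the image."""
    ymin, xmin, ymax, xmax = box_1000
    return (
        int(xmin / 1000 * width),
        int(ymin / 1000 * height),
        int(xmax / 1000 * width),
        int(ymax / 1000 * height),
    )

# A 1280x800 screenshot; the model flags the top-right quarter.
px = to_pixels([0, 500, 500, 1000], width=1280, height=800)
```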

What's next for VisualForensics

Our goal is to make usability insights immediate, actionable, and integral to product decision-making. Next steps include:

  • Embedding targeted tests for key user flows, such as onboarding, error recovery, and navigation, to surface the highest-impact friction points.
  • Enabling rapid comparison of multiple design iterations to guide decisions with evidence rather than intuition.
  • Prioritizing fixes based on quantified impact across different personas, so engineering effort targets what matters most.
  • Integrating seamlessly into design workflows, allowing teams to validate choices before implementation and reduce costly post-launch UX issues.

VisualForensics aims to shift UX evaluation upstream: giving teams confidence that their designs work for real people before a single line of code is shipped.

Built With

  • gemini3
  • googleaistudio