Inspiration

We were inspired by a common frustration in software development: brittle, time-consuming UI tests. Traditional test scripts break with the smallest UI change and require constant maintenance. With the rise of powerful Large Language Models (LLMs), we saw an opportunity to create a new paradigm: instead of writing code, teams could test directly from the customer journey documents they already have. Our guiding vision was to prepare for the "agent age" by building a system that tests applications the way AI agents will soon visit and use them.


What it does

UI Sentinel is a no-code, autonomous system testing framework. It takes two inputs: a document describing a customer journey and a set of input data (like names, dates, or emails). It then uses a sophisticated multi-agent AI pipeline to:

  1. Understand the Journey: Automatically extract and refine a high-level journey into a series of concrete, atomic tasks.
  2. Execute End-to-End Tests: Autonomously navigate a live web browser to perform the tasks, from filling forms to clicking buttons.
  3. Handle Complex UI: Intelligently interact with tricky components like custom dropdowns and modals using specialized "expert" agents.
  4. Self-Heal: If it encounters an error or a UI glitch, its Dynamic Retry mechanism allows it to re-evaluate the situation and attempt to complete the task in a different way.
  5. Generate Reports: Finally, it produces a clear, understandable report detailing the execution and the AI's decisions, complete with a video of the test run.
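To illustrate the two inputs, here is a minimal sketch (all names and data are hypothetical, not UI Sentinel's actual format) of how one reusable journey document can be paired with separate data sets to produce multiple test runs:

```python
# Hypothetical sketch of UI Sentinel's two inputs: a customer-journey
# document and a set of input data. Names and structure are illustrative.

JOURNEY = """
As a new customer, I sign up for an account:
1. Open the registration page.
2. Fill in my name and email.
3. Submit the form and confirm the welcome message.
"""

# The same journey can be exercised with many data variations.
INPUT_DATA = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

def build_test_runs(journey: str, datasets: list[dict]) -> list[dict]:
    """Pair the journey document with each data set to form one test run."""
    return [{"journey": journey.strip(), "data": d} for d in datasets]

runs = build_test_runs(JOURNEY, INPUT_DATA)
print(len(runs))  # one run per data set
```

The journey stays fixed while the data varies, which is what lets a single flow cover many concrete scenarios.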

How we built it

UI Sentinel is built on a modular, agent-based architecture using Python. The "brain" of our system is Google's Gemini model, and we use the browser-use library to give our agents control over a browser.

Our architecture is a robust pipeline consisting of several specialized agents that work in sequence:

  1. User Journey Extractor and Refiner: The first agent in the chain, responsible for reading the input document and creating a structured test plan.
  2. User Journey Evaluator: A crucial "sanity check" agent that validates the plan before execution.
  3. Test Executor: The main worker that controls the browser, supported by specialized Dropdown and Modal expert agents.
  4. Test Evaluator with Dynamic Retry: A supervisor agent that monitors the Executor. If a task fails, this agent initiates the recovery and retry logic.
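The pipeline above can be sketched as a simple sequence in which each agent consumes the previous agent's output. This is a deliberately simplified illustration with stubbed agents; in UI Sentinel the real agents call Gemini and drive the browser through browser-use, which is omitted here:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PipelineResult:
    tasks: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

def run_pipeline(journey: str,
                 extract: Callable[[str], list[str]],
                 evaluate_plan: Callable[[list[str]], bool],
                 execute: Callable[[str], bool]) -> PipelineResult:
    """Simplified orchestration: extract a plan, sanity-check it,
    then execute each atomic task in order."""
    result = PipelineResult(tasks=extract(journey))
    if not evaluate_plan(result.tasks):      # the Evaluator's "sanity check" gate
        result.log.append("plan rejected")
        return result
    for task in result.tasks:
        ok = execute(task)                   # Test Executor (stubbed here)
        result.log.append(f"{task}: {'ok' if ok else 'failed'}")
    return result

# Stub agents standing in for the Gemini-backed ones:
result = run_pipeline(
    "Sign up with name and email",
    extract=lambda j: ["open signup page", "fill form", "submit"],
    evaluate_plan=lambda tasks: len(tasks) > 0,
    execute=lambda task: True,
)
print(result.log)
```

Keeping each stage behind a narrow interface is what makes it easy to swap in specialized expert agents without touching the rest of the pipeline.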

Challenges we ran into

During development, we identified and solved several key challenges inherent to automated testing:

  1. Test Flakiness: Early tests were unreliable and would fail on temporary UI glitches. We solved this by implementing the Dynamic Retry system, where a failed task triggers a new LLM call to find a creative solution based on the current context.
  2. Complex UI Components: General-purpose automation struggles with non-standard web components. This led us to create Specialized Expert Agents, each equipped with a tailored system prompt for handling tricky elements like custom dropdowns.
  3. Scalability and Reusability: To avoid writing new files for every test, we designed a Config-Driven Architecture. This allows us to separate the test journey template from the specific test data, enabling us to run hundreds of test variations from a single, reusable flow.
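The Dynamic Retry idea can be sketched as follows. This is a simplified sketch: `ask_llm_for_alternative` is a hypothetical stand-in for the real Gemini call that re-plans from the current page context.

```python
def run_with_dynamic_retry(task, execute, ask_llm_for_alternative, max_retries=2):
    """Execute a task; on failure, ask the LLM (stubbed here) for an
    alternative approach based on the failure context, then retry."""
    attempt = task
    for _ in range(max_retries + 1):
        if execute(attempt):
            return True
        # Re-plan instead of blindly repeating the same failed action.
        attempt = ask_llm_for_alternative(task, context=f"failed attempt: {attempt}")
    return False

# Stubbed demo: the first approach fails, the re-planned one succeeds.
attempts = []
def execute(step):
    attempts.append(step)
    return step == "select option via keyboard"

def ask_stub(task, context):
    return "select option via keyboard"

print(run_with_dynamic_retry("click custom dropdown option", execute, ask_stub))
```

The key point is that each retry is a fresh decision informed by what just failed, not a repeat of the original action.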

Accomplishments that we're proud of

We're most proud of achieving three core goals that define UI Sentinel:

  1. Truly No-Code Testing: We successfully created a system where robust tests are derived from customer journey documents, not from complex code.
  2. Tolerance to UI Changes: Our biggest accomplishment. If a developer restyles a button or moves it, the test doesn't break as long as the customer journey stays the same; the AI simply adapts to the new layout.
  3. Fully Understandable Reports: The output isn't just a pass/fail log but a clear report with explanations and a video that anyone on the team can understand and act upon.

What we learned

Building UI Sentinel taught us that the future of complex automation lies in multi-agent systems. Delegating tasks to specialized agents is far more effective and scalable than relying on a single, monolithic AI. We also learned that for automation, resilience is as important as execution. The dynamic retry mechanism was a game-changer, proving that a system's ability to recover from failure is critical for building trust.


What's next for UI Sentinel

The potential for UI Sentinel is huge. Our next steps are focused on expanding its capabilities:

  1. Improve Performance: We plan to explore more advanced multi-agent collaboration techniques and parallelism to significantly speed up test execution.
  2. Hybrid Code Generation: We will investigate using UI Sentinel to generate optimized test scripts for traditional frameworks like Selenium or Playwright, combining our agents' intelligence with the raw speed of conventional scripted tests.
  3. Visual Regression Testing: We will integrate vision models to allow the agent to not just verify functionality but also detect visual bugs, like broken layouts, incorrect colors, or misaligned elements.

Built With

Python, Google Gemini, browser-use
