Inspiration

Writing end-to-end tests is tedious and time-consuming. QA engineers spend hours manually writing Playwright selectors, dealing with strict mode violations, and maintaining brittle tests. We asked: what if you could just describe what to test in plain English and let AI handle the rest?

What it does

AEGIS is an AI-powered QA agent and browser automation platform. You chat with it in natural language; it launches a real browser, navigates websites, interacts with elements, and records every action as a production-ready Playwright test script. When a test fails, AEGIS reads the error output, fixes the code, and re-runs automatically (self-healing).

It works in two modes:

  • QA Testing - "test the login flow on my app" → generates and runs a full Playwright test
  • Browser Automation - "go to google and search for AI news" → direct browser control via chat
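As a rough sketch of the codegen side of QA Testing mode: the agent records each browser action it takes and serializes the list into a Playwright spec. The `Action` type and `generateTest` helper below are illustrative assumptions, not AEGIS's actual internals:

```typescript
// Hypothetical shape of a recorded browser action (names are illustrative).
type Action =
  | { kind: "goto"; url: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "click"; selector: string }
  | { kind: "expectVisible"; selector: string };

// Turn a list of recorded actions into Playwright test source code.
function generateTest(title: string, actions: Action[]): string {
  const body = actions.map((a) => {
    switch (a.kind) {
      case "goto":
        return `  await page.goto(${JSON.stringify(a.url)});`;
      case "fill":
        return `  await page.fill(${JSON.stringify(a.selector)}, ${JSON.stringify(a.value)});`;
      case "click":
        return `  await page.click(${JSON.stringify(a.selector)});`;
      case "expectVisible":
        return `  await expect(page.locator(${JSON.stringify(a.selector)})).toBeVisible();`;
    }
  });
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    ...body,
    `});`,
  ].join("\n");
}

const script = generateTest("login flow", [
  { kind: "goto", url: "https://example.com/login" },
  { kind: "fill", selector: "#email", value: "user@example.com" },
  { kind: "fill", selector: "#password", value: "hunter2" },
  { kind: "click", selector: "button[type=submit]" },
  { kind: "expectVisible", selector: "text=Welcome" },
]);
```

The emitted string is a complete, standalone spec file that can be saved and run with `npx playwright test`.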

How we built it

  • Gemini 3 Pro powers the AI reasoning via the Google AI SDK and Vercel AI SDK v6
  • MCP (Model Context Protocol) connects the frontend to a local browser agent over Streamable HTTP
  • Custom Snap provider - instead of screenshots, we built a lightweight accessibility-tree snapshot system that uses @ref references and verified-unique CSS selectors, consuming ~93% fewer tokens than image-based approaches
  • Playwright handles both browser automation and test execution, running in headed mode so you can watch tests execute
  • Self-healing loop - if a generated test fails, the AI reads Playwright's error output, diagnoses the issue, patches the code, and retries

Challenges we ran into

  • Selector reliability on complex pages - sites like Amazon have dozens of elements with identical text. We solved this by computing verified-unique selectors inside the browser DOM and adding match-count warnings to guide the AI away from ambiguous selectors.
  • iframe support - many modern apps use iframes extensively. We had to implement frameLocator chaining across the snapshot, codegen, and interaction layers.
  • Token efficiency - full page screenshots burn through context windows fast. Our Snap provider reduces token usage by ~93% while providing richer, more actionable data than screenshots.
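The verified-unique selector idea can be illustrated with a simplified model. The real provider runs inside the browser DOM; here a flat element list stands in for the page and `matchCount` plays the role of `document.querySelectorAll(sel).length`. All types and names are hypothetical:

```typescript
// Simplified stand-in for a DOM element.
type El = { tag: string; id?: string; testId?: string; text?: string };

// Candidate selectors in decreasing order of specificity.
function candidateSelectors(el: El): string[] {
  const out: string[] = [];
  if (el.id) out.push(`#${el.id}`);
  if (el.testId) out.push(`[data-testid="${el.testId}"]`);
  if (el.text) out.push(`${el.tag}:has-text("${el.text}")`);
  out.push(el.tag);
  return out;
}

// Stand-in for document.querySelectorAll(selector).length.
function matchCount(selector: string, page: El[]): number {
  return page.filter((el) => candidateSelectors(el).includes(selector)).length;
}

// Return the first candidate that matches exactly one element; otherwise
// fall back to the most specific candidate plus a match-count warning
// that steers the AI away from the ambiguous selector.
function verifiedSelector(el: El, page: El[]): { selector: string; warning?: string } {
  for (const sel of candidateSelectors(el)) {
    if (matchCount(sel, page) === 1) return { selector: sel };
  }
  const fallback = candidateSelectors(el)[0];
  return {
    selector: fallback,
    warning: `matches ${matchCount(fallback, page)} elements; disambiguate before use`,
  };
}

// Two "Add to Cart" buttons, as on an Amazon-style product page.
const pageEls: El[] = [
  { tag: "button", text: "Add to Cart", testId: "add-main" },
  { tag: "button", text: "Add to Cart" },
];
```

Here the first button resolves to a unique `data-testid` selector, while the second can only be described ambiguously, so it carries a warning instead of a false guarantee.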

What we learned

Context engineering matters more than prompt engineering. Giving the AI the right data (verified selectors, match counts, structured snapshots) is far more effective than telling it what to do in the system prompt.
