Inspiration

Writing end-to-end tests is tedious and time-consuming. QA engineers spend hours manually writing Playwright selectors, dealing with strict mode violations, and maintaining brittle tests. We asked: what if you could just describe what to test in plain English and let AI handle the rest?

What it does

AEGIS is an AI-powered QA agent and browser automation platform. You chat with it in natural language; it launches a real browser, navigates websites, interacts with elements, and records every action as a production-ready Playwright test script. When a test fails, AEGIS reads the error output, fixes the code, and re-runs automatically (self-healing).

It works in two modes:

  • QA Testing - "test the login flow on my app" → generates and runs a full Playwright test
  • Browser Automation - "go to google and search for AI news" → direct browser control via chat
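As a rough sketch of the codegen side of QA Testing mode: the agent records each browser action it takes and serializes the list into a Playwright spec. The `Action` type and `generateTest` helper below are illustrative assumptions, not AEGIS's actual internals:

```typescript
// Hypothetical shape of a recorded browser action (names are illustrative).
type Action =
  | { kind: "goto"; url: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "click"; selector: string }
  | { kind: "expectVisible"; selector: string };

// Turn a list of recorded actions into Playwright test source code.
function generateTest(title: string, actions: Action[]): string {
  const body = actions.map((a) => {
    switch (a.kind) {
      case "goto":
        return `  await page.goto(${JSON.stringify(a.url)});`;
      case "fill":
        return `  await page.fill(${JSON.stringify(a.selector)}, ${JSON.stringify(a.value)});`;
      case "click":
        return `  await page.click(${JSON.stringify(a.selector)});`;
      case "expectVisible":
        return `  await expect(page.locator(${JSON.stringify(a.selector)})).toBeVisible();`;
    }
  });
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    ...body,
    `});`,
  ].join("\n");
}

const script = generateTest("login flow", [
  { kind: "goto", url: "https://example.com/login" },
  { kind: "fill", selector: "#email", value: "user@example.com" },
  { kind: "fill", selector: "#password", value: "hunter2" },
  { kind: "click", selector: "button[type=submit]" },
  { kind: "expectVisible", selector: "text=Welcome" },
]);
```

The emitted string is a complete, standalone spec file that can be saved and run with `npx playwright test`.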

How we built it

  • Gemini 3 Pro powers the AI reasoning via the Google AI SDK and Vercel AI SDK v6
  • MCP (Model Context Protocol) connects the frontend to a local browser agent over Streamable HTTP
  • Custom Snap provider - instead of screenshots, we built a lightweight accessibility-tree snapshot system that uses @ref references and verified-unique CSS selectors, consuming ~93% fewer tokens than image-based approaches
  • Playwright handles both browser automation and test execution, running in headed mode so you can watch tests execute
  • Self-healing loop - if a generated test fails, the AI reads Playwright's error output, diagnoses the issue, patches the code, and retries

Challenges we ran into

  • Selector reliability on complex pages - sites like Amazon have dozens of elements with identical text. We solved this by computing verified-unique selectors inside the browser DOM and adding match-count warnings to guide the AI away from ambiguous selectors.
  • iframe support - many modern apps use iframes extensively. We had to implement frameLocator chaining across the snapshot, codegen, and interaction layers.
  • Token efficiency - full page screenshots burn through context windows fast. Our Snap provider reduces token usage by ~93% while providing richer, more actionable data than screenshots.
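The verified-unique selector idea can be illustrated with a simplified model. The real provider runs inside the browser DOM; here a flat element list stands in for the page and `matchCount` plays the role of `document.querySelectorAll(sel).length`. All types and names are hypothetical:

```typescript
// Simplified stand-in for a DOM element.
type El = { tag: string; id?: string; testId?: string; text?: string };

// Candidate selectors in decreasing order of specificity.
function candidateSelectors(el: El): string[] {
  const out: string[] = [];
  if (el.id) out.push(`#${el.id}`);
  if (el.testId) out.push(`[data-testid="${el.testId}"]`);
  if (el.text) out.push(`${el.tag}:has-text("${el.text}")`);
  out.push(el.tag);
  return out;
}

// Stand-in for document.querySelectorAll(selector).length.
function matchCount(selector: string, page: El[]): number {
  return page.filter((el) => candidateSelectors(el).includes(selector)).length;
}

// Return the first candidate that matches exactly one element; otherwise
// fall back to the most specific candidate plus a match-count warning
// that steers the AI away from the ambiguous selector.
function verifiedSelector(el: El, page: El[]): { selector: string; warning?: string } {
  for (const sel of candidateSelectors(el)) {
    if (matchCount(sel, page) === 1) return { selector: sel };
  }
  const fallback = candidateSelectors(el)[0];
  return {
    selector: fallback,
    warning: `matches ${matchCount(fallback, page)} elements; disambiguate before use`,
  };
}

// Two "Add to Cart" buttons, as on an Amazon-style product page.
const pageEls: El[] = [
  { tag: "button", text: "Add to Cart", testId: "add-main" },
  { tag: "button", text: "Add to Cart" },
];
```

Here the first button resolves to a unique `data-testid` selector, while the second can only be described ambiguously, so it carries a warning instead of a false guarantee.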

What we learned

Context engineering matters more than prompt engineering. Giving the AI the right data (verified selectors, match counts, structured snapshots) is far more effective than telling it what to do in the system prompt.
