Inspiration
Writing a single end-to-end test in Selenium or Playwright isn't hard. Maintaining 200 of them across a growing application is a completely different story, and that's the part nobody warns you about.
Before you even write your first test, the learning curve is steep. You need to pick a framework (Selenium? Playwright? Cypress?), learn its API, understand its runner, figure out its assertion library, set up the browser drivers, configure the test environment, and decide on a selector strategy. That's days of onboarding before a single test passes. For teams without dedicated QA automation engineers, which is most teams, this upfront investment alone is enough to shelve the idea entirely.
But the real cost shows up later. Every UI change becomes a maintenance event. A designer moves a button from the sidebar to the header and selectors break. A developer swaps a <div> for a <section> and locators fail. Someone renames a CSS class during a refactor and twenty tests go red overnight; none of them caught a real bug, they just reacted to a cosmetic change. QA engineers end up spending close to 40% of their time on this kind of upkeep: rewriting selectors, adjusting explicit waits, adding retry logic for flaky steps, debugging tests that pass locally but fail in CI. The scripts demand constant attention, and the effort compounds as the test suite grows.
Then there's the skill barrier. Writing robust automation scripts requires a specific technical profile. Someone who can code, who understands the DOM, who knows XPath or CSS selector syntax, who can debug asynchronous browser behavior. That rules out a huge part of the team. Product managers define the acceptance criteria. Designers know what the UI should look like. Support engineers know which workflows break most often. These are the people closest to what the application should do, and they're completely locked out of the automation process because every tool in the market assumes you can write code.
We kept coming back to one question: why does testing require programming in the first place? The thinking behind a test (what to test, what to expect, what the user flow looks like) is something anyone on the team can do. Only the execution demands code. And when Amazon Nova Act showed up with the ability to take a plain-English instruction and carry it out in a real browser, understanding the page visually, finding elements semantically, and acting the way a human would, we realized the execution barrier could disappear entirely.
That's the idea behind NovaQA. Stop asking people to learn a framework. Stop tying tests to brittle selectors. Let anyone on the team describe what they want to test in their own words, and let AI handle the rest.
What it does
NovaQA is a full-stack QA testing platform that lets you write end-to-end UI tests in plain English and have AI execute them in a real browser. No Selenium. No Playwright. No code.
Instead of writing this:
await page.goto('https://myapp.com/login');
await page.fill('#email', 'admin@example.com');
await page.click('button[type="submit"]');
await expect(page.locator('.dashboard-welcome')).toBeVisible();
You write this:
Step 1: Navigate to the login page
Step 2: Enter "admin@example.com" in the email field
Step 3: Click the "Sign In" button
Assert: The dashboard page is now visible with a welcome message
That's it. NovaQA takes those plain English instructions, hands them to Amazon Nova Act, and Nova Act drives a real Chromium browser — clicking buttons, filling forms, navigating pages — exactly the way a human tester would. Screenshots are captured at every step, results stream to your dashboard in real-time via WebSocket, and you get a complete audit trail without writing a single line of automation code.
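To make that mapping concrete, here's a minimal sketch of how a plain-English test like the one above could be split into ordered steps and assertions before each step is handed to Nova Act. The parsing format is our illustration, not NovaQA's exact implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class ParsedTest:
    steps: list       # ordered action instructions, one browser action each
    assertions: list  # conditions evaluated against the final page state

def parse_plain_english_test(text: str) -> ParsedTest:
    """Split a plain-English test into numbered steps and assertions."""
    steps, assertions = [], []
    for line in text.strip().splitlines():
        line = line.strip()
        if m := re.match(r"Step\s+\d+:\s*(.+)", line):
            steps.append(m.group(1))
        elif m := re.match(r"Assert:\s*(.+)", line):
            assertions.append(m.group(1))
    return ParsedTest(steps=steps, assertions=assertions)

test_text = """
Step 1: Navigate to the login page
Step 2: Enter "admin@example.com" in the email field
Step 3: Click the "Sign In" button
Assert: The dashboard page is now visible with a welcome message
"""
parsed = parse_plain_english_test(test_text)
print(len(parsed.steps), len(parsed.assertions))  # → 3 1
```

Each entry in `steps` then becomes one natural-language instruction for the browser agent, and each entry in `assertions` is evaluated once the steps finish.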
Key Features
AI-Powered Test Generation — Don't even know where to start? Just point NovaQA at a URL. Amazon Nova 2 Lite analyzes the page screenshot, identifies interactive elements, understands the page's purpose, and generates complete test cases with steps and assertions automatically. You go from zero tests to comprehensive coverage in minutes.
Parallel Fleet Execution — Time matters. NovaQA uses a ThreadPoolExecutor with isolated Nova Act browser sessions to run multiple tests simultaneously. A test suite that would take 30 minutes sequentially finishes in under 8 minutes with parallel execution.
Natural Language Assertions — Assertions are evaluated using Nova Act's act_get() with Pydantic boolean schemas. Instead of checking if element.isVisible(), you write "The shopping cart badge shows 3 items" and Nova Act evaluates that against the live page state. These assertions survive UI redesigns because they don't depend on any specific DOM structure.
Live Execution Streaming — Watch your tests execute step-by-step in real-time. WebSocket pushes every status update, screenshot, and result to the dashboard as it happens. No more "run and check later" — testing becomes an interactive, observable process.
Visual Regression Detection — Nova 2 Lite compares baseline screenshots against current screenshots to catch visual regressions. Unlike pixel-diff tools that generate noise over every minor rendering difference, Nova understands the page semantically and flags changes that actually matter to users.
Full-Stack Debugging — Every test run captures complete network logs with request/response headers and bodies. When a test fails, you can immediately tell whether it's a UI bug or a broken API endpoint — without reproducing the failure manually.
CI/CD Integration — A single API call (POST /api/webhook/run) plugs NovaQA into any pipeline — GitHub Actions, GitLab CI, Jenkins. Trigger test runs on every merge, every deployment, every pull request.
Screenshot Audit Trail — Every step produces a timestamped screenshot, creating a verifiable record of each test execution. This is gold for regulated industries like healthcare and finance that need documented testing evidence.
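As one example of the CI/CD hook above, a pipeline step only needs to build a single HTTP POST. Only the `POST /api/webhook/run` endpoint comes from the description; the base URL and payload fields here are illustrative assumptions, so check NovaQA's API docs for the exact schema:

```python
import json
from urllib import request

NOVAQA_URL = "http://localhost:8000"  # assumed default; point at your deployment

def build_webhook_request(suite_id: str, commit_sha: str) -> request.Request:
    """Build the request a CI job would send to kick off a suite run.

    Payload fields (suite_id / trigger / commit) are illustrative, not
    NovaQA's documented schema.
    """
    payload = json.dumps({"suite_id": suite_id, "trigger": "ci", "commit": commit_sha})
    return request.Request(
        f"{NOVAQA_URL}/api/webhook/run",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_webhook_request("smoke-tests", "abc123")
# A CI step would then call request.urlopen(req) and fail the build on a non-2xx response.
```

The same call works from GitHub Actions, GitLab CI, or Jenkins via `curl`.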
How we built it
We started with a core question: What's the simplest path from a human-written sentence to a browser action? That led us to Amazon Nova Act as the execution engine, and everything else was built to support that loop.
Backend — Python + FastAPI:
The backend is written in Python 3.12 with FastAPI. It handles all test orchestration, CRUD operations for projects/tests/suites, and the real-time WebSocket server. The test runner engine wraps Nova Act SDK calls — each plain English step becomes a nova.act() call, and assertions use nova.act_get() with Pydantic schemas to extract structured boolean results from the page state. We built an intelligent error classifier that distinguishes between transient network failures (which get automatically retried with linear backoff), model generation errors, and genuine test failures so the system doesn't mark a test as failed just because of a temporary network hiccup.
Fleet Manager:
For parallel execution, we built a fleet manager using Python's ThreadPoolExecutor. Each test gets its own isolated Nova Act browser session, so tests don't interfere with each other. The fleet manager handles session lifecycle, coordinates result collection, and streams progress updates for all concurrent tests through a single WebSocket connection.
AI Test Generator — Nova 2 Lite via Bedrock: The test generation feature uses Amazon Nova 2 Lite through Amazon Bedrock. When you hit "AI Generate," the system captures a screenshot of the target URL, sends it to Nova 2 Lite with a prompt asking it to identify interactive elements and suggest test scenarios, and parses the response into structured test cases. We went through several iterations on the prompt engineering to get test cases that are specific enough to be useful but general enough to survive minor UI changes.
Frontend — Next.js 15 + React 19: The dashboard is built with Next.js 15 and React 19, styled with Tailwind CSS. It features a dark-themed design with a natural language test editor, a real-time run viewer that shows live screenshots as tests execute, a network log inspector for debugging, and a visual regression comparison view with side-by-side screenshots. We built a "fix-in-place" workflow where you can edit a failed test step directly from the results page and update the test case without switching contexts.
Deployment:
The whole stack is containerized with Docker and deployed on AWS EC2 using Docker Compose. A Caddy reverse proxy handles HTTP routing between the frontend and backend. Deployment is a single shell script — ./deploy.sh and you're live.
Storage: We used an in-memory store with interfaces designed to be DynamoDB-compatible. This kept iteration speed fast during the hackathon while ensuring the architecture is production-ready — swapping to DynamoDB is a configuration change, not an architectural one.
Challenges we ran into
Nova Act session management was tricky. Nova Act sessions are stateful — each session maintains its own browser context, cookies, and page history. Getting parallel execution right meant carefully managing session isolation so that one test's login flow didn't bleed into another test's session. We hit some weird bugs early on where tests were passing individually but failing in parallel because sessions were sharing state. The fix was ensuring every fleet worker creates a completely fresh NovaSession with its own context.
Prompt engineering for test generation took more iterations than expected. Our first attempts at using Nova 2 Lite for test generation produced test cases that were either too vague ("test the page works") or too specific ("click the third blue button from the left in the navigation bar"). We spent a good chunk of time tuning the prompt to generate tests that describe intent rather than implementation details — "verify the user can log in" rather than "click the input field with placeholder text 'email'."
WebSocket reliability under concurrent load. When running fleet tests with 4+ parallel sessions, the WebSocket server was initially dropping messages because updates were firing faster than the frontend could consume them. We solved this by implementing a message queue with batched updates — accumulating status changes over short intervals and flushing them as consolidated payloads.
Assertion evaluation wasn't straightforward. We initially tried using Nova Act's act() method for assertions, but that's designed for actions, not evaluations. Switching to act_get() with a BOOL_SCHEMA Pydantic model gave us clean true/false assertion results, but we had to experiment with how to phrase assertions as questions that Nova Act could reliably evaluate. Framing them as "Is [condition] true on the current page?" worked much better than "Check if [condition]."
Time pressure on the frontend. We wanted the dashboard to feel polished — not like a hackathon prototype. That meant investing real time in the UI/UX: the step-by-step execution viewer, the side-by-side visual regression comparison, the network log inspector. There were moments where we had to make hard calls about which features to polish and which to leave functional-but-rough.
Accomplishments that we're proud of
It actually works end-to-end. This sounds basic, but the full loop — write a test in English, click Run, watch Nova Act drive a real browser, see screenshots stream into the dashboard in real-time, get a pass/fail result with an audit trail — works reliably. That's a chain with a lot of moving parts (Next.js → FastAPI → Nova Act SDK → Chromium → WebSocket → React), and getting it solid in a hackathon timeframe feels like a genuine accomplishment.
14 features shipped. We didn't just build a proof of concept — we built a platform. Natural language test authoring, AI test generation, parallel fleet execution, live WebSocket streaming, video recording, natural language assertions, visual regression detection, step-by-step screenshots, a full reporting dashboard, webhook/CI triggers, JSON and HTML export, project settings, test suite organization, and a network log inspector. Each one works. Each one adds real value.
Non-technical people can actually use it. We had a product manager on our extended team try NovaQA cold — no instructions beyond "write what you want to test." They wrote three test cases in ten minutes and ran them successfully. That moment validated the entire thesis of the project.
The AI test generation is genuinely useful. Point it at a URL, click a button, and you get real, executable test cases. Not boilerplate. Not placeholder text. Actual tests that run against the actual page and produce meaningful results. This was the feature where Nova 2 Lite really surprised us with how well it understands page structure and user intent.
Production-ready architecture. The in-memory store is backed by interfaces that map directly to DynamoDB table operations. The API follows REST conventions with proper error handling. The Docker Compose setup handles both development and production. The Caddy reverse proxy handles routing cleanly. This isn't throwaway hackathon code — it's a foundation you could actually build a product on.
It's fully open-source. Everything — backend, frontend, deployment scripts, documentation — is MIT licensed. Anyone can clone, deploy, and start testing in under 60 seconds with docker compose up.
What we learned
Nova Act is more capable than we initially expected. We went in thinking it would handle simple clicks and form fills, but it reliably handles complex multi-step interactions — navigating dropdown menus, handling modals, scrolling to find elements, even interacting with dynamically loaded content. The semantic understanding of page context is impressive. When you say "click the sign in button," it doesn't just pattern-match on the text "sign in" — it understands the role of the element on the page.
Natural language is a better abstraction layer than code for testing. This sounds obvious in hindsight, but experiencing it firsthand was eye-opening. Tests written in English are more readable, more maintainable, and more accessible than their coded equivalents. A test that says "Add a product to the cart, go to checkout, and verify the total is correct" communicates its intent instantly to every person on the team. The equivalent Playwright test takes a minute to parse even for developers.
Prompt engineering is a real engineering discipline. Getting consistent, high-quality outputs from Nova 2 Lite for test generation required systematic experimentation — adjusting temperature, tweaking prompt structure, adding examples, refining output schemas. It's not guesswork; it's iterative engineering with measurable results.
WebSocket architecture needs to be designed for bursty traffic. Real-time test execution generates highly variable message rates — nothing for seconds, then a burst of updates as steps complete rapidly. Standard WebSocket patterns that work for chat applications aren't sufficient. Batching and backpressure mechanisms are essential.
The gap between "working prototype" and "usable tool" is mostly UX. The core Nova Act integration was working within the first day. The remaining time went into making it feel right — smooth transitions, informative loading states, intuitive navigation, contextual actions. The features that make NovaQA feel like a real product rather than a demo are all in the frontend polish.
What's next for NovaQA: AI-Powered QA Testing Platform
NovaQA is a starting point, and we've got a clear roadmap for where it goes from here.
Persistent Storage with Amazon DynamoDB. The in-memory store served us well during the hackathon, but production deployments need durable storage. The interfaces are already DynamoDB-shaped — this is a matter of plugging in the real implementation.
Scheduled Test Runs. Integration with Amazon CloudWatch Events to run test suites on a schedule — hourly smoke tests, nightly regression suites, pre-deployment verification. Continuous monitoring without manual triggers.
Slack and Teams Notifications. Instant alerts when tests fail. A product manager gets a Slack message saying "Login flow test failed — screenshot attached" with a link to the full run report. No more checking dashboards manually.
Test Impact Analysis. Automatically determine which tests need to re-run based on code changes. If a PR only touches the checkout page, there's no reason to run the login tests. This cuts CI time dramatically for large test suites.
Multi-Browser Support. As Nova Act expands its browser capabilities, NovaQA will support cross-browser testing. Write once, run on Chrome, Firefox, Safari, and Edge.
Collaborative Test Authoring. Real-time multi-user editing with live cursors — think Google Docs for test cases. The QA lead and the product manager can co-author a test suite simultaneously.
Natural Language Test Reports. AI-generated summaries of test runs written for non-technical stakeholders. Instead of a table of pass/fail results, a report that says: "All critical checkout flows passed. The user profile page has a regression — the avatar upload button is no longer clickable after the latest deployment."
Community Template Library. Pre-built test suites for common patterns — authentication flows, e-commerce workflows, form validation, CRUD operations — that teams can import and customize. Reducing the cold-start problem to near zero.
We believe testing described by humans and executed by AI is the future of quality assurance. NovaQA is our first step toward making that future real — and we're just getting started.
Built With
- amazon
- amazon-nova-2-lite
- amazon-web-services
- docker
- fastapi
- next.js
- nova-act
- react
- tailwind-css

