Inspiration 🚨

If you’ve ever pushed code to production at 4 PM on a Friday and immediately heard your Slack notifications explode, you know exactly what the QA bottleneck feels like.

In a recent project, our team spent weeks building a massive new feature. The Product Manager handed us a flawless 20-page Product Requirement Document (PRD), we wrote the code, and handed it off to QA. Then came the nightmare. QA engineers spent agonizing days translating the English PRD into hundreds of lines of brittle Playwright and Selenium code. When the code finally shipped, a frontend developer changed a single CSS class name. Every single automated test failed. Production was blocked.

We realized the current state of QA automation is fundamentally broken. Humans shouldn't be manually translating PRD text into fragile CSS selectors. We wanted to build a system where the PRD itself could autonomously drive the browser.

What it does 🤖

NovaFlow is an autonomous AI testing ecosystem that converts unstructured Product Requirement Documents directly into executable browser automation, completely eliminating the need for engineers to manually write UI test scripts.

It bridges the gap between Product Managers and QA Engineers by operating as a multi-agent system:

  1. Intelligent Extraction: You upload a raw PDF or Word document containing your product specs. NovaFlow processes the text, extracts the business logic, flags missing edge cases, and instantly generates a structured array of atomic test cases.
  2. Autonomous Execution: NovaFlow physically launches a Chromium browser and visually navigates the UI to execute the tests without relying on brittle CSS selectors. If a button moves or changes color, the agent dynamically adapts.
  3. Conversational Insights: Instead of scrolling through endless failed execution logs, developers can hold a bidirectional voice chat with the testing pipeline and ask exactly why a test failed.
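To make step 1 concrete, here is a minimal sketch of what a structured array of atomic test cases could look like. The field names (`case_id`, `requirement`, `steps`, `edge_case`, and so on) are illustrative assumptions, not NovaFlow's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """One atomic, executable test case extracted from the PRD (illustrative shape)."""
    case_id: str
    requirement: str          # the PRD sentence this case traces back to
    steps: list = field(default_factory=list)  # plain-English browser actions
    expected: str = ""        # expected outcome to assert
    edge_case: bool = False   # True if the extractor flagged a gap in the PRD

def flag_missing_edge_cases(cases):
    """Return only the cases the extractor marked as PRD gaps."""
    return [c for c in cases if c.edge_case]

login = TestCase(
    case_id="TC-001",
    requirement="Users must be able to log in with email and password.",
    steps=["Open the login page", "Enter valid credentials", "Click 'Sign in'"],
    expected="User lands on the dashboard",
)
empty_password = TestCase(
    case_id="TC-002",
    requirement="Users must be able to log in with email and password.",
    steps=["Open the login page", "Leave the password blank", "Click 'Sign in'"],
    expected="A validation error is shown",
    edge_case=True,
)
print([c.case_id for c in flag_missing_edge_cases([login, empty_password])])  # ['TC-002']
```

Each case carries the PRD requirement it came from, so a flagged edge case (like the blank-password path above) can be surfaced back to the Product Manager before execution.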

How we built it 🛠️

We built NovaFlow entirely from scratch using a monolithic Python FastAPI backend and a dynamic React/Vite frontend. Three powerful Amazon Nova models drive our multi-agent architecture:

System Architecture Breakdown

                        ┌─────────────────────────────────────┐
                        │        QA Developer / User          │
                        │    (Uploads PRD & Views Results)    │
                        └─────────────────┬───────────────────┘
                                          ▼
                        ┌─────────────────────────────────────┐
                        │     React / Vite Dashboard (UI)     │
                        │   (Live Logs & Audio Dashboard)     │
                        └─────────────────┬───────────────────┘
                                          ▼ [REST API]
  ┌───────────────────────────────────────┴───────────────────────────────────────┐
  │                        FastAPI Backend Orchestrator                           │
  │                                                                               │
  │  ┌───────────────┐     ┌───────────────────────┐     ┌─────────────────────┐  │
  │  │ Phase 1 & 2   │     │ Phase 3 Validation    │     │ Nova Sonic Router   │  │
  │  │ (Parse & Gen) │ ──► │ (Action Mapping)      │ ──► │ (Audio Streaming)   │  │
  │  └───────┬───────┘     └──────────┬────────────┘     └───────────┬─────────┘  │
  └──────────┼────────────────────────┼──────────────────────────────┼────────────┘
             ▼                        ▼                              ▼
  ┌───────────────────────────────────────────────────────────────────────────────┐
  │                         Amazon Bedrock Intelligence                           │
  │                                                                               │
  │  ┌───────────────┐     ┌───────────────────────┐     ┌─────────────────────┐  │
  │  │ Nova 2 Lite   │     │ Nova Act              │     │ Nova Sonic          │  │
  │  │ (Logic rules) │     │ (Visual DOM parsing)  │     │ (Voice execution)   │  │
  │  └───────────────┘     └──────────┬────────────┘     └─────────────────────┘  │
  └────────────────────────────────────┼──────────────────────────────────────────┘
                                       ▼
                        ┌─────────────────────────────────────┐
                        │   Autonomous Execution Engine       │
                        │  (Playwright + Chromium Browser)    │
                        │  Executes Tests & Captures DOM      │
                        └─────────────────────────────────────┘

  • Amazon Nova 2 Lite: Used as the 'Brain' of the system. We leveraged its incredible speed and reasoning capabilities to ingest massive PRD documents, perform semantic extraction, and generate logical test arrays.
  • Amazon Nova Act: Used as the 'Hands' of the system. We integrated Nova Act alongside Playwright to visually interpret the DOM. Instead of hardcoding click coordinates, Nova Act dynamically reasons through the UI to complete the tests.
  • Amazon Nova Sonic: Used as the 'Voice' of the system. We implemented real-time binary audio streaming to allow developers to verbally interrogate their execution reports.
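As a rough illustration of how the 'Brain' stage could call Nova Lite, here is a sketch that builds a request for Bedrock's Converse API. The model ID, persona prompt, and inference settings are assumptions for demonstration, not NovaFlow's actual configuration; a boto3 `bedrock-runtime` client would accept a request shaped like this via `client.converse(**request)`:

```python
# Hypothetical Phase 1 request builder: turns raw PRD text into a Bedrock
# Converse API payload for a Nova model. No AWS call is made here.
def build_extraction_request(prd_text, model_id="amazon.nova-lite-v1:0"):
    system_prompt = (
        "You are a Senior Product Manager. Read the PRD, summarize it, "
        "extract atomic requirements, flag missing edge cases, and respond "
        "with a JSON array of test cases."
    )
    return {
        "modelId": model_id,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": prd_text}]}],
        "inferenceConfig": {"maxTokens": 4096, "temperature": 0.2},
    }

request = build_extraction_request(
    "Users must be able to reset their password via email."
)
print(request["modelId"])  # amazon.nova-lite-v1:0
```

Keeping the payload construction in a pure function like this makes the prompt and inference settings easy to unit-test without touching Bedrock.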

Challenges we ran into 🧗‍♂️

  1. The Hallucination Problem: Autonomous browser agents often hallucinate in complex or unknown UI environments. To solve this, we built a hybrid Failsafe Architecture. If a developer explicitly provides a CSS selector in their PRD, NovaFlow bypasses the AI reasoning entirely and uses lightning-fast native Playwright commands. If they use vague English, it falls back to Nova Act's visual reasoning.
  2. Environment Portability: AI agent frameworks are notoriously difficult to set up locally. We spent days stripping out hardcoded paths and replacing them with OS-agnostic relative resolutions, creating dual start.sh and start.bat scripts so that judges and users on Mac, Linux, or Windows can spin up the entire ecosystem in under 3 minutes.
  3. Nova Sonic Audio Streaming: Dealing with binary audio streaming arrays in Python to pass voice data seamlessly back and forth to the React frontend required complex base64 encoding and asynchronous event-loop management.
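The Failsafe Architecture from challenge 1 boils down to a routing decision per test step. This is a minimal sketch under assumed field names (`selector`, `instruction`); the real routing lives inside NovaFlow's executor, which would dispatch to native Playwright or to Nova Act rather than return strings:

```python
# Hybrid failsafe routing (illustrative): explicit CSS selector -> fast,
# deterministic Playwright path; vague English -> Nova Act visual reasoning.
def route_step(step):
    selector = step.get("selector")
    if selector:
        # Deterministic branch: no LLM call, just a native Playwright action
        return f"playwright: page.click({selector!r})"
    # Fallback branch: hand the plain-English instruction to Nova Act
    return f"nova_act: act({step['instruction']!r})"

print(route_step({"instruction": "Click the login button", "selector": "#login-btn"}))
print(route_step({"instruction": "Click the blue button in the top-right corner"}))
```

The payoff of this split is speed and determinism where the PRD is precise, with AI reasoning reserved for the steps that genuinely need it.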

Accomplishments that we're proud of 🏆

We are incredibly proud of building a system that successfully orchestrates three different Amazon Nova models simultaneously in a single, cohesive user journey. Transitioning seamlessly from text-processing (Lite) to visual UI execution (Act) to conversational audio (Sonic) feels like looking directly into the future of software development.

What we learned 🧠

We learned that the true power of GenAI isn't just generating boilerplate code; it's orchestrating agents that can actively reason and act on production environments. We also learned how incredibly capable the Amazon Nova model family is, particularly Nova Act's ability to interpret complex web DOMs without explicit training.

What's next for NovaFlow 🚀

For NovaFlow to become a staple in enterprise engineering, it needs to live where developers live. Our next step is to package NovaFlow as a GitHub Action. We envision a future where every time a developer opens a Pull Request, NovaFlow autonomously reads the linked Jira ticket, spins up an ephemeral environment, visually tests the new UI against the PRD, and leaves a Nova Sonic voice memo for the developer if their code broke the build.

Built With

  • amazon-bedrock
  • amazon-nova-act
  • amazon-nova-lite
  • amazon-nova-sonic
  • fastapi
  • python
  • react

Updates

posted an update

Question: Really interesting direction! One thing I'm curious about: how does NovaFlow handle ambiguous or incomplete requirements in PRDs? In real-world scenarios, specs are often messy, and test accuracy heavily depends on interpretation. It would be great to see how your agents deal with uncertainty or conflicting logic. Also, do you have any validation layer to ensure generated tests are actually aligned with business intent and not just syntactically correct? Another potential area to explore could be traceability: mapping each generated test case back to specific requirement lines. This would make debugging, audits, and team collaboration much easier, especially in large systems.

Answer: Hey, that’s a really good question. In real-world scenarios PRDs are often messy or incomplete, and test quality heavily depends on how well those requirements are interpreted.

First, I believe AI is not magic; it's mathematics (statistics and probability). So if the input data (in this case the PRD) is inaccurate or ambiguous, the output can also be inaccurate. Because of that, NovaFlow follows a human-in-the-loop approach instead of fully blind automation.

For handling ambiguous or incomplete PRDs, NovaFlow first uses the reasoning capability of the Nova Lite model with the system persona set as a Senior Product Manager. Instead of directly generating test cases, the system first focuses on understanding the PRD and produces three outputs:

  1. A summarized version of the PRD
  2. Extracted product features and requirements
  3. A review section for the Product Manager

Here the actual Product Manager (who wrote the PRD after discussing with the client) can review, correct, or refine the extracted requirements. This ensures the interpretation is aligned with the real product intent before moving forward.

Regarding uncertainty or conflicting logic, NovaFlow does not directly generate test cases. It first identifies gaps, inconsistencies, or conflicting logic in the PRD and raises them for clarification with the Product Manager. Only after these ambiguities are resolved does the system move forward with test generation.
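That clarification gate can be sketched as a simple check; the requirement format and flag names here are invented for illustration, not NovaFlow's actual data model:

```python
# Hypothetical gate: test generation only proceeds once every flagged
# ambiguity has been resolved by the Product Manager.
def ready_for_test_generation(requirements):
    """Return (ready, open_questions) for a list of requirement dicts."""
    open_questions = [
        r["text"] for r in requirements
        if r.get("ambiguous") and not r.get("resolved")
    ]
    return (len(open_questions) == 0, open_questions)

reqs = [
    {"text": "Login requires email and password.", "ambiguous": False},
    {"text": "Sessions expire 'after a while'.", "ambiguous": True, "resolved": False},
]
ready, questions = ready_for_test_generation(reqs)
print(ready, questions)  # False ["Sessions expire 'after a while'."]
```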

For the validation layer, Phase 2 introduces another persona: a Senior QA Engineer. This agent understands the validated PRD and generates test cases that focus on user journeys, business logic validation, and proper test design strategy, not just syntactically correct tests.

For traceability, I’m still exploring deeper LLM tracing, but currently during execution NovaFlow generates reports that store reasoning traces from Nova Act, pass/fail status, and execution metrics like duration. This provides visibility for Product Managers, QA Engineers, and Developers to understand how and why a test behaved in a certain way.
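As a rough sketch of what such a report entry might carry, here is an illustrative aggregation over entries holding a reasoning trace, pass/fail status, and duration. The keys are assumptions based on the description above, not NovaFlow's exact schema:

```python
# Illustrative execution-report aggregation over per-test entries.
def summarize_report(entries):
    """Aggregate pass/fail counts and total duration across test entries."""
    passed = sum(1 for e in entries if e["status"] == "pass")
    return {
        "total": len(entries),
        "passed": passed,
        "failed": len(entries) - passed,
        "duration_s": round(sum(e["duration_s"] for e in entries), 2),
    }

entries = [
    {"case_id": "TC-001", "status": "pass", "duration_s": 4.2,
     "reasoning_trace": ["located 'Sign in' button visually", "clicked it"]},
    {"case_id": "TC-002", "status": "fail", "duration_s": 6.1,
     "reasoning_trace": ["password field not found after redesign"]},
]
print(summarize_report(entries))
```

Because each entry keeps its `reasoning_trace` alongside the metrics, a failed case can be explained (here, a missing password field) rather than just counted.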

Additionally, I integrated Nova Sonic which enables bidirectional voice interaction. This allows stakeholders to discuss the test execution report conversationally, while the responses remain grounded in the actual execution data.

Overall, the idea behind NovaFlow is not to replace human decision making but to augment the QA workflow with structured AI reasoning while keeping humans in the loop.


posted an update

NovaFlow has been updated to use Amazon Nova 2 Sonic (amazon.nova-2-sonic-v1:0) following AWS’s announcement that Nova Sonic 1.0 will be deprecated by September 14, 2026. The migration was straightforward since Nova 2 Sonic uses the same streaming API, so existing voice interaction functionality continues to work seamlessly.
