Inspiration

Imagine if testing your website were as simple as pasting a URL: no manual scripting, just an AI-powered QA engineer that never sleeps. That's the idea behind Couch.

What it does

Couch uses Claude AI and Stagehand to automatically explore your site, generate user flows (e.g., "Login → Checkout" or "Signup → Browse"), and test them end-to-end with no manual scripting needed. It identifies bugs in real time, flags broken steps with full context and screenshots, and gives you a clean dashboard to monitor, debug, and re-run tests instantly.

It works like this:

Generating test steps: Each flow is broken down into atomic actions (clicks, form entries, validations) using Stagehand, which is built on Playwright.

Executing flows: Stagehand performs actions (act()) and validations (extract()/observe()), logging each step with a success/failure status and context (see the sketch after this list).

Autonomous error detection: Errors (like missing elements or failed assertions) trigger alerts, and an agent autonomously creates a GitHub issue on the website's repository describing the broken user flow.

Continuous feedback: Developers can re-run flows, annotate failures (false alarm or real bug), and iterate instantly.

Accurate summarization: We use AI to summarize the results of the automated test runs in a sleek, interactive chatbot interface.
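
As a rough illustration of the execution loop, here is a minimal sketch assuming Stagehand's TypeScript API; the Step type, runFlow helper, and logging shape are our own illustration, not code from the project:

```typescript
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

// Hypothetical shape for the atomic steps generated per flow.
type Step =
  | { kind: "act"; instruction: string }       // e.g. "click the Login button"
  | { kind: "validate"; instruction: string }; // e.g. "the cart shows 1 item"

async function runFlow(url: string, steps: Step[]) {
  // Model/API-key configuration omitted for brevity.
  const stagehand = new Stagehand({ env: "LOCAL" });
  await stagehand.init();
  const page = stagehand.page;
  await page.goto(url);

  const log: { step: Step; ok: boolean; detail?: string }[] = [];
  for (const step of steps) {
    try {
      if (step.kind === "act") {
        // act() turns a natural-language instruction into a browser action.
        await page.act(step.instruction);
        log.push({ step, ok: true });
      } else {
        // extract() pulls structured data from the page for assertions.
        const { passed } = await page.extract({
          instruction: `Answer whether this holds: ${step.instruction}`,
          schema: z.object({ passed: z.boolean() }),
        });
        log.push({ step, ok: passed });
      }
    } catch (err) {
      // A missing element or failed action lands here and gets flagged.
      log.push({ step, ok: false, detail: String(err) });
      break;
    }
  }
  await stagehand.close();
  return log;
}
```

The extract() call with a Zod schema is what lets a natural-language check become a typed assertion that can be logged as pass/fail.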

How we built it

Frontend: Next.js + Tailwind CSS + shadcn/ui to render flows, logs, screenshots, and error history.
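
To give a flavor of the dashboard layer, here is a hypothetical step-row component in the same stack; the StepResult shape and styling are illustrative, not the project's actual code:

```tsx
// Renders one logged step with its status, instruction, and screenshot,
// styled with Tailwind utility classes.
type StepResult = { instruction: string; ok: boolean; screenshotUrl?: string };

export function StepRow({ result }: { result: StepResult }) {
  return (
    <div className="flex items-center gap-3 rounded-lg border p-3">
      <span className={result.ok ? "text-green-600" : "text-red-600"}>
        {result.ok ? "✓" : "✗"}
      </span>
      <p className="flex-1 text-sm">{result.instruction}</p>
      {result.screenshotUrl && (
        <img src={result.screenshotUrl} alt="step screenshot" className="h-12 rounded" />
      )}
    </div>
  );
}
```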

Stagehand Integration: We used Stagehand’s page.goto, act, observe, and extract APIs for page control and assertions.

LLM Integration: We used Anthropic's API in a function that is called whenever Stagehand detects a failed test in the DOM; this triggers the LLM to write a title and description for a GitHub issue and file it on the repo. We also used the Gemini API to summarize the battle-testing results.
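
A hedged sketch of what that trigger might look like with Anthropic's TypeScript SDK and Octokit; the function name, prompt format, and OWNER/REPO placeholders are our own assumptions:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { Octokit } from "@octokit/rest";

// Hypothetical glue: called when Stagehand flags a failed step. Claude
// drafts the issue text; Octokit files it on the site's repository.
export async function fileIssueForFailure(failedFlow: string, errorDetail: string) {
  const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the env
  const msg = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content:
          `A user flow failed during automated testing.\n` +
          `Flow: ${failedFlow}\nError: ${errorDetail}\n` +
          `Reply with a one-line issue title, then a blank line, then a description.`,
      },
    ],
  });

  // Pull the plain-text block out of Claude's response.
  const text = msg.content[0].type === "text" ? msg.content[0].text : "";
  const [title, ...rest] = text.split("\n");

  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  await octokit.rest.issues.create({
    owner: "OWNER", // placeholder: the site's repository owner
    repo: "REPO",   // placeholder: the site's repository name
    title,
    body: rest.join("\n").trim(),
  });
}
```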

Challenges we ran into

One big challenge was figuring out which framework to use to integrate multiple AI agents. We got over this hump by leaning on LangChain and LangGraph's support for autonomous agents.
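
For flavor, a minimal sketch of how such an agent graph might be wired with LangGraph's TypeScript API; the state shape, node names, and stub helpers are illustrative assumptions, not the project's actual agents:

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Hypothetical shared state: failure descriptions plus a reporting flag.
const TestState = Annotation.Root({
  failures: Annotation<string[]>({ reducer: (a, b) => a.concat(b), default: () => [] }),
  reported: Annotation<boolean>({ reducer: (_a, b) => b, default: () => false }),
});

// Stubs standing in for the real agents (hypothetical helpers):
async function detectFailures(): Promise<string[]> {
  // In the real system this would run the Stagehand flows and return
  // descriptions of any steps that failed.
  return [];
}
async function fileIssueForFailure(flow: string, detail: string): Promise<void> {
  // See the GitHub-issue sketch under "How we built it".
}

const graph = new StateGraph(TestState)
  .addNode("runFlows", async () => ({ failures: await detectFailures() }))
  .addNode("fileIssues", async (state) => {
    for (const f of state.failures) await fileIssueForFailure(f, "");
    return { reported: true };
  })
  .addEdge(START, "runFlows")
  // Only route to the reporting agent when something actually failed.
  .addConditionalEdges("runFlows", (state) =>
    state.failures.length > 0 ? "fileIssues" : END
  )
  .addEdge("fileIssues", END)
  .compile();

// Kick off one autonomous test-and-report cycle.
await graph.invoke({});
```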

Accomplishments that we're proud of

We are proud of building a feature that autonomously creates GitHub issues, as that was one of the main pieces of functionality we wanted to integrate.

What we learned

We learned a lot about AI agents and the different frameworks for orchestrating multiple agents within a software application, and we picked up plenty about browser automation and testing along the way.

What's next for Couch

We plan to build a fully agentic system that can read and understand the HTML DOM and identify potential issues in the code itself. We also hope to let the AI make decisions about opening GitHub issues and PRs, and even proposing new frontend designs.


Built With

  • claude
  • gemini
  • openai
  • playwright
  • stagehand