Video demo (the video demo link field wouldn't accept a Loom URL): https://www.loom.com/share/9fbc2d64cb0b465b9115ec7043bf7c9a
Inspiration
We've all shipped broken features because manual QA doesn't scale and Selenium tests break constantly. Traditional tools give cryptic errors like "element not found" with zero context. I realized AI agents could test apps like humans do and show visual proof of exactly what happened.
What it does
QA Agent automatically tests web applications using AI - you paste a GitHub URL, describe tests in plain English ("test the login flow"), and it runs them in isolated Daytona workspaces while capturing screenshots at every step plus full video. When tests fail, Claude analyzes the visual evidence and provides specific recommendations on why it broke and how to fix it. It's QA testing that gives you visual proof and actionable insights, not just pass/fail.
How I built it
Backend uses Bun + Elysia with Daytona SDK for workspace orchestration and Inngest for durable workflows that clone repos, start apps, and run browser-use (Python AI agent) for testing. Frontend is Next.js 15 with TanStack Query for real-time polling, displaying test timelines with screenshots and video. Claude API analyzes failures and PostgreSQL stores all test results with visual evidence.
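As a rough illustration of the polling side, here is a minimal sketch of how the frontend could decide when to keep polling a test run. The status names and intervals are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical run states; the real database schema is not shown in this write-up.
type RunStatus =
  | "queued"
  | "cloning"
  | "starting"
  | "testing"
  | "analyzing"
  | "passed"
  | "failed";

// States after which nothing more will change, so polling can stop.
const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(["passed", "failed"]);

// Returns the delay in milliseconds before the next poll, or false to stop polling.
function nextPollInterval(status: RunStatus | undefined): number | false {
  if (status === undefined) return 1000; // no data yet: poll quickly
  if (TERMINAL.has(status)) return false; // run finished: stop polling
  // Poll faster while the browser agent runs, since that is when screenshots arrive.
  return status === "testing" ? 1000 : 3000;
}
```

With TanStack Query v5, a helper like this could plausibly be wired in through the `refetchInterval` option, for example `refetchInterval: (query) => nextPollInterval(query.state.data?.status)`, assuming the query data carries a `status` field.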
Challenges I ran into
Getting browser-use and Chromium running inside Daytona containers was harder than expected - I had to figure out Python dependencies, headless browser config, and video recording orchestration remotely. Providing real-time status updates from background Inngest jobs to the frontend required careful coordination of database updates and polling strategies. Dynamically generating Python scripts that reliably capture screenshots and return structured results took multiple iterations.
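One way the script generation and structured results could fit together is sketched below. This is not the project's actual generator: the `__QA_RESULT__` marker, the `run_agent` placeholder, and the result shape are all assumptions for illustration. The idea is to embed user input as double-encoded JSON so it cannot break the generated Python, and to have the script print a single marked JSON line that the TypeScript side can find among other log output.

```typescript
// Marker the generated script prints before its JSON result (an assumed convention).
const RESULT_MARKER = "__QA_RESULT__";

// Hypothetical result shape for this sketch.
interface StepResult { step: number; description: string; screenshot: string; }
interface TestResult { passed: boolean; steps: StepResult[]; error?: string; }

// Build Python source for one test. The config is JSON-encoded twice so that
// quotes, backslashes, and newlines in user input stay inside the string literal.
function buildTestScript(task: string, appUrl: string): string {
  const payload = JSON.stringify(JSON.stringify({ task, appUrl }));
  return [
    "import json",
    "",
    `CONFIG = json.loads(${payload})`,
    "",
    "def run_agent(config):",
    "    # Hypothetical placeholder: the real script would drive browser-use here",
    "    # and save a screenshot per step.",
    '    return {"passed": True, "steps": []}',
    "",
    "try:",
    "    result = run_agent(CONFIG)",
    "except Exception as exc:",
    '    result = {"passed": False, "steps": [], "error": str(exc)}',
    `print(${JSON.stringify(RESULT_MARKER)} + json.dumps(result))`,
  ].join("\n");
}

// Extract the structured result from the script's stdout.
function parseTestResult(stdout: string): TestResult {
  // Take the last marker line so earlier log output cannot be mistaken for it.
  const line = stdout
    .split(/\r?\n/)
    .reverse()
    .find((l) => l.startsWith(RESULT_MARKER));
  if (!line) {
    return { passed: false, steps: [], error: "no result marker in script output" };
  }
  try {
    const parsed = JSON.parse(line.slice(RESULT_MARKER.length)) as Partial<TestResult>;
    return {
      passed: parsed.passed === true,
      steps: Array.isArray(parsed.steps) ? parsed.steps : [],
      ...(typeof parsed.error === "string" ? { error: parsed.error } : {}),
    };
  } catch {
    return { passed: false, steps: [], error: "result line was not valid JSON" };
  }
}
```

Treating a missing or malformed result line as a failed run, rather than throwing, means a crashed script still produces a row the frontend can display.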
Accomplishments that we're proud of
I integrated the Daytona SDK (workspace creation, command execution, file management). I built a novel hybrid TypeScript/Python architecture that combines natural language tests + AI execution + visual evidence + AI failure analysis. The result is demo-ready and actually solves a real problem developers face every day.
What I learned
Deploying AI agents in containerized environments like Daytona taught me about dependency management, headless browser configuration, and remote execution patterns for long-running processes. Visual evidence (screenshots + video) is essential - when tests fail, seeing exactly what the AI saw makes debugging far faster than working from cryptic error messages. Hybrid architectures unlock powerful capabilities by using the best tool for each job (TypeScript for APIs, Python for AI testing).
Built With
- bun
- claude
- daytona
- inngest
- next.js-15
- react-19
- sql
- tanstack