Inspiration
In 2026, AI became the primary author of code. GitHub Copilot, Cursor, Claude — developers shifted from writing code to reviewing it. But there's a fatal gap: AI doesn't know if its code actually works. It generates, it compiles, it looks correct — but does the login button work? Does the error message appear? Does the payment go through?
We're generating code 10x faster, but we can't test it 10x faster. QA becomes the ultimate bottleneck. Manual testing is dead — no human can keep up with AI-generated code velocity. Existing test automation tools like Selenium or Playwright still require a human to write every test script. Even newer AI testing tools need you to tell the AI what to test and how to test it.
We asked: What if the AI could look at the code, figure out what needs testing, and just... do it?
That's Axolotl — an autonomous AI QA agent that closes the loop. AI writes code. AI tests code. Humans approve.
What it does
Axolotl is a VS Code extension that acts as an autonomous QA engineer. Unlike existing testing tools, Axolotl requires zero human test instructions. Here's what makes it different:
Reads your code with real understanding — Uses tree-sitter AST parsing across 17+ languages to deeply analyze functions, classes, error handlers, API calls, and validation logic. It doesn't just look at diffs — it understands code structure.
Researches best practices — Queries the You.com API to find testing best practices relevant to your specific tech stack and patterns, enriching test case generation with real-world knowledge.
Generates intelligent test plans — Automatically creates 5-10 targeted test cases covering functional flows, edge cases, error handling, and UI/UX — complete with Mermaid flowchart visualizations. All without a single line of human instruction.
Opens a real browser and tests like a real user — Launches Chrome via Puppeteer, types into fields, clicks buttons, scrolls pages, and captures screenshots at every step. This is not mocking or simulation — it's real E2E testing at machine speed.
Collects forensic evidence — Injects strategic console log markers into code, monitors browser output, captures screenshots, and correlates UI behavior with code execution paths. Every test result is backed by evidence.
Delivers a merge verdict — Generates a structured report: `MERGEABLE`, `NOT_MERGEABLE`, or `MERGEABLE_WITH_RISKS`. No more guessing whether a PR is safe.
Remembers your project — A persistent memory system (`axolotl.md`) learns your project's setup — dependency installation, dev server commands, environment variables, quirks — making each subsequent session faster and smarter.
Offers to fix what it finds — When tests fail, Axolotl can automatically patch the code and re-run verification. Write → Test → Fix → Verify, all AI.
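As a hypothetical sketch of the report shapes described above — the type and field names here are illustrative, not the extension's actual API:

```typescript
// Illustrative shapes for a QA report with a merge verdict backed by
// evidence. Names are assumptions, not Axolotl's real schema.
type Verdict = "MERGEABLE" | "NOT_MERGEABLE" | "MERGEABLE_WITH_RISKS";

interface Evidence {
  marker: string;        // e.g. "AXOLOTL_TEST_LOG:login-submit"
  screenshot: string;    // path to the captured PNG
  consoleLines: string[];
}

interface TestResult {
  name: string;
  passed: boolean;
  evidence: Evidence[];
}

// A simple verdict policy: any failing test blocks the merge; passing
// tests that raised warnings downgrade to "with risks".
function verdictFor(results: TestResult[], warnings: number): Verdict {
  if (results.some((r) => !r.passed)) return "NOT_MERGEABLE";
  return warnings > 0 ? "MERGEABLE_WITH_RISKS" : "MERGEABLE";
}
```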
How we built it
Axolotl is built on top of Cline, the open-source autonomous coding agent for VS Code. We forked Cline's powerful agent architecture and heavily modified it to create a specialized QA workflow engine.
What we kept from Cline:
- The autonomous task execution engine and multi-model AI support (Claude, GPT-4, Gemini, and 10+ providers)
- Browser automation infrastructure (Puppeteer integration)
- File manipulation and terminal execution capabilities
- MCP (Model Context Protocol) integration for extensibility
What we built on top:
- A 9-phase QA state machine that orchestrates the entire testing workflow: Detect Changes → Analyze Code → Web Search → Generate Plan → Inject Logs → Execute Tests → Cleanup → Report → Update Memory
- 5 custom QA tools (`axolotl_detect_changes`, `axolotl_analyze_code`, `axolotl_web_search`, `axolotl_generate_plan`, `axolotl_qa_report`) that give the AI structured capabilities for each QA phase
- A tree-sitter-powered code analysis engine that extracts AST-level understanding across 17+ programming languages
- A You.com API integration for real-time web search to inform test generation with current best practices
- An evidence collection pipeline that correlates injected log markers with browser screenshots and console output
- A persistent memory system that learns project-specific setup across sessions
- A custom React + Tailwind webview UI for test plan visualization, report display, and interactive configuration
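The 9-phase workflow above can be sketched as a simple linear state machine — the phase names come from the writeup, but the transition logic shown is illustrative:

```typescript
// Minimal sketch of the 9-phase QA workflow as a linear state machine.
// Phase order follows the writeup; real orchestration may branch or retry.
const PHASES = [
  "DETECT_CHANGES",
  "ANALYZE_CODE",
  "WEB_SEARCH",
  "GENERATE_PLAN",
  "INJECT_LOGS",
  "EXECUTE_TESTS",
  "CLEANUP",
  "REPORT",
  "UPDATE_MEMORY",
] as const;

type Phase = (typeof PHASES)[number];

// Returns the next phase, or null once the workflow is complete.
function nextPhase(current: Phase): Phase | null {
  const i = PHASES.indexOf(current);
  return i >= 0 && i < PHASES.length - 1 ? PHASES[i + 1] : null;
}
```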
The entire extension is built with TypeScript, bundled with esbuild, and tested with Mocha (unit), VS Code Test CLI (integration), and Playwright (E2E).
Challenges we ran into
Making AI truly autonomous in testing was the hardest part. Existing AI testing tools fall into two camps: (1) tools that generate test scripts for humans to run, or (2) chatbots that need you to describe what to test. Neither is autonomous.
We had to solve several unique challenges:
- Context explosion: AST analysis of large PRs can produce massive amounts of data. We implemented careful limits (20 files max, 30 definitions per file, 50 search results) to keep AI context manageable without losing critical information.
- Evidence reliability: How do you prove a test passed or failed? We developed an injected log marker system (`AXOLOTL_TEST_LOG`) that creates a forensic trail linking code execution to UI behavior, then ensures clean removal before report generation.
- Project bootstrapping: Every project has different setup steps. The memory system had to be designed to learn organically during test execution — discovering how to install dependencies, start dev servers, and configure environments — then persist that knowledge for future sessions.
- Browser timing: Real browsers are unpredictable. Network latency, animation delays, and async rendering all create race conditions. We had to build robust waiting and verification strategies into the test execution phase.
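A common shape for the robust waiting strategy mentioned above is a generic poll-until-true helper; the defaults below are illustrative, not Axolotl's actual values:

```typescript
// Poll a condition until it passes or a timeout expires — a typical
// defense against async rendering and animation race conditions.
async function waitFor(
  check: () => Promise<boolean> | boolean,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor timed out after ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```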
Accomplishments that we're proud of
- Zero-instruction E2E testing: Axolotl is the first tool we know of that can go from "here's a PR" to "here's a merge verdict with evidence" without any human telling it what to test or how to test it.
- True code understanding: By combining tree-sitter AST parsing with AI reasoning and You.com web search, Axolotl generates test cases that are genuinely intelligent — not just random clicks, but targeted verification of real functionality.
- The evidence pipeline: Every test result is backed by screenshots, console logs, and execution traces. This isn't "trust me, it passed" — it's forensic-grade QA evidence.
- The memory system: Axolotl gets smarter with every session. First run might take a few minutes to learn your project. Second run? It already knows everything.
- Multi-model support: By building on Cline, Axolotl works with Claude, GPT-4, Gemini, and 10+ other AI providers — users aren't locked into a single vendor.
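The injected-marker lifecycle behind the evidence pipeline can be sketched as two small helpers — the `AXOLOTL_TEST_LOG` name comes from the writeup, but the injection format shown is an assumption:

```typescript
// Sketch of the marker lifecycle: inject a console marker for a test
// run, then strip all markers cleanly before the report is generated.
const MARKER = "AXOLOTL_TEST_LOG";

// Insert a marker log statement before the given line index.
function injectMarker(source: string, line: number, label: string): string {
  const lines = source.split("\n");
  lines.splice(line, 0, `console.log("${MARKER}:${label}");`);
  return lines.join("\n");
}

// Remove every injected marker line, restoring the original source.
function stripMarkers(source: string): string {
  return source
    .split("\n")
    .filter((l) => !l.includes(MARKER))
    .join("\n");
}
```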
What we learned
- AI testing is fundamentally different from AI coding. Writing code is generative — the AI creates something new. Testing is adversarial — the AI must actively try to break what was created. This required a completely different prompt engineering approach.
- AST analysis is the secret weapon. Without structural code understanding, AI test generation is just guessing. Tree-sitter gave us the precision we needed to generate meaningful test cases.
- Web search dramatically improves test quality. Integrating You.com API to research testing best practices for specific frameworks and patterns made test generation significantly more relevant and thorough.
- Memory changes everything. The difference between a stateless tool and one that remembers your project is massive. First-run friction drops dramatically when the AI already knows how to set up your environment.
- Building on open source is a superpower. Cline's architecture gave us months of head start. Instead of building an agent framework from scratch, we focused entirely on the QA innovation layer.
What's next for Axolotl
- CI/CD integration: Run Axolotl as a GitHub Action that automatically tests every PR before human review
- Historical learning: Aggregate test results across sessions to identify patterns — which code areas break most often? Which types of changes are riskiest?
- Visual regression testing: Compare screenshots across runs to detect unintended UI changes
- API testing: Extend beyond browser E2E to test backend endpoints, database mutations, and service integrations
- Team collaboration: Share memory files and test reports across team members for consistent QA standards
- Performance benchmarking: Measure and track load times, response times, and rendering performance during test execution
Built With
- anthropic-claude-api
- cline
- esbuild
- fastify
- google-gemini-api
- grpc
- javascript
- mcp-(model-context-protocol)
- mocha
- node.js
- openai-api
- playwright
- puppeteer
- react
- sqlite
- tailwind-css
- tree-sitter
- typescript
- vite
- vs-code-extension-api
- you.com-api