Inspiration

In 2025, AI writes a large and growing share of new code. GitHub Copilot, Cursor, and Claude are no longer just assistants; on many teams they are the primary authors of production code. But here's the terrifying truth: AI doesn't know whether its code actually works.

It generates. It compiles. It looks correct. But does the login button work? Does the error message appear? Does the payment go through? The AI has no idea. It's building blindfolded.

This creates a paradox: we're generating code 10x faster, but we can't test it 10x faster. QA becomes the ultimate bottleneck. The faster AI writes, the bigger the pile of untested code grows. And manual testing? It's already dead. No human can keep up with AI-generated code velocity.

We need AI that tests AI. That's Axolotl.

Named after the remarkable salamander that can regenerate entire limbs, Axolotl embodies the same philosophy: self-healing, self-verifying, continuously adapting software.


What it does

Axolotl is an autonomous AI QA agent that lives inside VS Code. It doesn't mock, stub, or simulate — it opens a real browser, sees with real eyes, and tests like a real user.

When AI writes a login page, Axolotl launches Chrome, types an email, clicks the button, and watches what happens. Screenshots captured. Console logs recorded. Evidence collected.

The 7-Phase QA Pipeline

  1. Detect Changes — Reads your git diff / workspace to understand what changed
  2. Analyze Code — AST-based code structure analysis across 15+ languages using Tree-Sitter
  3. Web Search — Searches best practices and known issues via You.com API
  4. Generate Plan — AI generates comprehensive test cases with Mermaid diagram visualizations
  5. Inject Logging — Auto-injects SENTINEL_TEST_LOG markers for behavioral evidence tracking
  6. Execute Tests — Launches real browsers via Puppeteer, runs commands, validates end-to-end flows
  7. Report & Fix — Generates evidence-based verdict (MERGEABLE / NOT_MERGEABLE) and offers to fix all issues
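One way to picture the seven phases is as a sequential pipeline in which each phase receives a shared context and a failure in any phase ends the run with a NOT_MERGEABLE verdict. The sketch below is a minimal illustration under that assumption. `QaContext`, `Phase`, and `runPipeline` are hypothetical names, and the phase bodies are placeholders, not Axolotl's actual implementation.

```typescript
// Hypothetical shared state handed from one phase to the next.
interface QaContext {
  changedFiles: string[];
  notes: string[];
  verdict?: "MERGEABLE" | "NOT_MERGEABLE";
}

// Hypothetical phase contract: take the context, return an updated one.
interface Phase {
  name: string;
  run(ctx: QaContext): QaContext;
}

// Runs phases in order. A phase that throws stops the run and
// forces a NOT_MERGEABLE verdict; otherwise the run ends MERGEABLE
// unless a phase already set a verdict.
function runPipeline(phases: Phase[], initial: QaContext): QaContext {
  let ctx = initial;
  for (const phase of phases) {
    try {
      ctx = phase.run(ctx);
      ctx = { ...ctx, notes: [...ctx.notes, `${phase.name}: ok`] };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return {
        ...ctx,
        notes: [...ctx.notes, `${phase.name}: failed (${reason})`],
        verdict: "NOT_MERGEABLE",
      };
    }
  }
  return { ...ctx, verdict: ctx.verdict ?? "MERGEABLE" };
}

// The seven phase names from the list above, with no-op placeholder bodies.
const phaseNames = [
  "Detect Changes",
  "Analyze Code",
  "Web Search",
  "Generate Plan",
  "Inject Logging",
  "Execute Tests",
  "Report & Fix",
];
const placeholderPhases: Phase[] = phaseNames.map((name) => ({
  name,
  run: (ctx) => ctx,
}));

const pipelineResult = runPipeline(placeholderPhases, {
  changedFiles: ["src/login.tsx"], // illustrative path
  notes: [],
});
```

Keeping each phase behind the same small interface would make it straightforward to skip, reorder, or retry a phase without touching the others.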

Key Innovation

Instead of just checking UI success, Axolotl proves the right code paths executed by analyzing logs, terminal output, and browser behavior together — true evidence-based testing. Think of it as having a senior QA engineer who never gets tired, validates every function, gives you actionable reports in seconds, and helps you fix all issues.
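To make the idea concrete, here is a small sketch of how a verdict could combine a UI result with log evidence. It assumes that injected markers take the form `SENTINEL_TEST_LOG:<pathId>`; only the `SENTINEL_TEST_LOG` prefix comes from the pipeline description, while the `:<pathId>` suffix, the `Evidence` shape, and the `judge` function are hypothetical.

```typescript
type Verdict = "MERGEABLE" | "NOT_MERGEABLE";

// Hypothetical evidence bundle collected during one test run.
interface Evidence {
  uiPassed: boolean;        // did the browser flow appear to succeed?
  consoleLogs: string[];    // browser console lines
  terminalOutput: string[]; // dev server / command output lines
}

const MARKER = "SENTINEL_TEST_LOG";

// Collects the path ids found in lines such as
// "[app] SENTINEL_TEST_LOG:login.submit user=test" (assumed format).
function observedPaths(lines: string[]): Set<string> {
  const seen = new Set<string>();
  const pattern = new RegExp(`${MARKER}:([\\w.-]+)`);
  for (const line of lines) {
    const match = pattern.exec(line);
    if (match) seen.add(match[1]);
  }
  return seen;
}

// A run is MERGEABLE only if the UI passed AND every expected
// code path left a marker somewhere in the collected logs.
function judge(
  expectedPaths: string[],
  evidence: Evidence,
): { verdict: Verdict; missing: string[] } {
  const seen = observedPaths([
    ...evidence.consoleLogs,
    ...evidence.terminalOutput,
  ]);
  const missing = expectedPaths.filter((id) => !seen.has(id));
  const verdict: Verdict =
    evidence.uiPassed && missing.length === 0 ? "MERGEABLE" : "NOT_MERGEABLE";
  return { verdict, missing };
}

// The UI looked fine, but the redirect path never logged its marker.
const sampleJudgement = judge(["login.submit", "login.redirect"], {
  uiPassed: true,
  consoleLogs: ["[app] SENTINEL_TEST_LOG:login.submit user=test"],
  terminalOutput: ["GET /dashboard 200"],
});
```

In this example the verdict is NOT_MERGEABLE even though the UI check passed, because `login.redirect` is reported as missing. That is the difference between checking UI success and checking that the intended code paths ran.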


How we built it

Architecture

Axolotl is built as a VS Code extension with a React-based webview UI. The core engine orchestrates multiple AI models and tools through a sophisticated task pipeline.

  • Extension Core: TypeScript + Node.js, communicating via gRPC and Protocol Buffers
  • Frontend: Built with Lovable — React 18 + Vite + Tailwind CSS + Radix UI + Framer Motion for a polished, modern developer experience
  • Browser Automation: Puppeteer Core + Chrome Launcher for real browser-based E2E testing
  • Code Analysis: Tree-Sitter WASM parsers for AST analysis across 15+ programming languages
  • Deployment: Hosted and deployed on Render for reliable, scalable infrastructure

Sponsor Tools Integration

Sponsor tools and how we used them:

  • You.com API: Powers Axolotl's web search tool. During QA analysis, it queries You.com's Express Agent API to find best practices, known bugs, and documentation relevant to the code under test. This contextual awareness makes test generation significantly smarter.
  • Google Gemini API: The primary AI backbone for code analysis, test plan generation, browser action orchestration, and intelligent decision-making. We leverage Gemini 2.5 Flash's native tool-calling and reasoning capabilities to drive the entire QA pipeline autonomously.
  • Render: Our deployment platform. Render hosts the backend services and API endpoints that power Axolotl's cloud features, providing zero-downtime deploys and automatic scaling for concurrent QA sessions.
  • Lovable: Used to rapidly build and iterate on the frontend webview UI. Lovable's AI-assisted development allowed us to create a polished, production-quality React interface with smooth animations and intuitive UX in record time.

Multi-Model Intelligence

Axolotl supports 40+ LLM providers but strategically uses different models for different phases:

  • Gemini 2.5 Flash for fast code analysis and test orchestration
  • Anthropic Claude for deep reasoning during complex test plan generation
  • You.com for real-time web intelligence during QA research
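The phase-to-model split above can be sketched as a static routing table plus a shared conversation history, so that whichever provider handles a task sees the earlier turns. This is an illustrative sketch only: `TaskKind`, `ROUTES`, `Turn`, and `routeTask` are hypothetical names, and no real provider SDK is called.

```typescript
type TaskKind =
  | "code-analysis"
  | "test-orchestration"
  | "test-plan"
  | "web-research";
type Provider = "gemini" | "claude" | "you.com";

// Static mapping that mirrors the list above.
const ROUTES: Record<TaskKind, Provider> = {
  "code-analysis": "gemini",
  "test-orchestration": "gemini",
  "test-plan": "claude",
  "web-research": "you.com",
};

// One turn of the shared conversation, tagged with who produced it.
interface Turn {
  role: "user" | "assistant";
  provider?: Provider;
  text: string;
}

// Picks the provider for a task and builds the message list it
// would receive: the full shared history plus the new prompt.
function routeTask(
  kind: TaskKind,
  history: Turn[],
  prompt: string,
): { provider: Provider; messages: Turn[] } {
  return {
    provider: ROUTES[kind],
    messages: [...history, { role: "user", text: prompt }],
  };
}

// Earlier analysis by one provider is carried into the planning request.
const sharedHistory: Turn[] = [
  { role: "user", text: "Analyze the diff for src/login.tsx" },
  { role: "assistant", provider: "gemini", text: "Login form adds email validation." },
];
const planRequest = routeTask(
  "test-plan",
  sharedHistory,
  "Generate test cases for the login form.",
);
```

Using a `Record<TaskKind, Provider>` means the compiler would flag any task kind that lacks a route, which helps when new phases are added.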

Challenges we ran into

  1. Browser Automation Reliability — Getting Puppeteer to reliably interact with dynamically-rendered SPAs was challenging. We had to build adaptive waiting strategies and intelligent element detection that doesn't break when CSS classes or layouts change.

  2. Evidence Correlation — Correlating console logs, network requests, screenshots, and terminal output into a coherent "proof" that a code path executed correctly required building a custom evidence pipeline.

  3. Test Plan Intelligence — Generating meaningful test plans (not just random clicks) required deep code understanding. We solved this with AST-based analysis combined with You.com API searches for contextual best practices.

  4. Real-time UI Feedback — Streaming the QA progress (browser screenshots, test results, log analysis) back to the VS Code webview in real-time required careful gRPC stream management and React state coordination.

  5. Multi-Model Orchestration — Different AI models have different strengths. Routing the right task to the right model (Gemini for speed, Claude for depth, You.com for search) while maintaining conversation context was a significant engineering challenge.
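As an illustration of the first challenge, an adaptive wait can be sketched as polling with capped exponential backoff: check a condition, and if it is not met yet, wait a little longer each time until a total time budget runs out. The code below is a generic sketch under that assumption and not the extension's actual strategy. `WaitOptions`, `backoffSchedule`, and `waitFor` are hypothetical names, and the probe is any function that returns `undefined` until the page is ready.

```typescript
interface WaitOptions {
  initialDelayMs: number; // first pause between polls
  factor: number;         // growth multiplier per poll (>= 1)
  maxDelayMs: number;     // upper bound for a single pause
  timeoutMs: number;      // total time budget for all pauses
}

// Pure helper: the list of pauses that fit inside the time budget.
function backoffSchedule(opts: WaitOptions): number[] {
  if (opts.initialDelayMs <= 0 || opts.factor < 1) {
    throw new Error("initialDelayMs must be > 0 and factor must be >= 1");
  }
  const delays: number[] = [];
  let delay = opts.initialDelayMs;
  let elapsed = 0;
  while (elapsed + delay <= opts.timeoutMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * opts.factor, opts.maxDelayMs);
  }
  return delays;
}

// Polls `probe` immediately, then after each scheduled pause,
// and resolves with the first value that is not undefined.
async function waitFor<T>(
  probe: () => T | undefined | Promise<T | undefined>,
  opts: WaitOptions,
): Promise<T> {
  const first = await probe();
  if (first !== undefined) return first;
  for (const delay of backoffSchedule(opts)) {
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
    const value = await probe();
    if (value !== undefined) return value;
  }
  throw new Error(`condition not met within ${opts.timeoutMs} ms`);
}

// Pauses for a 1.5 s budget: 100, 200, 400, 400, 400 (ms).
const exampleSchedule = backoffSchedule({
  initialDelayMs: 100,
  factor: 2,
  maxDelayMs: 400,
  timeoutMs: 1500,
});
```

Short early pauses keep fast pages fast, while the cap and the total budget stop a slow single-page app from stalling a run indefinitely. A probe that looks up elements by visible text or role, rather than by CSS class, would fit the goal of not breaking when styles change.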


Accomplishments that we're proud of

  • End-to-end AI loop: AI writes code → AI tests code → AI fixes code → Human approves. The complete autonomous development loop.
  • Real browser testing: Not mocking, not stubbing — real Chrome, real clicks, real screenshots, real evidence.
  • Evidence-based verdicts: Every MERGEABLE/NOT_MERGEABLE decision comes with proof — screenshots, logs, and execution traces.
  • 15+ language support: Tree-Sitter AST parsing works across JavaScript, TypeScript, Python, Rust, Go, Java, C/C++, Swift, Kotlin, and more.
  • Sub-2-minute QA cycles: What takes a human QA engineer 30 minutes, Axolotl completes in under 2 minutes with higher coverage.
  • Self-healing: When tests fail, Axolotl doesn't just report — it offers to fix the code and re-verify.

What we learned

  • AI testing AI is not just possible — it's necessary. The velocity of AI-generated code demands AI-speed verification.
  • Evidence beats assertions. Traditional test frameworks say "pass/fail." Axolotl says "here's the screenshot, here's the log, here's what the code actually did."
  • Multi-model architectures are the future. No single AI model is best at everything. Orchestrating specialized models (Gemini for speed, Claude for reasoning, You.com for search) produces dramatically better results.
  • The developer experience matters. An AI QA tool that's hard to use won't get used. Building a beautiful, intuitive UI with Lovable and deploying reliably on Render was essential to adoption.

What's next for Axolotl

  • CI/CD Integration — Run Axolotl as a GitHub Action on every PR automatically
  • API Testing — Extend beyond UI to validate REST/GraphQL endpoints end-to-end
  • Visual Regression — AI-powered screenshot comparison to detect unintended UI changes
  • Team Dashboard — Centralized QA metrics and merge confidence scores across the org
  • Voice-Driven QA — Natural language voice commands to trigger and control QA sessions hands-free

Built With

gemini · you-com · render · lovable · typescript · react · vscode · puppeteer · tree-sitter · grpc · tailwindcss · anthropic · node-js
