Inspiration

Honestly, it started from frustration. We've all been there — you're heads-down shipping a feature, and the last thing anyone has time for is writing 50 Postman tests or manually probing every API endpoint for security holes. QA always feels like the thing that gets skipped when deadlines hit.

So we thought: what if we just didn't have to do it manually at all?

What We Built

Deep Agents is an autonomous API testing pipeline. You drop in a GitHub URL, and it takes over from there — it figures out your API, writes its own testing tools, and sends a swarm of AI agents to break your code before users do.

No config files. No test scripts. Just a URL.

How We Built It

We broke the pipeline into stages that mirror what a real QA engineer would do:

1. Spec Inference: Most repos don't have API docs, so we use Google Gemini 2.5 Flash to read the actual routing code and infer what the API does. It's surprisingly good at this.
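To make the inference step concrete, here is a minimal sketch of the two pieces around the model call: building a prompt from routing files and parsing the reply into a spec. The `Endpoint` fields, the prompt wording, and both function names are our assumptions for illustration, and the Gemini call itself is left out.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code fence that models often wrap JSON in


@dataclass(frozen=True)
class Endpoint:
    method: str          # e.g. "GET"
    path: str            # e.g. "/users/me"
    requires_auth: bool  # whether the route appears to be protected


def build_spec_prompt(route_files: dict[str, str]) -> str:
    """Concatenate routing source files into one prompt asking for a JSON spec."""
    sections = [f"### {name}\n{source}" for name, source in sorted(route_files.items())]
    return (
        "Read the routing code below and list every HTTP endpoint as a JSON array "
        'of objects with keys "method", "path" and "requires_auth".\n\n'
        + "\n\n".join(sections)
    )


def parse_spec_reply(reply: str) -> list[Endpoint]:
    """Parse the model's reply, tolerating a code fence around the JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    return [
        Endpoint(item["method"].upper(), item["path"], bool(item["requires_auth"]))
        for item in json.loads(text)
    ]
```

Asking for a fixed JSON shape keeps the later stages independent of how the model phrases its answer.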

2. MCP Server Generation: Gemini then writes a Python Model Context Protocol (MCP) server on the fly. It works like a remote control that turns the API endpoints into tools our AI agents can natively call.
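For a sense of what a generated server could look like, the sketch below renders one from a fixed template instead of having Gemini write it, which is a simplification on our part. The generated source uses `FastMCP` from the MCP Python SDK; the template, `tool_name`, and `render_server` are hypothetical names, and path parameters are not handled.

```python
import ast
import re

HEADER = '''\
import urllib.error
import urllib.request

from mcp.server.fastmcp import FastMCP

BASE_URL = {base_url!r}
mcp = FastMCP("generated-api-tools")
'''

TOOL = '''
@mcp.tool()
def {name}(body: str = "", token: str = "") -> str:
    """Call {method} {path}."""
    headers = {{"Authorization": "Bearer " + token}} if token else {{}}
    request = urllib.request.Request(
        BASE_URL + {path!r}, data=body.encode() or None,
        headers=headers, method={method!r},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return str(response.status) + " " + response.read().decode()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are results the agents need to see, not crashes.
        return str(error.code) + " " + error.read().decode()
'''

FOOTER = '''
if __name__ == "__main__":
    mcp.run()
'''


def tool_name(method: str, path: str) -> str:
    """Turn ("GET", "/users/me") into a valid identifier such as get_users_me."""
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", path).strip("_")
    return f"{method.lower()}_{slug or 'root'}"


def render_server(base_url: str, endpoints: list[tuple[str, str]]) -> str:
    """Render server source for (method, path) pairs and check that it parses."""
    parts = [HEADER.format(base_url=base_url)]
    for method, path in endpoints:
        parts.append(TOOL.format(name=tool_name(method, path), method=method, path=path))
    source = "".join(parts) + FOOTER
    ast.parse(source)  # raises SyntaxError if the generated code is malformed
    return source
```

Parsing the output before running it is a cheap guard that applies equally well to model-written code.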

3. The Agent Swarm: Three agents run in parallel, each playing a different role:

  • 🟢 Happy Path Agent: does the normal stuff work?
  • 🟡 Edge Case Hunter: nulls, SQL injection payloads, garbage inputs
  • 🔴 Security Probe Agent: tries to sneak through with forged or blank Auth0 JWTs
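Running the three roles in parallel can be sketched with `asyncio.gather`. The agent bodies, the tool names, and `fake_call` below are stand-ins we made up for illustration; in the real pipeline each agent would be an LLM choosing which MCP tools to call.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical tool-calling interface: (tool_name, arguments) -> HTTP status code.
CallTool = Callable[[str, dict], Awaitable[int]]


async def happy_path_agent(call: CallTool) -> dict:
    status = await call("get_users_me", {"token": "valid-token"})
    return {"agent": "happy_path", "status": status}


async def edge_case_hunter(call: CallTool) -> dict:
    status = await call("post_users", {"body": "' OR 1=1 --", "token": "valid-token"})
    return {"agent": "edge_case", "status": status}


async def security_probe_agent(call: CallTool) -> dict:
    status = await call("get_users_me", {"token": ""})  # blank token
    return {"agent": "security_probe", "status": status}


async def run_swarm(call: CallTool) -> list[dict]:
    """Run the three agents concurrently; results come back in argument order."""
    return await asyncio.gather(
        happy_path_agent(call), edge_case_hunter(call), security_probe_agent(call)
    )


async def fake_call(tool: str, args: dict) -> int:
    """Stand-in for a real MCP tool call: blank tokens get 401, the rest 200."""
    await asyncio.sleep(0)
    return 200 if args.get("token") else 401
```

Calling `asyncio.run(run_swarm(fake_call))` returns one result dictionary per agent, with the security probe seeing a 401.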

4. AI Reasoning Loop: Raw HTTP responses go back to Gemini, which decides whether each one is actually a bug. A 401 rejecting a fake token? That's ✅ secure. A 200 OK on a route that should be locked down? That's a 🚨 critical vulnerability.
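The two examples above boil down to a rule that can be written out by hand. The function below is a deterministic sketch of that rule, not the Gemini prompt itself; the `Verdict` names and the `judge` signature are our own assumptions.

```python
from enum import Enum


class Verdict(Enum):
    SECURE = "secure"
    PASS = "pass"
    CRITICAL = "critical vulnerability"
    NEEDS_REVIEW = "needs review"


def judge(status: int, route_protected: bool, credentials_valid: bool) -> Verdict:
    """Classify one response given what the test was trying to prove."""
    succeeded = 200 <= status < 300
    if route_protected and not credentials_valid:
        if status in (401, 403):
            return Verdict.SECURE    # the API correctly rejected bad credentials
        if succeeded:
            return Verdict.CRITICAL  # a locked-down route let the request through
        return Verdict.NEEDS_REVIEW  # e.g. a 500 triggered by a forged token
    if succeeded:
        return Verdict.PASS
    return Verdict.NEEDS_REVIEW
```

For example, `judge(401, True, False)` is `Verdict.SECURE`, while `judge(200, True, False)` is `Verdict.CRITICAL`. Anything the rule cannot settle is the kind of case where handing the response to a model is useful.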

5. Aerospike Memory: Every bug and test run is saved to a local Aerospike database. The next time we test the same repo, the agents already know what broke before and specifically hunt for regressions.
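The memory layer can be sketched as a small wrapper around a key-value client. To keep the example runnable without a database, it uses an in-memory stand-in that mimics the shape of the Aerospike Python client's `put(key, bins)` and `get(key)` calls with `(namespace, set, user_key)` tuple keys. The class names, the `bugs` bin, and the `missing` parameter are our assumptions.

```python
class InMemoryClient:
    """Stand-in with put/get on tuple keys, shaped like the Aerospike client."""

    def __init__(self) -> None:
        self._records: dict[tuple, dict] = {}

    def put(self, key: tuple, bins: dict) -> None:
        self._records[key] = dict(bins)

    def get(self, key: tuple) -> tuple:
        # Returns (key, metadata, bins); raises KeyError if the key is missing.
        return key, {}, self._records[key]


class BugMemory:
    """Stores the bugs found per repository so later runs can target them."""

    def __init__(self, client, namespace="test", set_name="bugs", missing=KeyError):
        self.client = client
        self.namespace = namespace
        self.set_name = set_name
        # With a real client, pass its record-not-found exception type here.
        self.missing = missing

    def _key(self, repo_url: str) -> tuple:
        return (self.namespace, self.set_name, repo_url)

    def known_bugs(self, repo_url: str) -> list[dict]:
        try:
            _, _, bins = self.client.get(self._key(repo_url))
        except self.missing:
            return []  # first time this repo has been tested
        return list(bins.get("bugs", []))

    def record_bug(self, repo_url: str, endpoint: str, summary: str) -> None:
        # Read-modify-write; not atomic, which is fine for a single local run.
        bugs = self.known_bugs(repo_url)
        bugs.append({"endpoint": endpoint, "summary": summary})
        self.client.put(self._key(repo_url), {"bugs": bugs})

    def regression_targets(self, repo_url: str) -> list[str]:
        """Endpoints that broke before, for the agents to re-test first."""
        return sorted({bug["endpoint"] for bug in self.known_bugs(repo_url)})
```

Keying records by repository URL means a second run on the same repo can load its regression targets with a single read.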

Challenges

Getting the AI to stop reporting false positives was genuinely hard. Teaching it that "the API correctly rejected my fake token" is a pass, not a failure, required a lot of careful prompt design.

Building reliable spec inference for repos that have zero API documentation was also a real challenge. Gemini handles Express and FastAPI really well, but every codebase structures its routes differently.

What We Learned

MCP is a genuinely powerful standard for agentic AI. Once you can turn any API into a set of callable tools, orchestrating autonomous agents over it becomes surprisingly clean.

And honestly? Gemini is really good at reading messy, undocumented code and making sense of it. That was the part that surprised us the most.
