Inspiration

Supply chain attacks cost $60B annually. When a critical CVE drops, the industry standard is 60 days to remediate. Snyk scans and files a ticket. Dependabot opens a PR for one fix. Nobody tests multiple remediation strategies in parallel, cancels false positives mid-flight, or ships with cryptographic provenance. We asked: what if the response wasn't sequential — what if it was speculative?

What it does

Phalanx is an autonomous agent fleet for CVE response. It detects CVEs from the open web and forks your dependency state into N parallel remediation hypotheses using Ghost zero-copy forking. Agents coordinate over Redis Streams, with Pub/Sub for cancellation. Each hypothesis is validated in an isolated InsForge backend, and a WunderGraph federated supergraph enforces per-agent permission scopes. The winning fix ships with a Chainguard-signed SBOM and a Sigstore attestation, published as cryptographic evidence to cited.md.
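A minimal TypeScript sketch of the speculative pattern follows. It assumes each hypothesis exposes an abortable validate function and that a false-positive verdict arrives as a promise. Hypothesis, firstSuccess, and runSpeculative are illustrative names, not Phalanx's code, and an in-process AbortController stands in for the Redis Pub/Sub cancellation the real pipeline uses.

```typescript
// Hypothetical shape of one remediation hypothesis (not Phalanx's actual API).
interface Hypothesis {
  id: string;
  // Validates the fix against an isolated fork; should stop when aborted.
  validate: (signal: AbortSignal) => Promise<boolean>;
}

type Outcome =
  | { status: "winner"; id: string }
  | { status: "no-fix" }
  | { status: "cancelled" };

// Resolves with the id of the first hypothesis that validates,
// or null once every hypothesis has failed or thrown.
function firstSuccess(hypotheses: Hypothesis[], signal: AbortSignal): Promise<string | null> {
  return new Promise<string | null>((resolve) => {
    let remaining = hypotheses.length;
    if (remaining === 0) resolve(null);
    const fail = () => {
      remaining -= 1;
      if (remaining === 0) resolve(null);
    };
    for (const h of hypotheses) {
      h.validate(signal).then((ok) => (ok ? resolve(h.id) : fail()), fail);
    }
  });
}

// Runs all hypotheses in parallel; a false positive cancels everything in flight.
async function runSpeculative(
  hypotheses: Hypothesis[],
  falsePositive: Promise<void>, // settles if the CVE turns out not to apply
): Promise<Outcome> {
  const controller = new AbortController();

  const cancelled = falsePositive.then((): Outcome => {
    controller.abort(); // stop every in-flight validation at once
    return { status: "cancelled" };
  });

  const validated = firstSuccess(hypotheses, controller.signal).then((id): Outcome => {
    controller.abort(); // a winner exists (or none can); stop the losers
    return id === null ? { status: "no-fix" } : { status: "winner", id };
  });

  return Promise.race([validated, cancelled]);
}
```

The first hypothesis to validate wins and the shared abort signal stops the rest; whichever of "winner found" and "false positive" settles first decides the outcome.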

Paste a GitHub repo URL into the dashboard. Watch it work.

How we built it

Eight sponsor tools, each architecturally load-bearing — removing any one breaks the product:

  • WunderGraph Cosmo: Federated supergraph with per-operation OAuth scopes enforced via @requiresScopes. Analyst agents get read-only access; only the Rollout Operator holds write:production. A custom MCP server wrapping the Cosmo Router exposes 5 persisted operations as agent-callable tools.
  • TinyFish: Real browser automation against live npm/PyPI pages to find patched versions, enrich CVEs with PoC exploits from GitHub, and create remediation PRs. 89.9% Mind2Web accuracy on multi-step vendor portal flows.
  • Ghost: Zero-copy database forking in ~500ms. Each remediation hypothesis gets its own writable fork. Memory Engine (pgvector + BM25 + ltree) matches incoming CVEs against historical remediation playbooks.
  • Guild.ai: 5 published agents (Scanner, Analyst, Planner, Validator, Operator) with sandboxed execution, credential injection, and immutable audit logs. The Operator uses multi-turn mode with a human-in-the-loop approval gate.
  • Redis: Streams with consumer groups for at-least-once task distribution (entries stay pending until acknowledged, so redelivered tasks must be handled idempotently). Pub/Sub for sub-ms false-positive cancellation across the entire pipeline. Vector Sets (Redis 8) for CVE semantic similarity. Semantic cache at 70% hit rate.
  • Chainguard: Triple role — remediation target (zero-CVE images replace compromised bases), agent runtime (SLSA L3 containers), and DFC auto-conversion of legacy Dockerfiles.
  • InsForge: Per-hypothesis isolated staging backends provisioned via MCP in under 2 minutes. Each fork gets its own Postgres + auth + storage + edge functions.
  • Nexla Express: Bidirectional pipelines — ingests CVE feeds from NVD, GHSA, OSV; writes remediation reports back to customer systems.
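The per-agent scope model can be sketched as a plain function. This assumes the OR-of-AND semantics Cosmo documents for @requiresScopes (the outer list holds alternatives, and every scope in an inner list must be held). The agent grants and the read:cves and read:dependencies scopes are hypothetical; only write:production comes from the description above.

```typescript
// Assumed semantics of @requiresScopes(scopes: [[...], [...]]):
// outer list = OR over alternatives, inner list = AND over scopes.
type ScopeRequirement = string[][];

function isAuthorized(granted: ReadonlySet<string>, required: ScopeRequirement): boolean {
  // With an empty requirement list, some() returns false, so access is denied (fail closed).
  return required.some((allOf) => allOf.every((scope) => granted.has(scope)));
}

// Hypothetical per-agent grants mirroring the roles described above.
const agentScopes: Record<string, ReadonlySet<string>> = {
  analyst: new Set(["read:cves", "read:dependencies"]),
  operator: new Set(["read:cves", "write:production"]),
};

// Hypothetical requirement attached to a production-deploy mutation.
const deployRequires: ScopeRequirement = [["write:production"]];
```

Under these assumptions, isAuthorized(agentScopes.analyst, deployRequires) is false and the operator's check is true, which is the denial the dashboard surfaces.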
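Redis Streams consumer groups deliver at least once: an entry stays in the group's pending list until the consumer acknowledges it with XACK, and unacknowledged entries can be claimed and redelivered. The sketch below models that bookkeeping in memory (it is not a Redis client) to show why an idempotent handler turns a redelivery into a single side effect. Task, PendingEntries, and IdempotentWorker are hypothetical names; a real deployment would keep the processed-ID set in shared storage.

```typescript
interface Task {
  id: string; // stream entry ID, e.g. "1-0"
  cve: string;
}

// In-memory stand-in for a consumer group's pending entries list.
class PendingEntries {
  private pending = new Map<string, Task>();

  // Delivering an entry records it as pending for the consumer.
  deliver(task: Task): Task {
    this.pending.set(task.id, task);
    return task;
  }

  // Acknowledging removes it; until then it can be delivered again.
  ack(id: string): void {
    this.pending.delete(id);
  }

  // Entries never acknowledged (e.g. the consumer crashed before acking).
  unacked(): Task[] {
    return Array.from(this.pending.values());
  }
}

class IdempotentWorker {
  // Would live in shared storage in practice, updated atomically with the effect.
  private processed = new Set<string>();
  readonly effects: string[] = [];

  handle(task: Task): void {
    if (this.processed.has(task.id)) return; // duplicate delivery: no-op
    this.effects.push(`remediate ${task.cve}`);
    this.processed.add(task.id);
  }
}
```

In the scenario the test exercises, a task is handled, the consumer "crashes" before acking, and the redelivered copy is ignored, so the remediation side effect happens once.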

Payment rails: x402 micropayments on Base Sepolia for external PoC verification via agentic.market. Evidence published to cited.md via Senso.

The dashboard streams 52+ typed events over SSE — every sponsor tool's activity is visible in real time.
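One way to keep dozens of event types typed end to end is a discriminated union serialized into standard Server-Sent Events frames. The three event names below are hypothetical examples, not the dashboard's actual schema.

```typescript
// A few hypothetical event types; the real dashboard emits many more.
type PhalanxEvent =
  | { type: "fork.created"; forkId: string; hypothesis: string }
  | { type: "scope.denied"; agent: string; operation: string }
  | { type: "hypothesis.cancelled"; forkId: string; reason: string };

// Frames one event in the SSE wire format: optional "id:" line,
// an "event:" line naming the type, a "data:" line, then a blank line.
function toSseFrame(event: PhalanxEvent, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event.type}`);
  // JSON.stringify escapes newlines, so the payload fits on one data: line.
  lines.push(`data: ${JSON.stringify(event)}`);
  return lines.join("\n") + "\n\n";
}
```

A browser client can then subscribe per type with EventSource's addEventListener, using the event name (for example "scope.denied") as the listener key.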

Challenges we ran into

Ghost fork creation timed out under load once 16 or more stale forks were queued; we solved this by automating stale-fork cleanup and raising timeouts to 30s. TinyFish rate limits throttled our parallel enrichment runs. The exports field in @tiny-fish/sdk broke Turbopack's CommonJS resolver, so we shipped a pnpm patch. WunderGraph's wgc mcp turned out to be a control-plane server, not a persisted-operations gateway, so we built a custom MCP server wrapping the router instead.
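The two fixes for the Ghost fork timeouts can be sketched as a stale-fork filter plus a timeout wrapper. The ForkRecord shape, the TTL, and the function names are assumptions for illustration; only the 30s value comes from the description above. The wrapper stops waiting but does not cancel the underlying fork request.

```typescript
// Hypothetical bookkeeping record for a database fork.
interface ForkRecord {
  id: string;
  createdAt: number; // epoch milliseconds
  inUse: boolean; // still backing a live hypothesis
}

const FORK_TIMEOUT_MS = 30_000;

// Forks older than the TTL that no hypothesis is using are cleanup candidates.
function staleForks(forks: ForkRecord[], now: number, ttlMs: number): string[] {
  return forks.filter((f) => !f.inUse && now - f.createdAt > ttlMs).map((f) => f.id);
}

// Rejects if `op` has not settled within `ms`; clears the timer either way.
function withTimeout<T>(op: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    op.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

A fork request would then be wrapped as withTimeout(createFork(), FORK_TIMEOUT_MS, "ghost.fork"), where createFork is a placeholder for whatever call the Ghost client actually exposes.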

Accomplishments that we're proud of

The parallel-speculative pattern is genuinely novel. No existing tool we know of forks state N ways, validates in parallel live backends, and cancels mid-flight on false positives. Compound reliability becomes a strength: if each of 4 forks independently succeeds 70% of the time, the chance that at least one succeeds is 1 - 0.3^4 ≈ 99.2%. The WunderGraph scope denial, in which an Analyst agent is blocked from calling the production deploy, fires live in every scan. Real evidence is published to a real cited.md URL with real Sigstore signatures.
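The compound-reliability figure follows from the complement rule. A small helper makes the assumption explicit: the result only holds if fork outcomes are independent, which correlated failures (for example, a broken upstream release shared by every hypothesis) would violate.

```typescript
// P(at least one of n independent forks succeeds) = 1 - (1 - p)^n.
// Independence is an assumption, not a property the forks guarantee.
function atLeastOneSucceeds(p: number, n: number): number {
  if (p < 0 || p > 1 || !Number.isInteger(n) || n < 0) {
    throw new RangeError("p must be in [0, 1] and n a non-negative integer");
  }
  return 1 - Math.pow(1 - p, n);
}
```

atLeastOneSucceeds(0.7, 4) evaluates to about 0.9919, matching the 99.2% above.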

What we learned

Agent governance isn't optional: it's the product. Enterprise buyers don't care how fast your agent patches if they can't audit every decision. Guild's immutable audit log and WunderGraph's per-operation scopes turned out to be the selling points, not the agent's speed. Also, Ghost's zero-copy forking is an underappreciated primitive. Being able to explore N hypotheses at near-zero marginal storage cost (each fork stores only what it changes) reshapes how you architect agent decision-making.
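Why forking is cheap can be illustrated with a conceptual copy-on-write map: reads fall through to a shared base and writes go to a per-fork overlay, so storage grows with each fork's changes rather than with the size of the base. This is a toy model of the general technique, not how Ghost implements forking at the Postgres storage layer, and deletions are not modeled.

```typescript
// Toy copy-on-write fork over a shared, read-only base.
class CowFork<V> {
  private overlay = new Map<string, V>();

  constructor(private readonly base: ReadonlyMap<string, V>) {}

  // Reads prefer this fork's own writes, then fall back to the shared base.
  get(key: string): V | undefined {
    return this.overlay.has(key) ? this.overlay.get(key) : this.base.get(key);
  }

  // Writes never touch the base, so sibling forks are unaffected.
  set(key: string, value: V): void {
    this.overlay.set(key, value);
  }

  // Storage owned by this fork: only the keys it changed.
  get changedKeys(): number {
    return this.overlay.size;
  }
}
```

Three forks of the same dependency map cost nothing until one of them writes, and that write stays invisible to the others.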

What's next for Phalanx

Production multi-service deployment on Railway (Cosmo Router + Chainguard verification containers) with the dashboard on a Node host. Guild agent sessions triggered via HTTP API for fully remote orchestration. CDP wallet with funded Base Sepolia USDC for live on-chain payments. Continuous monitoring mode alongside the on-demand audit. Multi-customer tenant isolation via InsForge per-customer backends.

Built With

  • chainguard
  • coinbase-cdp
  • ghost
  • guild.ai
  • insforge
  • nexla-express
  • next.js
  • pgvector
  • postgresql
  • redis
  • senso
  • shadcn/ui
  • sigstore
  • tailwind-css
  • tinyfish
  • typescript
  • wundergraph-cosmo
  • x402