Inspiration
Supply chain attacks cost $60B annually. When a critical CVE drops, the industry standard is 60 days to remediate. Snyk scans and files a ticket. Dependabot opens a PR for one fix. Nobody tests multiple remediation strategies in parallel, cancels false positives mid-flight, or ships with cryptographic provenance. We asked: what if the response wasn't sequential — what if it was speculative?
What it does
Phalanx is an autonomous agent fleet that detects CVEs from the open web and forks your dependency state into N parallel remediation hypotheses via Ghost zero-copy forking. Agents coordinate over Redis Streams, with Pub/Sub for cancellation. Each hypothesis is validated in an isolated InsForge backend, and a WunderGraph federated supergraph enforces per-agent permission scopes. The winning fix ships with a Chainguard-signed SBOM and a Sigstore attestation, published as cryptographic evidence to cited.md.
Paste a GitHub repo URL into the dashboard. Watch it work.
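To make the speculative idea concrete, here is a minimal in-process sketch of racing N hypotheses and cancelling the losers. It is not the Phalanx implementation: an AbortController stands in for the Redis Pub/Sub cancellation channel, and `Hypothesis`, `Outcome`, `Validator`, `sleep`, and `raceHypotheses` are all hypothetical names invented for illustration.

```typescript
// Hypothetical types: none of these names come from Phalanx itself.
type Hypothesis = { id: string; strategy: string };
type Outcome = { id: string; ok: boolean };

// Stand-in for validating one fix in its own isolated backend.
type Validator = (h: Hypothesis, signal: AbortSignal) => Promise<Outcome>;

// Resolve after `ms`, or reject early if the signal is aborted.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("cancelled"));
      return;
    }
    const timer = setTimeout(resolve, ms);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("cancelled"));
      },
      { once: true },
    );
  });
}

// Run every hypothesis in parallel. The first one that validates wins and
// the shared controller aborts the rest. Aborting the controller from the
// outside (for example on a false-positive signal) cancels all of them,
// in which case the result is null.
function raceHypotheses(
  hypotheses: Hypothesis[],
  validate: Validator,
  controller: AbortController = new AbortController(),
): Promise<Outcome | null> {
  return new Promise((resolve) => {
    let remaining = hypotheses.length;
    if (remaining === 0) {
      resolve(null);
      return;
    }
    const settle = (outcome: Outcome | null): void => {
      if (outcome !== null && outcome.ok) {
        controller.abort(); // cancel the forks still in flight
        resolve(outcome);
        return;
      }
      remaining -= 1;
      if (remaining === 0) resolve(null); // nothing validated
    };
    for (const h of hypotheses) {
      validate(h, controller.signal).then(settle, () => settle(null));
    }
  });
}
```

A validator would typically pass `signal` down into whatever long-running work it does, the way `sleep` does here, so that a cancellation actually stops the work instead of just ignoring its result.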
How we built it
Eight sponsor tools, each architecturally load-bearing — removing any one breaks the product:
- WunderGraph Cosmo: Federated supergraph with per-tool OAuth scopes via @requiresScopes. Analyst agents get read-only access; only the Rollout Operator holds write:production. MCP Gateway exposes 5 persisted operations as agent-callable tools.
- TinyFish: Real browser automation against live npm/PyPI pages to find patched versions, enrich CVEs with PoC exploits from GitHub, and create remediation PRs. 89.9% Mind2Web accuracy on multi-step vendor portal flows.
- Ghost: Zero-copy database forking in ~500ms. Each remediation hypothesis gets its own writable fork. Memory Engine (pgvector + BM25 + ltree) matches incoming CVEs against historical remediation playbooks.
- Guild.ai: 5 published agents (Scanner, Analyst, Planner, Validator, Operator) with sandboxed execution, credential injection, and immutable audit logs. The Operator uses multi-turn mode with a human-in-the-loop approval gate.
- Redis: Streams with consumer groups for task distribution, so each task goes to one agent in the group and stays pending until acknowledged (at-least-once delivery, not exactly-once). Pub/Sub for sub-ms false-positive cancellation across the entire pipeline. Vector Sets (Redis 8) for CVE semantic similarity. Semantic cache at 70% hit rate.
- Chainguard: Triple role — remediation target (zero-CVE images replace compromised bases), agent runtime (SLSA L3 containers), and DFC auto-conversion of legacy Dockerfiles.
- InsForge: Per-hypothesis isolated staging backends provisioned via MCP in under 2 minutes. Each fork gets its own Postgres + auth + storage + edge functions.
- Nexla Express: Bidirectional pipelines — ingests CVE feeds from NVD, GHSA, OSV; writes remediation reports back to customer systems.
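The per-agent scoping in the WunderGraph bullet can be pictured with a small check like the one below. This is a sketch under stated assumptions, not WunderGraph's API: in the real setup the router enforces @requiresScopes. The `authorize` function, the operation names, and every scope other than write:production are made up for illustration.

```typescript
// Hypothetical scope names; only "write:production" appears in the write-up.
type Scope =
  | "read:advisories"
  | "read:dependencies"
  | "write:staging"
  | "write:production";

interface Agent {
  name: string;
  scopes: ReadonlySet<Scope>;
}

// Hypothetical operation table: every listed scope is required.
const requiredScopes: Record<string, Scope[]> = {
  getAdvisory: ["read:advisories"],
  listDependencies: ["read:dependencies"],
  deployToProduction: ["write:production"],
};

type Decision = { allowed: true } | { allowed: false; missing: Scope[] };

// Allow the call only if the agent holds every scope the operation needs.
// Operations missing from the table are denied by default.
function authorize(agent: Agent, operation: string): Decision {
  const required = requiredScopes[operation];
  if (required === undefined) return { allowed: false, missing: [] };
  const missing = required.filter((scope) => !agent.scopes.has(scope));
  return missing.length === 0 ? { allowed: true } : { allowed: false, missing };
}

const analyst: Agent = {
  name: "Analyst",
  scopes: new Set<Scope>(["read:advisories", "read:dependencies"]),
};
const operator: Agent = {
  name: "Operator",
  scopes: new Set<Scope>(["read:advisories", "write:production"]),
};
```

With these definitions, `authorize(analyst, "deployToProduction")` is denied with `write:production` reported as missing, while the same call for `operator` is allowed. Returning the missing scopes rather than a bare boolean makes the denial easy to log and audit.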
Payment rails: x402 micropayments on Base Sepolia for external PoC verification via agentic.market. Evidence published to cited.md via Senso.
The dashboard streams 52+ typed events over SSE — every sponsor tool's activity is visible in real time.
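For a sense of what "typed events over SSE" can look like, here is a hedged sketch. The three event shapes and the `toSseFrame` and `summarize` helpers are hypothetical; the actual event catalogue is not listed here. The only fixed part is the SSE wire format itself: optional `id:` line, `event:` line, `data:` line, then a blank line.

```typescript
// Hypothetical event shapes, modelled as a discriminated union on `type`.
type PhalanxEvent =
  | { type: "fork.created"; forkId: string; hypothesis: string }
  | { type: "scope.denied"; agent: string; operation: string }
  | { type: "hypothesis.cancelled"; forkId: string; reason: string };

// Serialize one event into an SSE frame. JSON.stringify without an indent
// argument never emits raw newlines, so a single data: line is enough.
function toSseFrame(event: PhalanxEvent, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event.type}`);
  lines.push(`data: ${JSON.stringify(event)}`);
  return lines.join("\n") + "\n\n";
}

// The compiler narrows `event` in each case, so a handler cannot read a
// field that the event type does not carry.
function summarize(event: PhalanxEvent): string {
  switch (event.type) {
    case "fork.created":
      return `fork ${event.forkId} created for ${event.hypothesis}`;
    case "scope.denied":
      return `${event.agent} denied ${event.operation}`;
    case "hypothesis.cancelled":
      return `fork ${event.forkId} cancelled: ${event.reason}`;
  }
}
```

On the browser side, an `EventSource` can subscribe per event name with `addEventListener`, which is why the frame carries the `event:` line in addition to the `type` field inside the payload.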
Challenges we ran into
Ghost forks timed out under load when 16+ stale forks had queued up; we solved this with cleanup automation and by raising timeouts to 30s. TinyFish rate-limited us during parallel enrichment runs. The exports field of @tiny-fish/sdk broke Turbopack's CJS resolver, so we shipped a pnpm patch. WunderGraph's wgc mcp turned out to be a control-plane server, not a persisted-operations gateway, so we built a custom MCP server wrapping the router instead.
Accomplishments that we're proud of
The parallel-speculative pattern is genuinely novel. No existing tool forks state N ways, validates in parallel live backends, and cancels mid-flight on false positives. Compound reliability becomes a strength: if each of 4 forks independently succeeds 70% of the time, the chance that at least one succeeds is 1 - 0.3^4 ≈ 99.2%. The WunderGraph scope denial (an Analyst agent blocked from calling production deploy) fires live in every scan. Real evidence published to a real cited.md URL with real Sigstore signatures.
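The 99.2% figure comes from treating the forks as independent trials: the only way to end up with no fix is for every fork to fail. A small helper makes the arithmetic explicit. The function name is hypothetical, and independence is an assumption; correlated failures (for example, every fork hitting the same broken upstream) would lower the real number.

```typescript
// Probability that at least one of `forks` independent attempts succeeds,
// given that each succeeds with probability `perForkSuccess`.
function atLeastOneSucceeds(perForkSuccess: number, forks: number): number {
  if (perForkSuccess < 0 || perForkSuccess > 1) {
    throw new RangeError("perForkSuccess must be between 0 and 1");
  }
  if (!Number.isInteger(forks) || forks < 0) {
    throw new RangeError("forks must be a non-negative integer");
  }
  // All forks fail with probability (1 - p)^n; the complement is the answer.
  return 1 - Math.pow(1 - perForkSuccess, forks);
}

// 4 forks at 70% each: 1 - 0.3^4 = 1 - 0.0081 = 0.9919
const fourForks = atLeastOneSucceeds(0.7, 4);
```

The same function shows the diminishing returns: going from 1 fork to 2 at 70% lifts the figure from 0.70 to 0.91, while each fork beyond the fourth adds less than one percentage point.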
What we learned
Agent governance isn't optional — it's the product. Enterprise buyers don't care how fast your agent patches if they can't audit every decision. Guild's immutable audit log and WunderGraph's per-operation scopes turned out to be the selling points, not the agent's speed. Also: Ghost's zero-copy forking is an underappreciated primitive. The ability to explore N hypotheses at zero marginal storage cost changes how you architect agent decision-making.
What's next for Phalanx
Production multi-service deployment on Railway (Cosmo Router + Chainguard verification containers) with the dashboard on a Node host. Guild agent sessions triggered via HTTP API for fully remote orchestration. CDP wallet with funded Base Sepolia USDC for live on-chain payments. Continuous monitoring mode alongside the on-demand audit. Multi-customer tenant isolation via InsForge per-customer backends.
Built With
- chainguard
- coinbase-cdp
- ghost
- guild.ai
- insforge
- nexla-express
- next.js
- pgvector
- postgresql
- redis
- senso
- shadcn/ui
- sigstore
- tailwind-css
- tinyfish
- typescript
- wundergraph-cosmo
- x402