Inspiration
Every engineering team we know has a CVE dashboard that nobody reads. Scanners are loud, triage is manual, and the fix usually turns out to be "bump the base image," which nobody has time to verify. We wanted the scanner to stop producing work and start closing it.
The "Ship to Prod" framing was the nudge: build an agent that takes the action a human keeps postponing, not one that files another ticket about it.
What it does
tracepath is the tool backend behind a Guild.ai agent that autonomously triages and remediates vulnerabilities in a GitHub repo or container image.
Given a target, the agent:
- Runs `osv-scanner` over the repo and the image.
- Cross-references every image finding against the Chainguard/Wolfi `security.json` feed to isolate the subset already fixed upstream, i.e. one base-image swap away from gone.
- Uses repo navigation tools (`file`, `files`, `list`, `grep`) to locate the vulnerable call sites and decide whether the CVE is actually reachable in this codebase.
- Dispatches TinyFish fetchers and browser agents against advisory URLs to pull exploit conditions the JSON feed does not carry.
- Opens a GitHub issue with a remediation plan and the evidence behind it.
The output is a patch proposal with provenance, not another row in a dashboard.
How we built it
- Runtime: Hono on Node, TypeScript, `valibot` schemas exposed through `hono-openapi` so Guild.ai gets a typed, self-describing tool surface via `/openapi` and Scalar docs at `/docs`.
- Scanners: `osv-scanner` spawned as a child process in two modes, `scan source` for repos and `scan image` for containers, with defensive JSON parsing.
- Chainguard correlation: we fetch `packages.wolfi.dev/os/security.json` once, index it by normalized package name, and for each OSV finding we union `group.ids`, `group.aliases`, and `vuln.aliases` to match against Wolfi `secfixes`. A finding is flagged `fixable` iff any alias maps to a Wolfi-fixed version.
- Repo tools: a small, LLM-shaped filesystem. Reads are windowed to 1000 lines with offsets, directory listings are flat and one level deep, and `grep` caps at 500 matches with a truncation flag. Every tool output is bounded so the agent never blows its context.
- Browser agents: TinyFish SDK, wrapped so `APIStatusError` and `APIError` surface as typed HTTP responses instead of leaking provider internals.
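The Chainguard correlation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `WolfiFeed` and `OsvFinding` shapes are simplified assumptions (the feed is assumed to map each fixed version to a list of advisory IDs per package), and `normalizeName`, `indexFeed`, and `correlate` are hypothetical names.

```typescript
// Hypothetical, simplified shapes. Field names follow the write-up, not a verified schema.
interface WolfiFeed {
  packages: { pkg: { name: string; secfixes: Record<string, string[]> } }[];
}

interface OsvFinding {
  packageName: string;
  group: { ids: string[]; aliases?: string[] };
  vuln: { aliases?: string[] };
}

// package name -> advisory ID -> fixed version
type FixIndex = Map<string, Map<string, string>>;

// Assumed normalization: trim and lowercase. Real rules may need to be richer.
const normalizeName = (name: string): string => name.trim().toLowerCase();

// Build the index once per feed fetch.
function indexFeed(feed: WolfiFeed): FixIndex {
  const index: FixIndex = new Map();
  for (const { pkg } of feed.packages) {
    const key = normalizeName(pkg.name);
    const byId = index.get(key) ?? new Map<string, string>();
    for (const fixedVersion of Object.keys(pkg.secfixes)) {
      for (const id of pkg.secfixes[fixedVersion] ?? []) {
        // Keep the first version seen for an ID; a fuller version would compare versions.
        if (!byId.has(id)) byId.set(id, fixedVersion);
      }
    }
    index.set(key, byId);
  }
  return index;
}

// Union every identifier the finding carries, then look each one up.
function correlate(
  finding: OsvFinding,
  index: FixIndex,
): { fixable: boolean; fixedVersions: string[] } {
  const ids = new Set<string>([
    ...finding.group.ids,
    ...(finding.group.aliases ?? []),
    ...(finding.vuln.aliases ?? []),
  ]);
  const byId = index.get(normalizeName(finding.packageName));
  const fixedVersions = new Set<string>(); // dedupes on fixed version per package
  ids.forEach((id) => {
    const version = byId?.get(id);
    if (version !== undefined) fixedVersions.add(version);
  });
  return { fixable: fixedVersions.size > 0, fixedVersions: Array.from(fixedVersions) };
}
```

The point of the union is that a finding whose primary ID is a GHSA still matches a feed entry recorded under the CVE alias, and the reverse.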
Challenges we ran into
- osv-scanner JSON hygiene. Non-JSON preamble on stdout and non-zero exit codes on successful scans forced us to parse from the first `{` and treat exit codes as advisory.
- Package identity. A single image package shows up under an ecosystem name and an OS package name; missing either side of the lookup silently drops fixes. We normalize both and dedupe on `(package, fixedVersion)`.
- Bounding LLM context. The first Guild.ai runs happily asked for 40k-line lockfiles. Every tool grew a cap.
- Error surface. Each upstream (osv, Wolfi, TinyFish, filesystem) has its own failure mode; we funneled them into typed error classes so the agent receives an actionable 4xx/5xx instead of a stack trace.
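The osv-scanner hygiene workaround from the first bullet might look something like the sketch below. It assumes stdout has already been captured as a string; `parseScannerStdout`, `interpretScan`, and the `ScanOutcome` type are hypothetical names used for illustration only.

```typescript
// Parse scanner stdout that may carry non-JSON preamble before the JSON document.
// Limitation of this sketch: a "{" inside the preamble, or text after the JSON, still fails.
function parseScannerStdout(stdout: string): unknown {
  const start = stdout.indexOf("{");
  if (start === -1) {
    throw new Error("no JSON object found in scanner output");
  }
  return JSON.parse(stdout.slice(start));
}

type ScanOutcome =
  | { ok: true; report: unknown; exitCode: number | null }
  | { ok: false; error: string; exitCode: number | null };

// The exit code is advisory: a parseable report wins, and the code is kept only as context.
function interpretScan(stdout: string, exitCode: number | null): ScanOutcome {
  try {
    return { ok: true, report: parseScannerStdout(stdout), exitCode };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message, exitCode };
  }
}
```

Deciding success from the parse result rather than the exit status means a scan that reports findings is not mistaken for a crash.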
Accomplishments that we're proud of
- An agent loop that goes from "here is a repo" to "here is an issue with a fix" without a human in the middle.
- A Chainguard correlation that turns a wall of red findings into the small, actually actionable subset, on real production images.
- A tool surface designed for an LLM rather than a human: bounded outputs, stable shapes, typed errors, and an OpenAPI document the agent can introspect.
- Three sponsor integrations (Guild.ai, Chainguard, TinyFish) that each pull their weight in the loop, no glue tools.
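As an illustration of the "typed errors" part of that tool surface, here is a minimal sketch of funneling failures into a stable response shape. The class names, status codes, and error codes are assumptions made for this example and are not taken from the tracepath source.

```typescript
// Hypothetical base class: every tool failure carries an HTTP status and a stable code.
class ToolError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string, message: string) {
    super(message);
    this.status = status;
    this.code = code;
    // Keeps instanceof checks working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Caller-side problem: the agent can fix its request and retry.
class BadInputError extends ToolError {
  constructor(message: string) {
    super(400, "bad_input", message);
  }
}

// Upstream problem (scanner, feed, browser agent): retrying later may help.
class UpstreamError extends ToolError {
  constructor(source: string, message: string) {
    super(502, "upstream_failed", `${source}: ${message}`);
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string; message: string };
}

// Known errors keep their status and code; anything else becomes an opaque 500
// so stack traces and provider internals never reach the agent.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof ToolError) {
    return { status: err.status, body: { error: err.code, message: err.message } };
  }
  return { status: 500, body: { error: "internal_error", message: "unexpected failure" } };
}
```

In a Hono app, a function like this would typically be called from the application-level error handler so that every route shares the same error shape.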
What we learned
- Tools for agents are not tools for humans. Every endpoint needed a truncation story and a stable shape. `grep` without a cap crashes the agent; `file` without an offset hides bugs past line 1000.
- "Vulnerable" and "fixed" live in different vocabularies. OSV speaks in CVE/GHSA aliases; Wolfi's `secfixes` sometimes use one, sometimes another, sometimes both. The alias union is the whole trick.
- The interesting subset is small. On real images, the fraction of findings that are both Chainguard-fixable and reachable in code is tiny, and that is exactly the subset a human should have been triaging all along.
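The truncation story from the first lesson can be sketched with two pure functions over file contents. The 1000-line window and 500-match cap come from the write-up; the function names, return shapes, and the `nextOffset` field are assumptions for illustration.

```typescript
const MAX_WINDOW_LINES = 1000;
const MAX_GREP_MATCHES = 500;

interface FileWindow {
  lines: string[];
  offset: number;            // zero-based index of the first returned line
  totalLines: number;
  truncated: boolean;        // true when lines remain after this window
  nextOffset: number | null; // where the agent should resume, if anywhere
}

// Return at most MAX_WINDOW_LINES lines starting at `offset`.
function readWindow(content: string, offset = 0, limit = MAX_WINDOW_LINES): FileWindow {
  const all = content.split("\n");
  const size = Math.min(Math.max(limit, 0), MAX_WINDOW_LINES);
  const start = Math.min(Math.max(offset, 0), all.length);
  const lines = all.slice(start, start + size);
  const end = start + lines.length;
  const truncated = end < all.length;
  return { lines, offset: start, totalLines: all.length, truncated, nextOffset: truncated ? end : null };
}

interface GrepResult {
  matches: { line: number; text: string }[]; // line numbers are 1-based
  truncated: boolean;                         // true when at least one match was dropped
}

// Collect matching lines up to a cap and flag when the cap cut results short.
function grepLines(content: string, pattern: RegExp, maxMatches = MAX_GREP_MATCHES): GrepResult {
  // Drop the g/y flags so RegExp.test does not carry lastIndex state between lines.
  const matcher = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  const matches: { line: number; text: string }[] = [];
  let truncated = false;
  let lineNo = 0;
  for (const text of content.split("\n")) {
    lineNo++;
    if (!matcher.test(text)) continue;
    if (matches.length >= maxMatches) {
      truncated = true;
      break;
    }
    matches.push({ line: lineNo, text });
  }
  return { matches, truncated };
}
```

Returning `truncated` and `nextOffset` explicitly lets the agent decide whether to page further instead of silently assuming it has seen the whole file.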
What's next for tracepath
- Close the loop end-to-end. Move from "open an issue" to "open a PR" with the Dockerfile diff and a green build attached.
- Reachability beyond grep. Plug in a lightweight call-graph step (tree-sitter or LSP) so "is this CVE actually exercised" stops relying on string search.
- Continuous mode. Run on a schedule against a fleet of repos and images, only paging humans when a new finding is reachable and not Chainguard-fixable.
- More remediation sources. Wolfi is the cleanest feed today; adding distroless and the major distro security trackers widens the "already fixed upstream" net.
- Cost and provenance receipts. Every issue tracepath opens should carry a signed manifest of which tools ran, what they returned, and what the agent decided, so reviewers can audit the loop instead of trusting it.
Built With
- chainguard
- guild.ai
- hono
- tinyfish
- typescript