tracepath

Screenshot of the agent running
Screenshot of an issue opened on a repository
List of the tools available to the agent
Screenshot of the scan summary by the agent

Inspiration

Every engineering team we know has a CVE dashboard that nobody reads. Scanners are loud, triage is manual, and the fix usually turns out to be "bump the base image," which nobody has time to verify. We wanted the scanner to stop producing work and start closing it.

The "Ship to Prod" framing was the nudge: build an agent that takes the action a human keeps postponing, not one that files another ticket about it.

What it does

tracepath is the tool backend behind a Guild.ai agent that autonomously triages and remediates vulnerabilities in a GitHub repo or container image.

Given a target, the agent:

Runs osv-scanner over the repo and the image.
Cross-references every image finding against the Chainguard/Wolfi security.json feed to isolate the subset already fixed upstream, i.e. one base-image swap away from gone.
Uses repo navigation tools (file, files, list, grep) to locate the vulnerable call sites and decide whether the CVE is actually reachable in this codebase.
Dispatches TinyFish fetchers and browser agents against advisory URLs to pull exploit conditions the JSON feed does not carry.
Opens a GitHub issue with a remediation plan and the evidence behind it.

The output is a patch proposal with provenance, not another row in a dashboard.

How we built it

Runtime: Hono on Node, TypeScript, valibot schemas exposed through hono-openapi so Guild.ai gets a typed, self-describing tool surface via /openapi and Scalar docs at /docs.
Scanners: osv-scanner spawned as a child process in two modes, scan source for repos and scan image for containers, with defensive JSON parsing.
Chainguard correlation: we fetch packages.wolfi.dev/os/security.json once, index it by normalized package name, and for each OSV finding we union group.ids, group.aliases, and vuln.aliases to match against Wolfi secfixes. A finding is flagged fixable iff any alias maps to a Wolfi-fixed version.
Repo tools: a small, LLM-shaped filesystem. Reads are windowed to 1000 lines with offsets, directory listings are flat and one level deep, and grep caps at 500 matches with a truncation flag. Every tool output is bounded so the agent never overflows its context window.
Browser agents: TinyFish SDK, wrapped so APIStatusError and APIError surface as typed HTTP responses instead of leaking provider internals.
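
The Chainguard correlation step can be sketched roughly as follows. This is an illustrative sketch, not tracepath's actual code: the type shapes (OsvFinding, WolfiEntry), the function names, the lowercase-and-trim normalization, and the assumption that secfixes maps a fixed version to a list of advisory IDs are all hypothetical simplifications. The real OSV output and Wolfi feed carry more fields.

```typescript
// Hypothetical, simplified view of one OSV finding.
interface OsvFinding {
  packageName: string;
  groupIds: string[]; // stands in for group.ids
  groupAliases: string[]; // stands in for group.aliases
  vulnAliases: string[]; // stands in for vuln.aliases
}

// Assumed feed entry shape: fixed version -> advisory IDs resolved by it.
interface WolfiEntry {
  name: string;
  secfixes: Record<string, string[]>;
}

// Normalized package name -> (advisory ID -> fixed version).
type WolfiIndex = Map<string, Map<string, string>>;

// Assumed normalization; the real rules may be stricter.
const normalizeName = (name: string): string => name.trim().toLowerCase();

function buildIndex(entries: WolfiEntry[]): WolfiIndex {
  const index: WolfiIndex = new Map();
  for (const entry of entries) {
    const key = normalizeName(entry.name);
    const fixes = index.get(key) ?? new Map<string, string>();
    for (const fixedVersion of Object.keys(entry.secfixes)) {
      for (const id of entry.secfixes[fixedVersion] ?? []) {
        // If an ID appears under several versions, the last one seen wins.
        fixes.set(id, fixedVersion);
      }
    }
    index.set(key, fixes);
  }
  return index;
}

// Union every identifier the finding is known by and return the first fixed version found.
function findFix(finding: OsvFinding, index: WolfiIndex): string | undefined {
  const fixes = index.get(normalizeName(finding.packageName));
  if (!fixes) return undefined;
  const aliases = [...finding.groupIds, ...finding.groupAliases, ...finding.vulnAliases];
  for (const alias of aliases) {
    const fixedVersion = fixes.get(alias);
    if (fixedVersion !== undefined) return fixedVersion;
  }
  return undefined;
}

// Keep only fixable findings, deduplicated on (package, fixedVersion).
function correlate(
  findings: OsvFinding[],
  index: WolfiIndex,
): Array<{ package: string; fixedVersion: string }> {
  const seen = new Set<string>();
  const fixable: Array<{ package: string; fixedVersion: string }> = [];
  for (const finding of findings) {
    const fixedVersion = findFix(finding, index);
    if (fixedVersion === undefined) continue;
    const pkg = normalizeName(finding.packageName);
    const key = pkg + "@" + fixedVersion;
    if (seen.has(key)) continue;
    seen.add(key);
    fixable.push({ package: pkg, fixedVersion });
  }
  return fixable;
}
```

Matching on the union rather than a single ID field means a finding reported under a GHSA ID still matches a secfix recorded under its CVE alias.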

Challenges we ran into

osv-scanner JSON hygiene. Non-JSON preamble on stdout and non-zero exit codes on successful scans forced us to parse from the first { and treat exit codes as advisory.
Package identity. A single image package shows up under an ecosystem name and an OS package name; missing either side of the lookup silently drops fixes. We normalize both and dedupe on (package, fixedVersion).
Bounding LLM context. The first Guild.ai runs happily asked for 40k-line lockfiles. Every tool grew a cap.
Error surface. Each upstream (osv, Wolfi, TinyFish, filesystem) has its own failure mode; we funneled them into typed error classes so the agent receives an actionable 4xx/5xx instead of a stack trace.
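
The osv-scanner workaround above amounts to something like the sketch below. The function name interpretScan and the ScanResult shape are hypothetical, chosen for illustration; only the two ideas (start parsing at the first { and do not let the exit code decide success) come from the description above.

```typescript
// Result of interpreting one scanner run. The exit code is carried along as
// information for the caller, but it never decides success on its own.
type ScanResult =
  | { ok: true; report: unknown; exitCode: number }
  | { ok: false; reason: string; exitCode: number };

function interpretScan(stdout: string, exitCode: number): ScanResult {
  // Skip any non-JSON preamble by starting at the first opening brace.
  const start = stdout.indexOf("{");
  if (start === -1) {
    return { ok: false, reason: "no JSON object found in stdout", exitCode };
  }
  try {
    // A parseable report counts as a successful scan, even with a non-zero exit code.
    return { ok: true, report: JSON.parse(stdout.slice(start)), exitCode };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason, exitCode };
  }
}
```

This approach has limits: a preamble that itself contains a { character, or trailing non-JSON text after the report, would still fail to parse and come back as ok: false.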

Accomplishments that we're proud of

An agent loop that goes from "here is a repo" to "here is an issue with a fix" without a human in the middle.
A Chainguard correlation that turns a wall of red findings into the small, actually actionable subset, on real production images.
A tool surface designed for an LLM rather than a human: bounded outputs, stable shapes, typed errors, and an OpenAPI document the agent can introspect.
Three sponsor integrations (Guild.ai, Chainguard, TinyFish) that each pull their weight in the loop; none is there as filler.
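
As an illustration of the typed-error idea, a minimal sketch might look like the following. The ToolError class, the toHttpError helper, and the specific status codes are assumptions made for this example, not tracepath's actual classes.

```typescript
// The upstreams named in this writeup.
type Upstream = "osv" | "wolfi" | "tinyfish" | "filesystem";

// Hypothetical error type: which upstream failed and which HTTP status the agent should see.
class ToolError extends Error {
  readonly upstream: Upstream;
  readonly status: number;

  constructor(upstream: Upstream, status: number, message: string) {
    super(message);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "ToolError";
    this.upstream = upstream;
    this.status = status;
  }
}

interface HttpError {
  status: number;
  body: { upstream: Upstream | "internal"; message: string };
}

// Known failures keep their status and message; anything else becomes an
// opaque 500 so stack traces and provider internals never reach the agent.
function toHttpError(err: unknown): HttpError {
  if (err instanceof ToolError) {
    return { status: err.status, body: { upstream: err.upstream, message: err.message } };
  }
  return { status: 500, body: { upstream: "internal", message: "unexpected error" } };
}
```

For example, a missing file could be raised as new ToolError("filesystem", 404, "path not found") and an unreachable feed as new ToolError("wolfi", 502, "feed unreachable"); both status choices are illustrative.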

What we learned

Tools for agents are not tools for humans. Every endpoint needed a truncation story and a stable shape. grep without a cap crashes the agent; file without an offset hides bugs past line 1000.
"Vulnerable" and "fixed" live in different vocabularies. OSV speaks in CVE/GHSA aliases; Wolfi's secfixes sometimes use one, sometimes another, sometimes both. The alias union is the whole trick.
The interesting subset is small. On real images, the fraction of findings that are both Chainguard-fixable and reachable in code is tiny, and that is exactly the subset a human should have been triaging all along.
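
The truncation story for reads and grep can be sketched as below. This is a simplified illustration that works on in-memory text rather than the real filesystem; the names readWindow and grepLines and the result shapes are hypothetical, while the 1000-line window and 500-match cap are the limits described earlier.

```typescript
const MAX_LINES = 1000; // read window size
const MAX_MATCHES = 500; // grep match cap

interface FileWindow {
  lines: string[];
  offset: number; // zero-based index of the first returned line
  totalLines: number;
  truncated: boolean; // true when more lines exist past this window
}

// Return at most MAX_LINES lines starting at `offset`.
function readWindow(content: string, offset = 0, limit = MAX_LINES): FileWindow {
  const all = content.split("\n");
  const start = Math.max(0, offset);
  const size = Math.min(Math.max(0, limit), MAX_LINES);
  const lines = all.slice(start, start + size);
  return {
    lines,
    offset: start,
    totalLines: all.length,
    truncated: start + lines.length < all.length,
  };
}

interface GrepResult {
  matches: Array<{ line: number; text: string }>; // line numbers are 1-based
  truncated: boolean; // true when at least one match was dropped by the cap
}

function grepLines(content: string, pattern: RegExp, cap = MAX_MATCHES): GrepResult {
  const matches: Array<{ line: number; text: string }> = [];
  const lines = content.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i] ?? "";
    pattern.lastIndex = 0; // avoid stateful matching with global or sticky regexes
    if (!pattern.test(text)) continue;
    if (matches.length >= cap) return { matches, truncated: true };
    matches.push({ line: i + 1, text });
  }
  return { matches, truncated: false };
}
```

With a shape like this, the agent can page through a long lockfile by passing the next offset, and the truncated flag tells it when a grep result is incomplete and the pattern should be narrowed.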

What's next for tracepath

Close the loop end-to-end. Move from "open an issue" to "open a PR" with the Dockerfile diff and a green build attached.
Reachability beyond grep. Plug in a lightweight call-graph step (tree-sitter or LSP) so "is this CVE actually exercised" stops relying on string search.
Continuous mode. Run on a schedule against a fleet of repos and images, only paging humans when a new finding is reachable and not Chainguard-fixable.
More remediation sources. Wolfi is the cleanest feed today; adding distroless and the major distro security trackers widens the "already fixed upstream" net.
Cost and provenance receipts. Every issue tracepath opens should carry a signed manifest of which tools ran, what they returned, and what the agent decided, so reviewers can audit the loop instead of trusting it.

Built With

chainguard
guild.ai
hono
tinyfish
typescript

Updates

Clément Boillot started this project — Apr 24, 2026 07:29 PM EDT
