Inspiration
Every developer has opened a dependency report and immediately closed it. Not because the information was wrong, but because it was useless. "47 outdated packages." Okay. Which ones? Why? What do I do Monday morning?
Tools like Dependabot and Renovate solved the automation problem — they can open PRs, bump versions, run checks. But they never solved the understanding problem. They treat a critical vulnerability in your core auth layer the same way they treat a package that is one patch behind and touched by nothing. The list is complete and the list is meaningless.
The inspiration for Goodman was a simple question: what would a senior engineer tell you if you asked them to spend a morning reviewing your dependencies? They wouldn't hand you a sorted list. They would read the codebase, understand what actually matters, and tell you plainly: this one is a fire, this one needs a plan, this one you can ignore. Goodman is that senior engineer, available to every project, every week, without being asked.
What it does
Goodman is a GitLab Duo custom flow that produces a prioritized, reasoned dependency health advisory, written in plain English, delivered as a GitLab issue on a schedule, specific to how your project actually uses its packages.
It works in two stages. First, a GitLab CI pipeline runs on a schedule and does the mechanical work: collecting outdated package data, running vulnerability audits, and assembling everything into a structured context file. This separates deterministic data collection from judgment.
Then the Duo flow takes over with two agents in sequence. The first agent, the analyzer, reads the collected package data and then reads the actual source files to understand usage depth. It is looking for the difference between a vulnerable package that lives in a dev utility nobody calls and the same package imported across your core business logic. That distinction is what current tools completely miss.
The second agent, the advisory writer, takes that analysis and writes the advisory. Three tiers: Act Now for active risks where the codebase is genuinely exposed, Plan for Soon for packages aging in ways that will matter, and Worth Knowing for low-urgency items worth tracking. Each entry is two to three sentences. What the risk is. Why it matters for this project specifically. What to do next.
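The actual tiering is the model's judgment over codebase context, but the shape of that judgment can be sketched as a simple decision table. This is purely illustrative; the severity and usage labels are our own, not part of Goodman's prompt format.

```python
def triage(severity: str, usage: str) -> str:
    """Illustrative sketch of the three-tier triage.

    severity: "critical", "high", "moderate", "low", or "none"
    usage:    "core" (imported across business logic),
              "peripheral" (dev utility, rarely called),
              "unused"
    """
    # Genuinely exposed: a serious vulnerability in code paths that run.
    if severity in ("critical", "high") and usage == "core":
        return "Act Now"
    # Serious but contained, or heavily used and aging: needs a plan.
    if severity in ("critical", "high") or usage == "core":
        return "Plan for Soon"
    # Everything else is worth tracking, not worth a fire drill.
    return "Worth Knowing"
```

The point of the table is the second axis: severity alone never decides the tier, which is exactly what list-only tools get wrong.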
The result lands as a GitLab issue every Monday morning. No dashboard to check, no alert to dismiss, just a clear answer to a question developers have always had and never had a good tool for.
How we built it
Goodman is built entirely on GitLab's native primitives, which was a deliberate choice. The CI pipeline uses the project's package manager (npm, pip, or both) to collect outdated-package and audit data, merges the results into a single JSON context file, and saves it as a pipeline artifact. This stage is deterministic and logged, which matters for trust: developers can see exactly what data the agent was given.
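For an npm project, the merge step amounts to joining the JSON output of `npm outdated` and `npm audit` into one artifact. A minimal sketch, assuming those two commands have already been run in CI; the file names here (`outdated.json`, `audit.json`, `dependency-context.json`) are illustrative, not a fixed schema.

```python
import json
import pathlib

def build_context(outdated_path: str, audit_path: str, out_path: str) -> dict:
    """Merge `npm outdated --json` and `npm audit --json` into one context file."""
    # Both commands can emit empty output when there is nothing to report.
    outdated = json.loads(pathlib.Path(outdated_path).read_text() or "{}")
    audit = json.loads(pathlib.Path(audit_path).read_text() or "{}")
    context = {
        "outdated": outdated,  # package -> current / wanted / latest versions
        "vulnerabilities": audit.get("vulnerabilities", {}),
        "summary": {
            "outdated_count": len(outdated),
            "vulnerable_count": len(audit.get("vulnerabilities", {})),
        },
    }
    # The artifact the Duo flow reads as its starting context.
    pathlib.Path(out_path).write_text(json.dumps(context, indent=2))
    return context
```

Keeping this step as plain, inspectable code rather than agent work is the trust property described above: the artifact is the full record of what the agent saw.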
The Duo custom flow then reads that artifact as its starting context. The two agents are chained in sequence: the analyzer uses read_file and read_files to traverse the codebase and build its usage depth assessment, and the advisory writer uses create_issue to post the final output. The agents are powered by Anthropic models running within GitLab Duo, which meant the reasoning quality was high without any external API configuration.
The prompt design was the most important engineering decision. The analyzer prompt is explicit about what file to start with and what to look for, so it spends its entire reasoning budget on judgment rather than discovery. The writer prompt bans generic advice. Every sentence must be grounded in what the analyzer found about this specific codebase. That constraint is what makes the output feel like it was written by someone who read your code, not generated by something that read the npm documentation.
Challenges we ran into
The hardest challenge was the boundary between what the agent should do and what the CI pipeline should do. Early versions tried to have the agent discover outdated packages by reading lockfiles and inferring version staleness. This burned most of the context window on mechanical work the agent is not well-suited for, and left little room for the reasoning that actually makes the output valuable.
The insight that resolved this was treating CI as the data collection layer and the agent as the judgment layer. Once that boundary was clear, both sides became significantly better at their jobs.
The second challenge was prompt design for specificity. The natural failure mode of a tool like this is generic output, advice that could apply to any project and therefore applies to none. Getting the agents to stay grounded in what they actually found in the codebase, rather than defaulting to general best practice advice, required careful prompt constraints and testing across projects with different dependency profiles.
Accomplishments that we're proud of
The thing we are most proud of is that the output reads like it was written by a person who knows the codebase, not generated by a tool that knows about packages in general. That specificity, "this vulnerability matters because you are calling this method in your payment processing flow, not because vulnerabilities are bad in general," is genuinely hard to achieve and is what separates Goodman from everything else in this space.
We are also proud of the architectural decision to keep it entirely within GitLab's native primitives. No external services, no API keys to manage, no infrastructure to maintain. A team can adopt Goodman by adding two files to their repository.
What we learned
The most important thing we learned is that AI agents in developer tooling are most valuable when they replace judgment, not when they replace automation. Automation already exists for dependency management. What does not exist is something that can read a codebase, understand context, and tell you what actually matters. That is where the agent earns its place.
We also learned that the division of labor between deterministic pipelines and AI agents matters enormously. Agents that have to do mechanical data collection before they can do reasoning produce worse reasoning. Giving the agent clean, structured input and letting it focus entirely on analysis and writing produced dramatically better output.
What's next for Goodman
The most immediate next step is connecting Goodman to live registry data through a Google Cloud MCP server, wrapping npm's registry API, PyPI's JSON API, and the OSV vulnerability database in a Cloud Run service that the flow can query directly. This would give the analyzer real upstream maintenance signals: last commit date, open issue counts, download trends, maintainer activity. The current version infers these from version distance and vulnerability reports; live data would make the prioritization significantly more accurate.
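The lookups such a service would wrap are plain HTTP calls against public APIs. A sketch of the two we'd start with, the PyPI JSON API and OSV's v1 query endpoint; the Cloud Run and MCP wiring around them is not shown, and the helper names are our own.

```python
import json
import urllib.request

def pypi_metadata_url(package: str) -> str:
    # PyPI's JSON API: release history and project metadata per package.
    return f"https://pypi.org/pypi/{package}/json"

def osv_query(package: str, version: str, ecosystem: str = "PyPI") -> dict:
    # Request body for OSV's POST /v1/query endpoint.
    return {"package": {"name": package, "ecosystem": ecosystem}, "version": version}

def fetch_json(url: str, payload=None) -> dict:
    # One helper for both GET (PyPI metadata) and POST (OSV query) calls.
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Usage would look like `fetch_json("https://api.osv.dev/v1/query", osv_query("requests", "2.19.0"))`, giving the analyzer known vulnerabilities for an exact version instead of inferring exposure from version distance.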
Beyond that, the natural extension is making Goodman bidirectional. Right now it produces an advisory and stops. The next version would let a developer reply to the issue, "handle the Act Now items," and have Goodman open MRs with the actual upgrades, using its understanding of usage depth to write migration notes specific to how this project uses each package.
The longer-term vision is a project health layer that goes beyond dependencies: the same reasoning approach applied to test coverage gaps, API deprecations, and infrastructure drift. Goodman started with dependencies because that is where the pain is most acute and the existing tools are most inadequate. But the underlying idea, replacing dumb lists with reasoned, specific, actionable advisories, applies across the entire lifecycle.
Built With
- anthropic
- gitlab
- gitlab-duo
- google-cloud
- javascript
- json
- npm
- osv
- pip
- python
- typescript
- yaml