GitFlow

Inspiration

"Why build a flow when you can build a flow builder." - GitLab CEO maybe?

What it does

GitFlow is a visual, drag-and-drop editor for flows.

As functionality increases, flows can become more and more complex. At this point, editing YAML files becomes unbearable. Sharing flows becomes difficult, variable names cause confusion, and understanding which component connects to which under certain conditions takes time away from what actually matters: automation.

With GitFlow, painstakingly retyping components gives way to drag and drop.

GitFlow + GitLab

GitFlow lets users create flows in plain English straight from their IDE with the GitFlow Yaml Agent, an agent made specifically for GitFlow that uses Claude Sonnet 4.5 as its model (see Model Selection).

The agent is currently available under Open Access in the GitLab AI Catalogue.

GitFlow is built for anyone who uses GitLab and wants their automations to flow.

Access the web version here: GitFlow

Access the GitHub Repo here: Repository

Usage

Flows can be imported and exported as YAML text. Import your existing flows or create a new one using the GitFlow Agent.

The GitFlow MCP server lets agents push YAML text directly to GitFlow for instant editing (local only).

Model Selection

As part of this project, and in line with the chosen category, it was necessary to determine the right model for the GitLab Agent. Different models, each with an adjusted system prompt, yielded measurably different results. In response, a methodology was created to select the best model for the GitLab Duo Agent.

The methodology is as follows:

Define the capability profile for the task, e.g. Error Correction, Long Context Reasoning, etc.
Map capability profiles heuristically to open benchmarks to get model results.
Apply statistical analysis to these results to achieve a grounded comparison.
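The mapping from a capability profile to a single comparable number can be sketched as a weighted average. This is a minimal illustration only: the capability names, weights, scores, and model labels below are placeholder assumptions, not the values used in the project's analysis.

```python
from typing import Dict


def weighted_composite(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-capability scores.

    The result is normalized by the total weight, so the weights
    do not have to sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total


# Hypothetical capability profile: how much each capability matters for the task.
weights = {"error_correction": 0.5, "long_context": 0.3, "instruction_following": 0.2}

# Placeholder benchmark scores in [0, 1], one entry per capability per model.
model_scores = {
    "model_a": {"error_correction": 0.80, "long_context": 0.70, "instruction_following": 0.90},
    "model_b": {"error_correction": 0.60, "long_context": 0.90, "instruction_following": 0.80},
}

# Rank models from highest to lowest composite score.
ranking = sorted(
    model_scores,
    key=lambda name: weighted_composite(model_scores[name], weights),
    reverse=True,
)

for name in ranking:
    print(name, round(weighted_composite(model_scores[name], weights), 3))
```

Normalizing by the total weight keeps the composite on the same scale as the input scores, which makes it easier to compare against raw benchmark numbers.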

Why not use pure benchmark stats?

Inspired by a paper by Anthropic, statistical methods were used to evaluate the models, because benchmark question sets carry sampling noise, among other factors.
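To show why raw benchmark numbers can mislead, here is a rough sketch of one common approach: treat each benchmark accuracy as a sample mean, attach a standard error, and test whether the gap between two models exceeds the noise. It assumes an unpaired two-proportion z-test at the 95% level, which is a simplification and not necessarily the analysis used in this project. The accuracies and question count are illustrative placeholders.

```python
import math


def standard_error(accuracy: float, n_questions: int) -> float:
    """Standard error of an accuracy measured on n independent questions."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


def z_score_difference(acc_a: float, acc_b: float, n_questions: int) -> float:
    """z-score of the accuracy gap, treating the two runs as independent."""
    se_a = standard_error(acc_a, n_questions)
    se_b = standard_error(acc_b, n_questions)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        raise ValueError("standard error of the difference is zero")
    return (acc_a - acc_b) / se_diff


def indistinguishable(acc_a: float, acc_b: float, n_questions: int,
                      z_crit: float = 1.96) -> bool:
    """True when the gap is within sampling noise at roughly the 95% level."""
    return abs(z_score_difference(acc_a, acc_b, n_questions)) < z_crit


# Placeholder numbers: a 3-point gap on 200 questions is within the noise,
# while a 23-point gap on the same benchmark is not.
print(indistinguishable(0.80, 0.83, 200))
print(indistinguishable(0.60, 0.83, 200))
```

When two models answer the same questions, a paired comparison of per-question results is usually more sensitive than this unpaired version, but it needs the per-question outcomes rather than just the headline accuracy.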

Related information can be found here: Model Recommendation

Weighted Composite Score

Surprisingly, Sonnet 4.6 ranked last. Its strengths are agentic and computer-use tasks, which are not part of the capability profile.

From the analysis, Claude Sonnet 4.5 and Opus 4.6 were statistically indistinguishable on the weighted scoring of the GPQA Diamond benchmark, which left cost as the deciding factor. Sonnet 4.5 wins there, being over 60% cheaper to use.

Contributions

Thank you to the Devpost hackathon team for providing a platform to build cool stuff.
Thank you to GitLab and co. I had an amazing experience with GitLab Duo, Claude and Google Cloud.