Inspiration

AI is rapidly moving from being an isolated tool to being an active participant in software development workflows.

Developers are no longer just writing code — they are invoking agents to plan, generate, review, and even act on their behalf.

However, one gap is becoming increasingly clear:

We have strong validation for code, but almost no structured validation for AI behaviour.

Prompt injection, unsafe actions, and misleading outputs can enter workflows without being explicitly evaluated. These risks are often identified too late — after the agent has already influenced decisions.

Agent Watchtower was inspired by a simple question:

What would it look like to treat AI behaviour as something we systematically evaluate inside the workflow itself?

What it does

Agent Watchtower is a GitLab-native agent and flow that evaluates AI-driven instructions directly within issues and merge requests.

A developer can invoke the flow using:

@ai-agent-watchtower-flow-gitlab-ai-hackathon evaluate this:

The system then:

  • Analyses the instruction for adversarial patterns such as prompt injection
  • Detects unsafe or unauthorised actions
  • Classifies risk using a structured framework
  • Returns a clear decision: PASS / WARN / FAIL

This transforms implicit judgement into an explicit, repeatable decision layer within the development workflow.
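As a rough illustration of what an explicit decision layer can look like, the sketch below reduces a set of findings to a single PASS / WARN / FAIL outcome by taking the most severe one. The finding labels and the severity mapping are hypothetical, chosen only for this example; they are not the rules used by Agent Watchtower, whose classification is done by the evaluation agent.

```python
from enum import IntEnum


class Decision(IntEnum):
    """Ordered so that a higher value means a more severe outcome."""
    PASS = 0
    WARN = 1
    FAIL = 2


# Hypothetical finding labels and severities, for illustration only.
SEVERITY_BY_FINDING = {
    "prompt_injection": Decision.FAIL,
    "unauthorised_action": Decision.FAIL,
    "ambiguous_scope": Decision.WARN,
}


def decide(findings: list[str]) -> Decision:
    """Return the most severe decision across all findings (PASS if none)."""
    # Unknown labels are treated as WARN rather than silently passed.
    severities = [SEVERITY_BY_FINDING.get(f, Decision.WARN) for f in findings]
    return max(severities, default=Decision.PASS)
```

Taking the maximum means a single serious finding cannot be diluted by several harmless ones, which is one reasonable way to keep the decision repeatable.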

How we built it

Agent Watchtower is implemented using the GitLab Duo Agent Platform, combining:

  • A custom flow to orchestrate evaluation inside GitLab workflows
  • A custom agent responsible for risk classification
  • A structured evaluation prompt defining adversarial detection rules

The flow:

  • Accepts developer input from issues or merge requests
  • Passes the instruction into the evaluation agent
  • Produces a structured output containing:
    • Risk Level
    • Attack Type
    • Explanation
    • Recommendation
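The four output fields above could be modelled as a small record type. This is a minimal sketch: the class name, field names, and the plain-text rendering are assumptions made for illustration, not the format the flow actually emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical container for the four fields of the structured output."""
    risk_level: str
    attack_type: str
    explanation: str
    recommendation: str

    def to_comment(self) -> str:
        """Render the fields as a plain-text block, one field per line."""
        return "\n".join([
            f"Risk Level: {self.risk_level}",
            f"Attack Type: {self.attack_type}",
            f"Explanation: {self.explanation}",
            f"Recommendation: {self.recommendation}",
        ])
```

Keeping the result in a fixed shape like this makes it easier to post, compare, or gate on than free-form model text.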

We designed the system to behave as a workflow component, not just a conversational agent.

Challenges we ran into

The primary challenge was not in classification, but in reliably integrating actions within the GitLab workflow context.

While the agent correctly evaluates instructions, persisting results back into the originating issue or merge request depends on accurate context resolution. In the current setup, actions resolve against the flow’s project rather than the source thread, limiting direct write-back.

This highlighted an important consideration:

Effective agent systems require both accurate evaluation and correct workflow context binding to take meaningful action.

We also observed expected variability in model outputs, reinforcing the importance of structured decision frameworks when working with non-deterministic systems.
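One way to contain output variability is to validate whatever the model returns against a fixed shape and fall back to a cautious decision when it does not fit. The sketch below assumes the agent's reply is a JSON object with a `decision` key plus the four output fields; that JSON format, the key names, and the WARN fallback are all assumptions for illustration and are not taken from the project.

```python
import json

# Hypothetical field names for the structured output.
REQUIRED_FIELDS = ("risk_level", "attack_type", "explanation", "recommendation")
ALLOWED_DECISIONS = ("PASS", "WARN", "FAIL")


def parse_model_output(raw: str) -> dict:
    """Return the parsed result, or a WARN fallback if the output is malformed."""
    fallback = {"decision": "WARN", "reason": "unparseable or incomplete model output"}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    # The reply must be a JSON object, not a list or a bare value.
    if not isinstance(data, dict):
        return fallback
    if data.get("decision") not in ALLOWED_DECISIONS:
        return fallback
    # Every required field must be present and be a string.
    if any(not isinstance(data.get(key), str) for key in REQUIRED_FIELDS):
        return fallback
    return data
```

Falling back to WARN rather than PASS means a malformed reply is surfaced to a human instead of being treated as safe.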

Accomplishments that we're proud of

  • Built a working GitLab-native flow and agent, not just a chatbot
  • Demonstrated real workflow-triggered evaluation inside issues
  • Established a structured decision framework (PASS / WARN / FAIL)
  • Identified a critical integration constraint around context resolution
  • Produced a reusable evaluation pattern for agentic systems

Most importantly, we moved from:

“AI responses” → “AI decisions that can be evaluated and acted upon”

What we learned

This project reinforced several key ideas:

  • AI systems in workflows must be treated as decision-making entities, not just tools
  • Evaluation needs to be embedded, not external
  • Action layers (tools, APIs) are often the weakest point in agent systems
  • Prompt injection is not just a model problem — it is an architecture problem

We also learned that:

The difference between a chatbot and a workflow agent is not intelligence — it is integration and action.

What's next for Agent Watchtower

The next phase focuses on turning evaluation into enforcement and automation:

  • Persist decisions directly into GitLab issues and merge requests
  • Enable merge gating based on PASS / WARN / FAIL outcomes
  • Integrate with approval rules and CI/CD pipelines
  • Add multi-step adversarial probing for deeper evaluation
  • Track historical agent behaviour over time
  • Introduce automated remediation suggestions
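Merge gating could, for example, be driven by a CI job whose exit status reflects the decision, since a job that exits non-zero is marked as failed. The function below is a sketch of that mapping only; its name and the `block_on_warn` option are hypothetical, and how the decision would reach the job is left open.

```python
def gate_exit_code(decision: str, block_on_warn: bool = False) -> int:
    """Map a PASS / WARN / FAIL decision to a process exit code (0 = allow)."""
    normalised = decision.strip().upper()
    if normalised == "PASS":
        return 0
    if normalised == "WARN":
        # WARN blocks only when the project opts in to the stricter policy.
        return 1 if block_on_warn else 0
    # FAIL and any unrecognised value block the merge.
    return 1
```

A job script would pass this value to `sys.exit`. Treating unrecognised values as blocking keeps the gate closed when the evaluation result is missing or malformed.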

Longer term, we want Agent Watchtower to evolve into:

A continuous oversight layer for agentic software development — where AI behaviour is monitored, evaluated, and controlled as part of the system itself.

Built With

  • gitlab-ci/cd-pipelines
  • gitlab-duo-agent-platform-(flows-and-agents)
  • gitlab-issues-and-merge-requests
  • gitlab-web-ide
  • large-language-models-via-gitlab-duo
  • prompt
  • yaml-based-flow-definitions