Agent Watchtower

Inspiration

AI is rapidly moving from isolated tools into active participants within software development workflows.

Developers are no longer just writing code — they are invoking agents to plan, generate, review, and even act on their behalf.

However, one gap is becoming increasingly clear:

We have strong validation for code, but almost no structured validation for AI behaviour.

Prompt injection, unsafe actions, and misleading outputs can enter workflows without being explicitly evaluated. These risks are often identified too late — after the agent has already influenced decisions.

Agent Watchtower was inspired by a simple question:

What would it look like to treat AI behaviour as something we systematically evaluate inside the workflow itself?

What it does

Agent Watchtower is a GitLab-native agent and flow that evaluates AI-driven instructions directly within issues and merge requests.

A developer can invoke the flow using:

@ai-agent-watchtower-flow-gitlab-ai-hackathon evaluate this:
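However the platform passes the comment to the flow, the instruction to evaluate is the text that follows the trigger phrase. Here is a minimal Python sketch, assuming the raw comment body is available as a string. The regex and `extract_instruction` are hypothetical illustrations, not part of the GitLab Duo Agent Platform:

```python
import re
from typing import Optional

# Hypothetical trigger pattern built from the mention and phrase used in this write-up.
TRIGGER = re.compile(
    r"@ai-agent-watchtower-flow-gitlab-ai-hackathon\s+evaluate this:\s*(.*)",
    re.IGNORECASE | re.DOTALL,
)


def extract_instruction(comment_body: str) -> Optional[str]:
    """Return the text after the trigger phrase, or None if the flow was not invoked."""
    match = TRIGGER.search(comment_body)
    if match is None:
        return None
    instruction = match.group(1).strip()
    # An empty instruction is treated the same as no invocation.
    return instruction or None
```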

The system then:

Analyses the instruction for adversarial patterns such as prompt injection
Detects unsafe or unauthorised actions
Classifies risk using a structured framework
Returns a clear decision: PASS / WARN / FAIL
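A deliberately simple rule-based stand-in can illustrate the steps above. Agent Watchtower relies on an LLM agent and a structured evaluation prompt. The pattern lists, `classify`, and the escalation rules below are therefore hypothetical. They only show how detected signals might map to PASS / WARN / FAIL:

```python
import re
from enum import Enum


class Decision(Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"


# Hypothetical, deliberately small pattern lists; the real evaluator uses an LLM.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
    r"\byou are now\b",
]
UNSAFE_ACTION_PATTERNS = [
    r"\bdelete\b.*\b(repo|repository|branch|project)\b",
    r"\bforce[- ]push\b",
    r"\b(expose|print|leak)\b.*\b(token|secret|password)s?\b",
]
SENSITIVE_TARGET_PATTERNS = [
    r"\bmain\b",
    r"\bproduction\b",
    r"\bci/cd variables?\b",
]


def _matches(patterns: list[str], text: str) -> bool:
    return any(re.search(p, text) for p in patterns)


def classify(instruction: str) -> Decision:
    """Map simple detection signals to a PASS / WARN / FAIL decision."""
    text = instruction.lower()
    # Adversarial patterns or unsafe actions are treated as blocking.
    if _matches(INJECTION_PATTERNS, text) or _matches(UNSAFE_ACTION_PATTERNS, text):
        return Decision.FAIL
    # Benign-looking requests that touch sensitive targets need a human look.
    if _matches(SENSITIVE_TARGET_PATTERNS, text):
        return Decision.WARN
    return Decision.PASS
```

An attacker can evade a keyword list like this simply by rephrasing, so the sketch is only useful for making the decision mapping concrete.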

This transforms implicit judgement into an explicit, repeatable decision layer within the development workflow.

How we built it

Agent Watchtower is implemented using the GitLab Duo Agent Platform, combining:

A custom flow to orchestrate evaluation inside GitLab workflows
A custom agent responsible for risk classification
A structured evaluation prompt defining adversarial detection rules

The flow:

Accepts developer input from issues or merge requests
Passes the instruction into the evaluation agent
Produces a structured output containing:
- Risk Level
- Attack Type
- Explanation
- Recommendation
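Model output can vary, so it helps to validate the structured result before anything acts on it. Here is a minimal sketch under three assumptions: the agent is prompted to emit JSON, the snake_case keys mirror the four fields above, and Risk Level carries the PASS / WARN / FAIL decision. The key names, `Evaluation`, and `parse_evaluation` are hypothetical:

```python
import json
from dataclasses import dataclass

DECISIONS = {"PASS", "WARN", "FAIL"}
REQUIRED_KEYS = ("risk_level", "attack_type", "explanation", "recommendation")


@dataclass(frozen=True)
class Evaluation:
    risk_level: str  # assumed to carry PASS / WARN / FAIL
    attack_type: str
    explanation: str
    recommendation: str


def parse_evaluation(raw: str) -> Evaluation:
    """Parse the agent's JSON output; raise ValueError if it does not fit the schema.

    json.JSONDecodeError is a subclass of ValueError, so malformed JSON
    is reported the same way as a missing or invalid field.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    level = str(data["risk_level"]).strip().upper()
    if level not in DECISIONS:
        raise ValueError(f"unexpected risk level: {data['risk_level']!r}")
    return Evaluation(
        risk_level=level,
        attack_type=str(data["attack_type"]),
        explanation=str(data["explanation"]),
        recommendation=str(data["recommendation"]),
    )
```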

We designed the system to behave as a workflow component, not just a conversational agent.

Challenges we ran into

The primary challenge was not classification but reliably integrating actions into the GitLab workflow context.

While the agent correctly evaluates instructions, persisting results back into the originating issue or merge request depends on accurate context resolution. In the current setup, actions resolve against the flow’s project rather than the source thread, limiting direct write-back.

This highlighted an important consideration:

Effective agent systems require both accurate evaluation and correct workflow context binding to take meaningful action.
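Correct write-back means addressing the originating project and thread rather than the flow's own project. The minimal sketch below only builds the target URL for GitLab's Notes API (`POST /projects/:id/issues/:issue_iid/notes` and `POST /projects/:id/merge_requests/:merge_request_iid/notes`). It assumes the flow has access to the source project path and to the issue or merge request IID. `SourceContext` and `notes_endpoint` are hypothetical, and the sketch omits both authentication and the HTTP call:

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class SourceContext:
    """Where the flow was invoked (hypothetical structure)."""
    project_path: str   # full path of the ORIGINATING project, e.g. "group/app"
    noteable_type: str  # "issue" or "merge_request"
    noteable_iid: int


def notes_endpoint(base_url: str, ctx: SourceContext) -> str:
    """Build the REST endpoint for posting a note back to the source thread."""
    # GitLab accepts a URL-encoded full project path in place of the numeric ID.
    project_id = quote(ctx.project_path, safe="")
    if ctx.noteable_type == "issue":
        resource = "issues"
    elif ctx.noteable_type == "merge_request":
        resource = "merge_requests"
    else:
        raise ValueError(f"unsupported noteable type: {ctx.noteable_type!r}")
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/{resource}/{ctx.noteable_iid}/notes"
```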

As expected, model outputs also varied between runs, reinforcing the importance of structured decision frameworks when working with non-deterministic systems.
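One common way to reduce the impact of that variability, not described in the original design, is to run the same evaluation several times and keep the most severe decision. Here is a minimal sketch under that assumption. `aggregate`, the severity ordering, and the agreement ratio are all hypothetical:

```python
from collections import Counter
from typing import Iterable

# Hypothetical ordering: higher means more severe.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def aggregate(decisions: Iterable[str]) -> tuple[str, float]:
    """Return the most severe decision across runs and the share of runs that agree with it."""
    normalised = [d.strip().upper() for d in decisions]
    if not normalised:
        raise ValueError("no decisions to aggregate")
    unknown = [d for d in normalised if d not in SEVERITY]
    if unknown:
        raise ValueError(f"unknown decisions: {unknown}")
    final = max(normalised, key=SEVERITY.__getitem__)
    agreement = Counter(normalised)[final] / len(normalised)
    return final, agreement
```

Taking the maximum fails closed: a single FAIL across runs is enough to block. The agreement ratio shows how stable the verdict was.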

Accomplishments that we're proud of

Built a working GitLab-native flow and agent, not just a chatbot
Demonstrated real workflow-triggered evaluation inside issues
Established a structured decision framework (PASS / WARN / FAIL)
Identified a critical integration constraint around context resolution
Produced a reusable evaluation pattern for agentic systems

Most importantly, we moved from:

“AI responses” → “AI decisions that can be evaluated and acted upon”

What we learned

This project reinforced several key ideas:

AI systems in workflows must be treated as decision-making entities, not just tools
Evaluation needs to be embedded, not external
Action layers (tools, APIs) are often the weakest point in agent systems
Prompt injection is not just a model problem — it is an architecture problem

We also learned that:

The difference between a chatbot and a workflow agent is not intelligence — it is integration and action.

What's next for Agent Watchtower

The next phase focuses on turning evaluation into enforcement and automation:

Persist decisions directly into GitLab issues and merge requests
Enable merge gating based on PASS / WARN / FAIL outcomes
Integrate with approval rules and CI/CD pipelines
Add multi-step adversarial probing for deeper evaluation
Track historical agent behaviour over time
Introduce automated remediation suggestions
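Merge gating could start as a pipeline job that turns the decision into an exit code. A failing job then blocks the merge when the project requires pipelines to succeed. Here is a minimal sketch, assuming an earlier job step has already obtained the decision string. `gate`, `main`, and the `--block-on-warn` flag are hypothetical:

```python
from typing import Sequence

DECISIONS = {"PASS", "WARN", "FAIL"}


def gate(decision: str, block_on_warn: bool = False) -> int:
    """Map a decision to a process exit code for a CI job (0 = allow merge)."""
    level = decision.strip().upper()
    if level not in DECISIONS:
        return 1  # fail closed on malformed or missing evaluator output
    if level == "FAIL" or (level == "WARN" and block_on_warn):
        return 1
    return 0


def main(argv: Sequence[str]) -> int:
    """Entry point for a pipeline step, e.g. main(["WARN", "--block-on-warn"])."""
    if not argv:
        return 2  # usage error: no decision supplied
    return gate(argv[0], block_on_warn="--block-on-warn" in argv[1:])
```

A job script would call `sys.exit(main(sys.argv[1:]))`. Treating unrecognised output as a failure keeps the gate closed when the evaluator misbehaves.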

Longer term, Agent Watchtower evolves into:

A continuous oversight layer for agentic software development — where AI behaviour is monitored, evaluated, and controlled as part of the system itself.

Built With

gitlab-ci/cd-pipelines
gitlab-duo-agent-platform-(flows-and-agents)
gitlab-issues-and-merge-requests
gitlab-web-ide
large-language-models-via-gitlab-duo
prompt
yaml-based-flow-definitions

Updates

Natasha N started this project — Mar 25, 2026 01:51 PM EDT
