Agent Forge: The Agent That Builds Agents

Inspiration

Agent Forge was born out of frustration.

Initially, I set out to build a single agent on the GitLab Duo Agent Platform — something simple like a security reviewer or documentation enforcer. The idea was straightforward, but the execution wasn’t. As someone new to the platform, I quickly ran into friction: unclear tool selection, multiple YAML schemas (agents vs flows), silent failures, and cryptic CI errors.

This wasn’t an isolated experience. Across the hackathon community, developers were struggling with the same issues — invalid tool names, broken flows, sessions terminating without explanation, and pipelines failing without actionable feedback. :contentReference[oaicite:0]{index=0}

At the same time, I came across research on self-evolving agents, particularly LIVE-SWE-AGENT, which demonstrated that systems capable of reflecting and improving their own outputs outperform static approaches.

That led to a key insight:

If building one agent is hard, why not build an agent that builds all agents?


What I Learned

This project was a deep dive into:

1. Agent Design is a Reasoning Problem

Building agents is not just about filling templates. It requires:

  • Understanding intent
  • Selecting the right tools
  • Designing execution flows
  • Handling edge cases

This is inherently a reasoning task, not a static configuration problem.


2. Tool Selection is the Hidden Bottleneck

Given a toolset of 60+ options, selecting the correct tools is non-trivial.

You can think of this as an optimization problem:

$$ \text{Optimal Agent} = \arg\min_{A} \left( \text{Error}(A) + \lambda \cdot \text{Complexity}(A) \right) $$

Where:

  • Error = mismatch between intended and actual behavior
  • Complexity = unnecessary tools or steps

Agent Forge explicitly reasons about this tradeoff.


3. Self-Reflection is a Force Multiplier

Inspired by LIVE-SWE-AGENT, I implemented a reflection loop:

$$ \text{Output}_{t+1} = f(\text{Output}_t, \text{Critique}(\text{Output}_t)) $$

Instead of generating once, the system:

  • Critiques its own output
  • Detects errors (invalid tools, weak prompts)
  • Iteratively improves

This dramatically increases reliability.


How I Built It

Agent Forge is implemented as a multi-component flow on the GitLab Duo Agent Platform.

Core Components

1. Issue Analyzer

  • Parses the GitLab issue
  • Extracts structured intent
  • Classifies request (Agent vs Flow)

2. Agent Designer

  • Maps requirements to tools
  • Generates YAML configuration
  • Writes system prompts

3. Reflection Reviewer (Key Innovation)

  • Validates tool names
  • Checks prompt quality
  • Ensures completeness
  • Revises output if needed

4. YAML Validator

  • Enforces schema correctness
  • Prevents CI failures

5. Committer

  • Commits generated YAML
  • Creates versioned tags

6. Reporter

  • Posts results back to the issue
  • Provides usage instructions

End-to-End Flow

  1. Developer creates a GitLab issue describing a workflow
  2. Mentions Agent Forge
  3. Agent Forge:
    • Parses intent
    • Designs agent
    • Generates YAML
    • Self-reviews and improves
    • Commits and publishes
  4. Developer enables and uses the agent

Challenges I Faced

1. Ambiguity in Tool Selection

Many tools have similar names but different behaviors. Using the wrong one leads to silent failures.

Solution:
Added a verification step using documentation search and reflection.


2. YAML Schema Complexity

Agents and flows use completely different schemas.

Solution:
Built schema-aware generation and validation layers.


3. Silent Failures in Execution

Flows could fail without clear error messages (e.g., session termination).

Solution:
Designed defensive validation and reflection to catch issues before execution.


4. Prompt Quality

Generic prompts produce weak agents.

Solution:
Generate domain-specific, tool-aware system prompts and refine them during reflection.


5. Ensuring Reliability

The biggest challenge was moving from:

  • “Looks correct”
    to
  • “Actually works”

Solution:
Introduce a structured self-reflection loop that simulates real usage.


Why This Matters

Agent Forge changes the paradigm:

Before:

  • Developers build agents manually
  • High friction
  • Low adoption

After:

  • Developers describe problems in natural language
  • Agents are generated automatically
  • The ecosystem grows recursively

This creates a self-bootstrapping system:

$$ \text{Platform Value} \propto \text{Number of Agents Generated} $$

And Agent Forge directly increases that number.


Conclusion

Agent Forge is not just a tool — it’s a shift in how we think about building AI systems.

Instead of:

Engineers building agents for users

We move to:

Users describing problems, and AI building agents for them

It transforms GitLab into a self-evolving AI platform, where the system continuously expands its own capabilities.

Agent ForgeThe agent that builds the agents.

Built by Mrigesh Thakur · GitLab AI Hackathon 2025 · Powered by Anthropic Claude via GitLab Duo Agent Platform

Built With

  • claude
  • gitlab-duo
Share this project:

Updates