Agent Forge: The Agent That Builds Agents

Inspiration

Agent Forge was born out of frustration.

Initially, I set out to build a single agent on the GitLab Duo Agent Platform — something simple like a security reviewer or documentation enforcer. The idea was straightforward, but the execution wasn’t. As someone new to the platform, I quickly ran into friction: unclear tool selection, multiple YAML schemas (agents vs flows), silent failures, and cryptic CI errors.

This wasn’t an isolated experience. Across the hackathon community, developers were struggling with the same issues: invalid tool names, broken flows, sessions terminating without explanation, and pipelines failing without actionable feedback.

At the same time, I came across research on self-evolving agents, particularly LIVE-SWE-AGENT, which demonstrated that systems capable of reflecting on and improving their own outputs outperform static approaches.

That led to a key insight:

If building one agent is hard, why not build an agent that builds all agents?

What I Learned

This project was a deep dive into:

1. Agent Design is a Reasoning Problem

Building agents is not just about filling templates. It requires:

Understanding intent
Selecting the right tools
Designing execution flows
Handling edge cases

This is inherently a reasoning task, not a static configuration problem.

2. Tool Selection is the Hidden Bottleneck

Given a toolset of 60+ options, selecting the correct tools is non-trivial.

You can think of this as an optimization problem:

$$ \text{Optimal Agent} = \arg\min_{A} \left( \text{Error}(A) + \lambda \cdot \text{Complexity}(A) \right) $$

Where:

Error = mismatch between intended and actual behavior
Complexity = unnecessary tools or steps
λ = weight that sets how strongly extra complexity is penalized relative to error

Agent Forge explicitly reasons about this tradeoff.
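
To make the tradeoff concrete, here is a minimal sketch of the objective above as a brute-force search. It is an illustration under stated assumptions, not Agent Forge's actual selection logic: the tool names, the capability sets, and the functions `error`, `complexity`, and `best_toolset` are all hypothetical, and error is approximated as the number of required capabilities left uncovered.

```python
from itertools import combinations

# Hypothetical catalogue: tool name -> capabilities it provides.
# These are placeholder names, not real GitLab Duo tool identifiers.
TOOLS = {
    "read_file": {"read_code"},
    "list_diffs": {"read_diff"},
    "post_comment": {"post_comment"},
    "run_command": {"read_code", "execute"},
}


def error(selected, required):
    """Count required capabilities that the selected tools do not cover."""
    covered = set().union(*(TOOLS[t] for t in selected))
    return len(required - covered)


def complexity(selected):
    """Use the number of selected tools as a simple complexity measure."""
    return len(selected)


def best_toolset(required, lam=0.1):
    """Return the tool combination minimizing error + lam * complexity."""
    candidates = (
        combo
        for size in range(len(TOOLS) + 1)
        for combo in combinations(sorted(TOOLS), size)
    )
    return min(candidates, key=lambda c: error(c, required) + lam * complexity(c))


if __name__ == "__main__":
    print(best_toolset({"read_diff", "post_comment"}))
```

Exhaustive search only works for a toy catalogue like this one. With 60+ tools the number of subsets is far too large to enumerate, which is one reason to treat selection as a reasoning step rather than a lookup.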

3. Self-Reflection is a Force Multiplier

Inspired by LIVE-SWE-AGENT, I implemented a reflection loop:

$$ \text{Output}_{t+1} = f(\text{Output}_t, \text{Critique}(\text{Output}_t)) $$

Instead of generating once, the system:

Critiques its own output
Detects errors (invalid tools, weak prompts)
Iteratively improves

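This catches problems such as invalid tool names and weak prompts before they reach CI, which makes the generated configurations more reliable.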
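
The update rule above can be sketched as a bounded critique-and-revise loop. This is a toy version under assumptions: in the real system the critique and revision are produced by the model, whereas here `critique` and `revise` are hypothetical rule-based stand-ins, and `KNOWN_TOOLS` and `MIN_PROMPT_WORDS` are invented for illustration.

```python
# Hypothetical allowlist and threshold, used only for this illustration.
KNOWN_TOOLS = {"read_file", "post_comment"}
MIN_PROMPT_WORDS = 5


def critique(draft):
    """Return a list of problems found in the draft (empty if none)."""
    problems = [
        f"unknown tool: {tool}"
        for tool in draft["tools"]
        if tool not in KNOWN_TOOLS
    ]
    if len(draft["prompt"].split()) < MIN_PROMPT_WORDS:
        problems.append("prompt too short")
    return problems


def revise(draft, problems):
    """Produce the next draft from the current draft and its critique."""
    tools = [t for t in draft["tools"] if t in KNOWN_TOOLS]
    prompt = draft["prompt"]
    if "prompt too short" in problems:
        prompt += " Use only these tools: " + ", ".join(tools) + "."
    return {"tools": tools, "prompt": prompt}


def reflect(draft, max_rounds=3):
    """Apply Output_{t+1} = f(Output_t, Critique(Output_t)) until clean."""
    for _ in range(max_rounds):
        problems = critique(draft)
        if not problems:
            break
        draft = revise(draft, problems)
    return draft


if __name__ == "__main__":
    first = {"tools": ["read_file", "reed_file"], "prompt": "Review code."}
    print(reflect(first))
```

The `max_rounds` cap is a design choice for the sketch: it keeps the loop from running indefinitely if a critique can never be satisfied.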

How I Built It

Agent Forge is implemented as a multi-component flow on the GitLab Duo Agent Platform.

Core Components

1. Issue Analyzer

Parses the GitLab issue
Extracts structured intent
Classifies request (Agent vs Flow)

2. Agent Designer

Maps requirements to tools
Generates YAML configuration
Writes system prompts

3. Reflection Reviewer (Key Innovation)

Validates tool names
Checks prompt quality
Ensures completeness
Revises output if needed

4. YAML Validator

Enforces schema correctness
Prevents CI failures

5. Committer

Commits generated YAML
Creates versioned tags

6. Reporter

Posts results back to the issue
Provides usage instructions

End-to-End Flow

Developer creates a GitLab issue describing a workflow
Developer mentions Agent Forge in the issue
Agent Forge:
- Parses intent
- Designs agent
- Generates YAML
- Self-reviews and improves
- Commits and publishes
Developer enables and uses the agent
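
The six components can be pictured as stages that pass a shared state along in order. The sketch below is only a structural illustration with hypothetical names (`ForgeState`, `STAGES`, `run`) and placeholder stage bodies. It does not call the GitLab Duo Agent Platform and is not the actual flow definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForgeState:
    """Hypothetical state object handed from one stage to the next."""
    issue_text: str
    kind: str = ""
    config: str = ""
    log: List[str] = field(default_factory=list)


def analyze(state):
    # Toy classifier; a real analyzer would reason over the issue text.
    state.kind = "flow" if "flow" in state.issue_text.lower() else "agent"
    state.log.append("analyze")
    return state


def design(state):
    # Placeholder for tool mapping, YAML generation, and prompt writing.
    state.config = f"kind: {state.kind}\n"
    state.log.append("design")
    return state


def review(state):
    # Placeholder for the reflection step (critique and revise).
    state.log.append("review")
    return state


def validate(state):
    if not state.config:
        raise ValueError("nothing to validate")
    state.log.append("validate")
    return state


def commit(state):
    state.log.append("commit")  # placeholder: no repository access here
    return state


def report(state):
    state.log.append("report")  # placeholder: no issue comment is posted
    return state


STAGES = [analyze, design, review, validate, commit, report]


def run(issue_text):
    """Run every stage in order and return the final state."""
    state = ForgeState(issue_text)
    for stage in STAGES:
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run("Build an agent that reviews merge requests").log)
```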

Challenges I Faced

1. Ambiguity in Tool Selection

Many tools have similar names but different behaviors. Using the wrong one leads to silent failures.

Solution:
Added a verification step using documentation search and reflection.
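
One simple form such a verification step could take is checking every requested tool name against a known list and suggesting near matches for anything unrecognized. The sketch below uses Python's standard `difflib.get_close_matches`. The function `check_tool_names` and the tool names in the example are hypothetical, and this stands in for, rather than reproduces, the documentation search described above.

```python
import difflib


def check_tool_names(requested, known):
    """Map each unrecognized tool name to a list of similar known names."""
    report = {}
    for name in requested:
        if name not in known:
            # Up to three suggestions with similarity ratio >= 0.6.
            report[name] = difflib.get_close_matches(name, known, n=3, cutoff=0.6)
    return report


if __name__ == "__main__":
    # Placeholder names, not real platform tool identifiers.
    known = ["read_file", "create_issue_note", "run_pipeline"]
    print(check_tool_names(["read_file", "create_issue_notes"], known))
```

An empty report means every name was recognized. A non-empty one can be fed back into the reflection step as a critique.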

2. YAML Schema Complexity

Agents and flows use completely different schemas.

Solution:
Built schema-aware generation and validation layers.
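
A schema-aware check can be as small as choosing a different set of required keys depending on whether the request was classified as an agent or a flow. In the sketch below the key names in `REQUIRED_KEYS` are placeholders invented for illustration, not the platform's real schemas, and the configuration is assumed to have been parsed from YAML into a dictionary already.

```python
from typing import Dict, List

# Placeholder key sets; the real agent and flow schemas differ from these.
REQUIRED_KEYS = {
    "agent": {"name", "system_prompt", "tools"},
    "flow": {"name", "components", "routers"},
}


def validate(config: Dict, kind: str) -> List[str]:
    """Return a sorted list of problems; an empty list means the check passed."""
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown kind: {kind}")
    missing = sorted(REQUIRED_KEYS[kind] - set(config))
    return [f"missing key: {key}" for key in missing]


if __name__ == "__main__":
    draft = {"name": "security-reviewer", "tools": ["read_file"]}
    print(validate(draft, "agent"))
```

Running the check before committing lets a missing key surface as a readable message instead of a failed pipeline.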

3. Silent Failures in Execution

Flows could fail without clear error messages (e.g., session termination).

Solution:
Designed defensive validation and reflection to catch issues before execution.

4. Prompt Quality

Generic prompts produce weak agents.

Solution:
Generated domain-specific, tool-aware system prompts and refined them during reflection.

5. Ensuring Reliability

The biggest challenge was moving from:

“Looks correct”
to
“Actually works”

Solution:
Introduced a structured self-reflection loop that simulates real usage.

Why This Matters

Agent Forge changes the paradigm:

Before:

Developers build agents manually
High friction
Low adoption

After:

Developers describe problems in natural language
Agents are generated automatically
The ecosystem grows recursively

This creates a self-bootstrapping system:

$$ \text{Platform Value} \propto \text{Number of Agents Generated} $$

And Agent Forge directly increases that number.

Conclusion

Agent Forge is not just a tool — it’s a shift in how we think about building AI systems.

Instead of:

Engineers building agents for users

We move to:

Users describing problems, and AI building agents for them

It transforms GitLab into a self-evolving AI platform, where the system continuously expands its own capabilities.

Agent Forge — The agent that builds the agents.

Built by Mrigesh Thakur · GitLab AI Hackathon 2025 · Powered by Anthropic Claude via GitLab Duo Agent Platform

Built With

claude
gitlab-duo

Updates

Mrigesh Thakur started this project — Mar 25, 2026 01:12 PM EDT
