Agent Forge: The Agent That Builds Agents
Inspiration
Agent Forge was born out of frustration.
Initially, I set out to build a single agent on the GitLab Duo Agent Platform — something simple like a security reviewer or documentation enforcer. The idea was straightforward, but the execution wasn’t. As someone new to the platform, I quickly ran into friction: unclear tool selection, multiple YAML schemas (agents vs flows), silent failures, and cryptic CI errors.
This wasn’t an isolated experience. Across the hackathon community, developers were struggling with the same issues: invalid tool names, broken flows, sessions terminating without explanation, and pipelines failing without actionable feedback.
At the same time, I came across research on self-evolving agents, particularly LIVE-SWE-AGENT, which demonstrated that systems capable of reflecting and improving their own outputs outperform static approaches.
That led to a key insight:
If building one agent is hard, why not build an agent that builds all agents?
What I Learned
This project was a deep dive into:
1. Agent Design is a Reasoning Problem
Building agents is not just about filling templates. It requires:
- Understanding intent
- Selecting the right tools
- Designing execution flows
- Handling edge cases
This is inherently a reasoning task, not a static configuration problem.
2. Tool Selection is the Hidden Bottleneck
Given a toolset of 60+ options, selecting the correct tools is non-trivial.
You can think of this as an optimization problem:
$$ \text{Optimal Agent} = \arg\min_{A} \left( \text{Error}(A) + \lambda \cdot \text{Complexity}(A) \right) $$
Where:
- Error = mismatch between intended and actual behavior
- Complexity = unnecessary tools or steps
Agent Forge explicitly reasons about this tradeoff.
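As a rough illustration of the objective above, here is a minimal sketch that scores candidate tool sets by uncovered requirements plus a per-tool penalty. The tool names, the capability map, and the value of `lam` are assumptions made up for this example and are not the platform's actual tool catalog. The exhaustive search only works for a toy set; with 60+ tools a real system would need a heuristic or model-guided search.

```python
from itertools import combinations

# Hypothetical tool names and the capabilities each one provides (illustration only).
TOOL_CAPABILITIES = {
    "read_file": {"read_code"},
    "list_merge_request_diffs": {"read_diff"},
    "create_merge_request_note": {"post_comment"},
    "run_command": {"read_code", "execute"},
}

def error(tools, required):
    """Number of required capabilities that the chosen tools do not cover."""
    covered = set().union(*(TOOL_CAPABILITIES[t] for t in tools))
    return len(required - covered)

def complexity(tools):
    """Number of tools in the agent; every extra tool adds cost."""
    return len(tools)

def select_tools(required, lam=0.1):
    """Search all tool subsets for the lowest Error + lam * Complexity."""
    names = sorted(TOOL_CAPABILITIES)
    best, best_score = (), float("inf")
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            score = error(subset, required) + lam * complexity(subset)
            if score < best_score:
                best, best_score = subset, score
    return best, best_score
```

Calling `select_tools({"read_diff", "post_comment"})` picks the two tools that cover both requirements and leaves out the rest, because every additional tool raises the score by `lam` without reducing the error.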
3. Self-Reflection is a Force Multiplier
Inspired by LIVE-SWE-AGENT, I implemented a reflection loop:
$$ \text{Output}_{t+1} = f(\text{Output}_t, \text{Critique}(\text{Output}_t)) $$
Instead of generating once, the system:
- Critiques its own output
- Detects errors (invalid tools, weak prompts)
- Iteratively improves
Each pass catches errors that a single-shot generation would commit unchanged, which makes the final output more reliable.
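The loop can be sketched in a few lines. In this minimal version, `critique` and `revise` are rule-based stand-ins for what would be model calls in the real system, and `VALID_TOOLS`, the five-word prompt threshold, and `max_rounds` are assumptions chosen for the example.

```python
VALID_TOOLS = {"read_file", "create_issue_note"}  # hypothetical allowlist

def critique(draft):
    """Return a list of problems found in the draft configuration."""
    problems = [f"invalid tool: {t}" for t in draft["tools"] if t not in VALID_TOOLS]
    if len(draft["prompt"].split()) < 5:
        problems.append("prompt too short")
    return problems

def revise(draft, problems):
    """Toy reviser: drop invalid tools and extend a prompt flagged as too short."""
    tools = [t for t in draft["tools"] if t in VALID_TOOLS]
    prompt = draft["prompt"]
    if "prompt too short" in problems:
        prompt += " Use only the listed tools and explain each finding."
    return {"tools": tools, "prompt": prompt}

def reflect(draft, max_rounds=3):
    """Apply Output_{t+1} = f(Output_t, Critique(Output_t)) until the critique is empty."""
    for _ in range(max_rounds):
        problems = critique(draft)
        if not problems:
            break
        draft = revise(draft, problems)
    # Return the final draft together with any problems that are still open.
    return draft, critique(draft)
```

Capping the number of rounds keeps the loop from running indefinitely when the reviser cannot fix a problem; whatever is still open is returned so the caller can report it.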
How I Built It
Agent Forge is implemented as a multi-component flow on the GitLab Duo Agent Platform.
Core Components
1. Issue Analyzer
- Parses the GitLab issue
- Extracts structured intent
- Classifies request (Agent vs Flow)
2. Agent Designer
- Maps requirements to tools
- Generates YAML configuration
- Writes system prompts
3. Reflection Reviewer (Key Innovation)
- Validates tool names
- Checks prompt quality
- Ensures completeness
- Revises output if needed
4. YAML Validator
- Enforces schema correctness
- Prevents CI failures
5. Committer
- Commits generated YAML
- Creates versioned tags
6. Reporter
- Posts results back to the issue
- Provides usage instructions
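One way to picture how the six components fit together is a linear pipeline that passes a shared state object from stage to stage. This is a sketch under assumptions: `ForgeState`, the keyword-based classifier, and the placeholder stages are invented for illustration and do not reproduce the actual flow definition.

```python
from dataclasses import dataclass, field

@dataclass
class ForgeState:
    issue_text: str
    kind: str = ""
    config: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

def analyze(state):
    # Placeholder classifier: a keyword check stands in for real intent analysis.
    state.kind = "flow" if "flow" in state.issue_text.lower() else "agent"
    state.log.append("analyzed")
    return state

def design(state):
    # Placeholder designer: builds an empty configuration skeleton.
    state.config = {"kind": state.kind, "tools": [], "prompt": state.issue_text}
    state.log.append("designed")
    return state

def stage(name):
    """Build a placeholder stage that only records that it ran."""
    def run(state):
        state.log.append(name)
        return state
    return run

# Order mirrors the component list: analyzer, designer, reviewer, validator, committer, reporter.
PIPELINE = [analyze, design, stage("reviewed"), stage("validated"),
            stage("committed"), stage("reported")]

def run_forge(issue_text):
    state = ForgeState(issue_text)
    for step in PIPELINE:
        state = step(state)
    return state
```

Keeping each stage as a function from state to state makes it easy to test a stage in isolation or to insert an extra check between two existing ones.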
End-to-End Flow
- Developer creates a GitLab issue describing a workflow
- Mentions Agent Forge
- Agent Forge:
  - Parses intent
  - Designs agent
  - Generates YAML
  - Self-reviews and improves
  - Commits and publishes
- Developer enables and uses the agent
Challenges I Faced
1. Ambiguity in Tool Selection
Many tools have similar names but different behaviors. Using the wrong one leads to silent failures.
Solution:
Added a verification step using documentation search and reflection.
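A cheap complement to that verification step is a fuzzy lookup against a list of known tool names, so that a near-miss name produces a suggestion instead of a silent failure. This sketch uses Python's standard `difflib` rather than the documentation search described above, and `KNOWN_TOOLS` is a hypothetical list, not the platform's real catalog.

```python
import difflib

# Hypothetical catalog of valid tool names (illustration only).
KNOWN_TOOLS = ["read_file", "read_files", "create_issue_note", "list_issue_notes"]

def check_tool(name, known=KNOWN_TOOLS):
    """Return (is_valid, suggestions) for a requested tool name."""
    if name in known:
        return True, []
    # Suggest up to three similarly spelled names for the reviewer to choose from.
    return False, difflib.get_close_matches(name, known, n=3, cutoff=0.6)
```

For a misspelled name such as `"read_fille"`, the function reports the name as invalid and includes `"read_file"` among the suggestions, which gives the reflection step something concrete to act on.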
2. YAML Schema Complexity
Agents and flows use completely different schemas.
Solution:
Built schema-aware generation and validation layers.
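The core idea of schema-aware validation is to pick the rule set based on the kind of artifact before checking anything else. The sketch below assumes the YAML has already been parsed into a dictionary; the `kind` field and the required keys listed for each kind are hypothetical and do not reproduce the real GitLab schemas.

```python
# Hypothetical required keys per kind (illustration only).
REQUIRED_KEYS = {
    "agent": {"name", "system_prompt", "tools"},
    "flow": {"name", "components", "routers"},
}

def validate(config):
    """Return a list of schema problems for a parsed configuration dictionary."""
    kind = config.get("kind")
    if kind not in REQUIRED_KEYS:
        return [f"unknown kind: {kind!r}"]
    missing = REQUIRED_KEYS[kind] - set(config)
    return [f"missing key: {key}" for key in sorted(missing)]
```

An empty list means the configuration passed; anything else can be fed back into the reflection step or posted to the issue before a commit is attempted.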
3. Silent Failures in Execution
Flows could fail without clear error messages (e.g., session termination).
Solution:
Designed defensive validation and reflection to catch issues before execution.
4. Prompt Quality
Generic prompts produce weak agents.
Solution:
Generated domain-specific, tool-aware system prompts and refined them during reflection.
5. Ensuring Reliability
The biggest challenge was moving from “looks correct” to “actually works”.
Solution:
Introduced a structured self-reflection loop that simulates real usage.
Why This Matters
Agent Forge changes the paradigm:
Before:
- Developers build agents manually
- High friction
- Low adoption
After:
- Developers describe problems in natural language
- Agents are generated automatically
- The ecosystem grows recursively
This creates a self-bootstrapping system:
$$ \text{Platform Value} \propto \text{Number of Agents Generated} $$
And Agent Forge directly increases that number.
Conclusion
Agent Forge is not just a tool — it’s a shift in how we think about building AI systems.
Instead of:
Engineers building agents for users
We move to:
Users describing problems, and AI building agents for them
It transforms GitLab into a self-evolving AI platform, where the system continuously expands its own capabilities.
Agent Forge — The agent that builds the agents.
Built by Mrigesh Thakur · GitLab AI Hackathon 2025 · Powered by Anthropic Claude via GitLab Duo Agent Platform
Built With
- claude
- gitlab-duo