AgentFlow: Building a Production-Ready Multi-Agent Framework

The Problem

As a developer trying to build real businesses with AI agents, I kept hitting the same wall: existing frameworks work great for demos but fall apart in production. Every time I tried to deploy something real, I faced:

  • Unpredictable behavior - agents work differently each time
  • No cost control - token usage spirals out of control
  • Security nightmares - tools can access anything
  • Impossible debugging - when things break, you can't figure out why

After months of frustration with existing solutions, I decided to build my own framework designed for production from day one.

What I'm Building

AgentFlow is a Go-based multi-agent framework designed to solve the production-readiness problem. The key insight: production systems need deterministic behavior.

Core Innovation: Deterministic Planning

Instead of relying only on unpredictable LLM planning, AgentFlow offers:

  • FSM Planners: Finite state machines for reliable workflows
  • Behavior Trees: Structured logic with predictable outcomes
  • LLM Planners: Creative planning with fallback to deterministic plans

# Simple, predictable workflow
planner:
  type: fsm
  states:
    classify:
      agent: classifier
      transitions:
        - condition: "$.category == 'billing'"
          target: billing_agent
        - default: human_escalation
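
To make the FSM idea concrete, here is a minimal Go sketch of how a planner could pick the next step for a config like the one above. It is an illustration, not AgentFlow's actual API: the FSM, State, and Transition types and the newSupportFSM helper are hypothetical, and conditions are plain Go functions standing in for the expression strings in the YAML.

```go
package main

import (
	"errors"
	"fmt"
)

// Transition moves the workflow to Target when When returns true.
// A nil When marks the default transition. (Hypothetical type.)
type Transition struct {
	When   func(result map[string]string) bool
	Target string
}

// State names the agent to run and the ordered transitions tried afterwards.
type State struct {
	Agent       string
	Transitions []Transition
}

// FSM is a hypothetical planner: a lookup table of named states.
type FSM struct {
	States map[string]State
}

// ErrNoTransition is returned when a state has no matching transition.
var ErrNoTransition = errors.New("no matching transition")

// Next returns the target of the first transition whose condition matches
// the agent's result. The same input always yields the same target.
func (f FSM) Next(state string, result map[string]string) (string, error) {
	s, ok := f.States[state]
	if !ok {
		return "", fmt.Errorf("unknown state %q", state)
	}
	for _, t := range s.Transitions {
		if t.When == nil || t.When(result) {
			return t.Target, nil
		}
	}
	return "", ErrNoTransition
}

// newSupportFSM mirrors the YAML example: billing goes to billing_agent,
// everything else is escalated to a human.
func newSupportFSM() FSM {
	return FSM{States: map[string]State{
		"classify": {
			Agent: "classifier",
			Transitions: []Transition{
				{
					When:   func(r map[string]string) bool { return r["category"] == "billing" },
					Target: "billing_agent",
				},
				{Target: "human_escalation"}, // default
			},
		},
	}}
}

func main() {
	fsm := newSupportFSM()
	for _, category := range []string{"billing", "shipping"} {
		next, err := fsm.Next("classify", map[string]string{"category": category})
		fmt.Println(category, "->", next, err)
	}
}
```

Because transitions are tried in order and the default comes last, every classifier result maps to exactly one next step, which is what makes the workflow predictable and easy to test.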

Production-First Features

  • Sandboxed tool execution with explicit permissions
  • Cost tracking and budgets to prevent overruns
  • Complete audit trails for debugging and compliance
  • Message-driven architecture for reliability and replay
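
As a rough sketch of how the first three features could fit together, the Go example below checks a tool call against an explicit allowlist, reserves tokens against a budget, and records every decision. All names here (Guard, NewGuard, Authorize, Audit) are hypothetical and chosen for illustration. The sketch is in-memory and not safe for concurrent use, whereas a real implementation would presumably persist the audit trail.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNotPermitted   = errors.New("tool not permitted")
	ErrBudgetExceeded = errors.New("token budget exceeded")
	ErrInvalidTokens  = errors.New("token estimate must be non-negative")
)

// Guard is a hypothetical per-workflow gate: an allowlist of tools,
// a token budget, and an in-memory audit log.
type Guard struct {
	allowed map[string]bool
	limit   int
	used    int
	audit   []string
}

// NewGuard creates a guard with a token limit and the tools it permits.
func NewGuard(limit int, tools ...string) *Guard {
	allowed := make(map[string]bool, len(tools))
	for _, name := range tools {
		allowed[name] = true
	}
	return &Guard{allowed: allowed, limit: limit}
}

// Authorize checks the permission first, then the budget, and logs the
// outcome either way. Tokens are only reserved when the call is allowed.
func (g *Guard) Authorize(tool string, tokens int) error {
	if tokens < 0 {
		g.audit = append(g.audit, "deny "+tool+": invalid token estimate")
		return ErrInvalidTokens
	}
	if !g.allowed[tool] {
		g.audit = append(g.audit, "deny "+tool+": not permitted")
		return ErrNotPermitted
	}
	if g.used+tokens > g.limit {
		g.audit = append(g.audit, "deny "+tool+": over budget")
		return ErrBudgetExceeded
	}
	g.used += tokens
	g.audit = append(g.audit, fmt.Sprintf("allow %s: %d/%d tokens used", tool, g.used, g.limit))
	return nil
}

// Audit returns a copy of the decision log.
func (g *Guard) Audit() []string {
	return append([]string(nil), g.audit...)
}

func main() {
	g := NewGuard(1000, "search")
	fmt.Println(g.Authorize("search", 600)) // allowed
	fmt.Println(g.Authorize("shell", 10))   // denied: not on the allowlist
	fmt.Println(g.Authorize("search", 600)) // denied: would exceed the budget
	for _, line := range g.Audit() {
		fmt.Println(line)
	}
}
```

Denied calls leave the budget untouched, so a rejected request never eats into what later, smaller requests can use.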

Building with Kiro

Working with Kiro has been game-changing. The specs experience is the highlight: I write a high-level development plan, and Kiro converts it into detailed specifications with concrete tasks, then works through those tasks.

Kiro also generates clean Go code, including production-quality interfaces and implementations.

Development Approach

Rather than hacking together a prototype, I took a specs-driven approach:

  1. Research phase: Analyzed why 77% of AI projects never reach production
  2. Architecture design: Control plane vs data plane separation
  3. Detailed specifications: 35+ specs with quantitative success criteria
  4. 3-quarter roadmap: Q1 MVP → Q2 Enterprise → Q3 Advanced features

This methodical approach is already paying off - every component has clear interfaces and testable contracts.

Current Status

AgentFlow is in active development following the Q1 MVP roadmap:

What's Working:

  • ✅ Project structure and CI/CD pipeline
  • ✅ Core Go interfaces and contracts

In Progress:

  • ⏳ NATS messaging backbone design
  • ⏳ PostgreSQL schema with audit trails
  • ⏳ FSM planner implementation
  • ⏳ Basic tool sandboxing

What's Next:

  • Cost tracking and budget enforcement
  • REST API and authentication
  • Worker runtime and message processing
  • Basic dashboard for monitoring

Challenges and Learning

Technical Challenges:

  • Message ordering vs performance: Solved with workflow-partitioned streams
  • Security vs usability: Auto-generating permission profiles from tool schemas
  • Cost prediction: Building heuristic models calibrated with real usage
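
The idea behind workflow-partitioned streams can be sketched in a few lines of Go: hash the workflow ID to choose a partition, so all messages for one workflow land on the same ordered stream while different workflows spread across partitions. This is a sketch under assumptions, not AgentFlow's actual scheme. The function names and the "workflows.<n>.events" subject layout are hypothetical, and FNV-1a is just one reasonable choice of hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a workflow ID to one of n partitions using FNV-1a.
// The same ID always maps to the same partition, which preserves
// per-workflow ordering. An n of 0 is treated as a single partition.
func partitionFor(workflowID string, n uint32) uint32 {
	if n == 0 {
		n = 1
	}
	h := fnv.New32a()
	h.Write([]byte(workflowID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}

// subjectFor builds a hypothetical subject name for the partition
// that owns the given workflow.
func subjectFor(workflowID string, n uint32) string {
	return fmt.Sprintf("workflows.%d.events", partitionFor(workflowID, n))
}

func main() {
	const partitions = 8
	for _, id := range []string{"wf-123", "wf-456", "wf-123"} {
		fmt.Println(id, "->", subjectFor(id, partitions))
	}
}
```

The trade-off is that ordering is only guaranteed within a workflow, not across workflows, which is usually what an agent pipeline needs. Changing the partition count remaps IDs, so it would have to be handled as a migration.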

Key Insights:

  1. Specifications save time - 2 weeks of planning prevents months of rework
  2. Production is different - reliability matters more than clever optimizations
  3. Developer experience is critical - the best framework is useless if it's hard to use

Vision

I want AgentFlow to be the Rails for AI agents - making production-ready multi-agent systems as easy to build as web applications.

The goal isn't just another framework, but a complete developer experience:

  • One-command deployment
  • Template-driven development
  • Built-in observability and debugging
  • Enterprise security by default

AgentFlow is open source and actively developed. Follow the progress: github.com/nsafouane/agentflow
