Our Journey: From Boxes to Bots

The Inspiration

Warehouse automation is massive, with hundreds of thousands of robots already deployed across industry. But nearly all of them are rigid, purpose-built systems.

We wanted to explore a different idea:
what happens when you combine modern AI agents with embodied systems?

Not just automation, but reasoning plus action.


What We Built

We built a system that turns natural language commands into real-world execution.

A simple request like:

“Pick up package 1003”

becomes:

  • Task decomposition (fetch → grasp → carry → place)
  • Path planning with collision avoidance
  • Execution through learned control policies

Instead of directly controlling everything, we designed a layered system:

  • Agent layer decides what to do
  • Task manager structures the plan
  • Skill primitives define reusable actions
  • Policies/controllers execute movements
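The layering above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and function names are ours for this sketch, not the actual implementation): the agent layer decomposes a command into an ordered plan, the task manager walks the plan, and each step dispatches to a skill primitive that would, in the real system, invoke a learned policy.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A reusable action primitive. In the real system, execute()
    would invoke a learned policy or low-level controller."""
    name: str

    def execute(self, target: str) -> str:
        return f"{self.name}({target}): ok"

def decompose(command: str) -> list[tuple[str, str]]:
    # Agent layer (stub): map "Pick up package 1003" to a skill plan.
    package = command.split()[-1]
    return [("fetch", package), ("grasp", package),
            ("carry", package), ("place", package)]

def run(command: str) -> list[str]:
    # Task manager: execute the structured plan step by step
    # through the registered skill primitives.
    skills = {n: Skill(n) for n in ("fetch", "grasp", "carry", "place")}
    return [skills[name].execute(arg) for name, arg in decompose(command)]

print(run("Pick up package 1003"))
```

The point of the sketch is the separation of concerns: the agent only ever sees symbolic steps, never joint commands.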

What We Learned

1. Policies Are Not Enough

Modern robotics relies heavily on learned policies, especially reinforcement learning. These are great at learning how to do something, like walking or grasping.

But they do not answer:

  • What should I do next?
  • Which goal matters right now?
  • How do I adapt to new instructions?

That is where agents come in.


2. Agents Cannot Control Everything

At the same time, agents are not suited for low-level control.

They can reason, but:

  • They do not handle continuous dynamics
  • They cannot reliably output stable joint commands
  • They do not capture physical constraints

Trying to let an agent directly control execution quickly breaks down.


3. The Real Problem is the Interface

The hardest part was not planning or control, but connecting the two.

We arrived at a clear separation:

$$ \text{Agent (what)} \rightarrow \text{Skills} \rightarrow \text{Policies (how)} $$

  • Agents operate on intent
  • Policies operate on physics
  • Skills connect the two

Without this abstraction, the system becomes unstable and difficult to scale.
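The boundary can be made concrete with a toy example (all names and numbers here are hypothetical, chosen only to illustrate the interface, not taken from our system): a skill translates symbolic intent into a continuous target, and a policy closes the loop on physics. A simple proportional controller stands in for the learned policy.

```python
def grasp_skill(object_pos: tuple[float, float, float]) -> dict:
    # Skill layer: turn symbolic intent ("grasp this object") into a
    # continuous policy-level target, here a pre-grasp pose 10 cm above.
    x, y, z = object_pos
    return {"target": (x, y, z + 0.10), "gripper": "open"}

def reach_policy(state: tuple[float, float, float],
                 goal: dict, kp: float = 2.0) -> tuple[float, ...]:
    # Policy layer: map state + target to a motion command. A real
    # learned policy would run at high rate with full dynamics; a
    # proportional term is enough to show the division of labor.
    return tuple(kp * (g - s) for s, g in zip(state, goal["target"]))

cmd = reach_policy((0.0, 0.0, 0.0), grasp_skill((0.5, 0.0, 0.2)))
print(cmd)  # velocity command toward the pre-grasp pose
```

Note that the agent never touches `reach_policy`, and the policy never sees the word "grasp": the skill is the only place where intent and physics meet.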


The Missing Piece: Coordination

Even with multiple systems running, communication between them is minimal.

Each unit:

  • Plans independently
  • Executes its own policy
  • Avoids others through constraints

There is no shared reasoning or negotiation, only local decisions with global rules.

This showed us that most multi-agent systems today behave more like parallel actors than truly collaborative agents.
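"Local decisions with global rules" can be sketched as follows (a hypothetical 2-D grid toy, not our planner): each unit greedily steps toward its own goal, and the only shared element is a global minimum-separation constraint. There is no message passing and no negotiation; a unit that would violate the rule simply waits.

```python
def step_toward(pos: tuple[int, int], goal: tuple[int, int]) -> tuple[int, int]:
    # Independent local plan: one greedy grid step toward the goal.
    return tuple(p + (1 if g > p else -1 if g < p else 0)
                 for p, g in zip(pos, goal))

def safe(pos, others, min_sep: int = 2) -> bool:
    # Global rule shared by all units: keep Manhattan distance >= min_sep.
    return all(abs(pos[0] - o[0]) + abs(pos[1] - o[1]) >= min_sep
               for o in others)

def plan_step(pos, goal, others):
    nxt = step_toward(pos, goal)
    return nxt if safe(nxt, others) else pos  # wait if the rule is violated

print(plan_step((0, 0), (5, 5), others=[(1, 2)]))
```

This is exactly the "parallel actors" behavior described above: collision-free, but with no shared reasoning about who should yield or why.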


Challenges We Faced

  • Bridging language to execution
    Mapping vague human input into structured actions

  • Policy integration
    Combining learned controllers with higher-level reasoning

  • Coordination
    Avoiding conflicts without real communication

  • Abstraction design
    Defining the boundary between reasoning and control


Results

  • Natural language to structured execution pipeline
  • Stable integration of agents with control policies
  • Multi-system coordination without collisions
  • End-to-end system from intent to action

Final Thoughts

The key insight:

The future is not policies or agents; it is both, connected through the right abstractions.

Policies handle the complexity of physics.
Agents handle reasoning and decision-making.

The real challenge is designing the interface between them.

That is what turns intelligence into action.

Built With

  • as1
  • fetch.ai
  • mujoco
  • unitree-g1
