Our Journey: From Boxes to Bots

The Inspiration

Warehouse automation is massive, with hundreds of thousands of robots already deployed across industry. But nearly all of them are rigid, purpose-built systems.

We wanted to explore a different idea:
what happens when you combine modern AI agents with embodied systems?

Not just automation, but reasoning plus action.


What We Built

We built a system that turns natural language commands into real-world execution.

A simple request like:

“Pick up package 1003”

becomes:

  • Task decomposition (fetch → grasp → carry → place)
  • Path planning with collision avoidance
  • Execution through learned control policies

Instead of directly controlling everything, we designed a layered system:

  • Agent layer decides what to do
  • Task manager structures the plan
  • Skill primitives define reusable actions
  • Policies/controllers execute movements
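The layering above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and function names are ours for this sketch, not the actual implementation): the agent layer decomposes a command into an ordered plan, the task manager walks the plan, and each step dispatches to a skill primitive that would, in the real system, invoke a learned policy.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A reusable action primitive. In the real system, execute()
    would invoke a learned policy or low-level controller."""
    name: str

    def execute(self, target: str) -> str:
        return f"{self.name}({target}): ok"

def decompose(command: str) -> list[tuple[str, str]]:
    # Agent layer (stub): map "Pick up package 1003" to a skill plan.
    package = command.split()[-1]
    return [("fetch", package), ("grasp", package),
            ("carry", package), ("place", package)]

def run(command: str) -> list[str]:
    # Task manager: execute the structured plan step by step
    # through the registered skill primitives.
    skills = {n: Skill(n) for n in ("fetch", "grasp", "carry", "place")}
    return [skills[name].execute(arg) for name, arg in decompose(command)]

print(run("Pick up package 1003"))
```

The point of the sketch is the separation of concerns: the agent only ever sees symbolic steps, never joint commands.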

What We Learned

1. Policies Are Not Enough

Modern robotics relies heavily on learned policies, especially reinforcement learning. These are great at learning how to do something, like walking or grasping.

But they do not answer:

  • What should I do next?
  • Which goal matters right now?
  • How do I adapt to new instructions?

That is where agents come in.


2. Agents Cannot Control Everything

At the same time, agents are not suited for low-level control.

They can reason, but:

  • They do not handle continuous dynamics
  • They cannot reliably output stable joint commands
  • They do not capture physical constraints

Trying to let an agent directly control execution quickly breaks down.


3. The Real Problem is the Interface

The hardest part was not planning or control, but connecting the two.

We arrived at a clear separation:

$$ \text{Agent (what)} \rightarrow \text{Skills} \rightarrow \text{Policies (how)} $$

  • Agents operate on intent
  • Policies operate on physics
  • Skills connect the two

Without this abstraction, the system becomes unstable and difficult to scale.
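The boundary can be made concrete with a toy example (all names and numbers here are hypothetical, chosen only to illustrate the interface, not taken from our system): a skill translates symbolic intent into a continuous target, and a policy closes the loop on physics. A simple proportional controller stands in for the learned policy.

```python
def grasp_skill(object_pos: tuple[float, float, float]) -> dict:
    # Skill layer: turn symbolic intent ("grasp this object") into a
    # continuous policy-level target, here a pre-grasp pose 10 cm above.
    x, y, z = object_pos
    return {"target": (x, y, z + 0.10), "gripper": "open"}

def reach_policy(state: tuple[float, float, float],
                 goal: dict, kp: float = 2.0) -> tuple[float, ...]:
    # Policy layer: map state + target to a motion command. A real
    # learned policy would run at high rate with full dynamics; a
    # proportional term is enough to show the division of labor.
    return tuple(kp * (g - s) for s, g in zip(state, goal["target"]))

cmd = reach_policy((0.0, 0.0, 0.0), grasp_skill((0.5, 0.0, 0.2)))
print(cmd)  # velocity command toward the pre-grasp pose
```

Note that the agent never touches `reach_policy`, and the policy never sees the word "grasp": the skill is the only place where intent and physics meet.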


The Missing Piece: Coordination

Even with multiple systems running, communication between them is minimal.

Each unit:

  • Plans independently
  • Executes its own policy
  • Avoids others through constraints

There is no shared reasoning or negotiation, only local decisions with global rules.

This showed us that most multi-agent systems today behave more like parallel actors than truly collaborative agents.
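"Local decisions with global rules" can be sketched as follows (a hypothetical 2-D grid toy, not our planner): each unit greedily steps toward its own goal, and the only shared element is a global minimum-separation constraint. There is no message passing and no negotiation; a unit that would violate the rule simply waits.

```python
def step_toward(pos: tuple[int, int], goal: tuple[int, int]) -> tuple[int, int]:
    # Independent local plan: one greedy grid step toward the goal.
    return tuple(p + (1 if g > p else -1 if g < p else 0)
                 for p, g in zip(pos, goal))

def safe(pos, others, min_sep: int = 2) -> bool:
    # Global rule shared by all units: keep Manhattan distance >= min_sep.
    return all(abs(pos[0] - o[0]) + abs(pos[1] - o[1]) >= min_sep
               for o in others)

def plan_step(pos, goal, others):
    nxt = step_toward(pos, goal)
    return nxt if safe(nxt, others) else pos  # wait if the rule is violated

print(plan_step((0, 0), (5, 5), others=[(1, 2)]))
```

This is exactly the "parallel actors" behavior described above: collision-free, but with no shared reasoning about who should yield or why.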


Challenges We Faced

  • Bridging language to execution
    Mapping vague human input into structured actions

  • Policy integration
    Combining learned controllers with higher-level reasoning

  • Coordination
    Avoiding conflicts without real communication

  • Abstraction design
    Defining the boundary between reasoning and control


Results

  • Natural language to structured execution pipeline
  • Stable integration of agents with control policies
  • Multi-system coordination without collisions
  • End-to-end system from intent to action

Final Thoughts

The key insight:

The future is not policies or agents; it is both, connected through the right abstractions.

Policies handle the complexity of physics.
Agents handle reasoning and decision-making.

The real challenge is designing the interface between them.

That is what turns intelligence into action.

Built With

  • as1
  • fetch.ai
  • mujoco
  • unitree-g1
