Our Journey: From Boxes to Bots
The Inspiration
Warehouse automation is massive, with hundreds of thousands of robots already deployed across industry. But nearly all of them are rigid, purpose-built systems.
We wanted to explore a different idea:
what happens when you combine modern AI agents with embodied systems?
Not just automation, but reasoning plus action.
What We Built
We built a system in which natural language commands are translated into real-world execution.
A simple request like:
“Pick up package 1003”
becomes:
- Task decomposition (fetch → grasp → carry → place)
- Path planning with collision avoidance
- Execution through learned control policies
Instead of directly controlling everything, we designed a layered system:
- Agent layer decides what to do
- Task manager structures the plan
- Skill primitives define reusable actions
- Policies/controllers execute movements
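The layering above can be sketched in a few lines. This is a minimal illustration, not our actual API: the class names, the hard-coded plan, and the naive command parse are all stand-ins for the real components.

```python
# Illustrative sketch of the layered design; names are hypothetical.
from dataclasses import dataclass

class AgentLayer:
    """Decides *what* to do from a natural-language command (stubbed parse)."""
    def decide(self, command: str):
        package_id = command.split()[-1]          # naive parse for the sketch
        return "deliver", f"package {package_id}"

class TaskManager:
    """Structures the agent's decision into an ordered plan of skills."""
    PLANS = {"deliver": ["fetch", "grasp", "carry", "place"]}
    def plan(self, intent: str, target: str):
        return [(skill, target) for skill in self.PLANS[intent]]

@dataclass
class Skill:
    """Reusable action primitive; in the real system it invokes a policy."""
    name: str
    def execute(self, target: str) -> str:
        return f"[policy] {self.name} -> {target}"  # stand-in for control

# Wire the layers: agent (what) -> task manager -> skills -> policies (how).
agent, manager = AgentLayer(), TaskManager()
intent, target = agent.decide("Pick up package 1003")
log = [Skill(name).execute(tgt) for name, tgt in manager.plan(intent, target)]
```

The point of the structure is that each layer only talks to its neighbor: the agent never emits joint commands, and the policies never see language.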
What We Learned
1. Policies Are Not Enough
Modern robotics relies heavily on learned policies, especially from reinforcement learning. They are great at the "how" of a task, like walking or grasping.
But they do not answer:
- What should I do next?
- Which goal matters right now?
- How do I adapt to new instructions?
That is where agents come in.
2. Agents Cannot Control Everything
At the same time, agents are not suited for low-level control.
They can reason, but:
- They do not handle continuous dynamics
- They cannot reliably output stable joint commands
- They do not capture physical constraints
Trying to let an agent directly control execution quickly breaks down.
3. The Real Problem is the Interface
The hardest part was not planning or control, but connecting the two.
We arrived at a clear separation:
$$ \text{Agent (what)} \rightarrow \text{Skills} \rightarrow \text{Policies (how)} $$
- Agents operate on intent
- Policies operate on physics
- Skills connect the two
Without this abstraction, the system becomes unstable and difficult to scale.
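One way to picture the skill boundary is as a registry that maps symbolic intents onto policy calls. The sketch below is hypothetical (the function names and the toy "policy" are ours, not the project's real controllers), but it captures the separation: the skill layer is the only place that sees both symbols and physics.

```python
# Hypothetical sketch of the agent -> skills -> policies boundary.
from typing import Callable, Dict, List, Tuple

Goal = Tuple[float, float, float]
Policy = Callable[[Goal], List[float]]  # goal -> motor command vector

def grasp_policy(goal: Goal) -> List[float]:
    # Placeholder for a learned controller; a real one would compute
    # joint torques from robot state, not just rescale the goal.
    return [coord / 10 for coord in goal]

SKILLS: Dict[str, Policy] = {"grasp": grasp_policy}

def run_skill(intent: str, goal: Goal) -> List[float]:
    """Reject intents with no grounded skill instead of improvising control."""
    if intent not in SKILLS:
        raise ValueError(f"agent requested unknown skill: {intent}")
    return SKILLS[intent](goal)
```

Keeping the registry explicit is what makes the system scale: adding a capability means adding a skill, not retraining the agent or the policies.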
The Missing Piece: Coordination
Even with multiple systems running at once, true communication between them is minimal.
Each unit:
- Plans independently
- Executes its own policy
- Avoids others through constraints
There is no shared reasoning or negotiation, only local decisions with global rules.
This showed us that most multi-agent systems today behave more like parallel actors than truly collaborative agents.
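"Local decisions with global rules" can be sketched as a shared reservation set: each unit plans its own path and only checks the set before moving. This is an assumed simplification of our collision-avoidance constraints, not the actual mechanism.

```python
# Sketch of coordination without communication: units never exchange
# messages or negotiate; they only consult a globally visible constraint.
from typing import List, Set, Tuple

Cell = Tuple[int, int]
claimed: Set[Cell] = set()  # shared constraint, not a message channel

def try_reserve(path: List[Cell]) -> bool:
    """Reserve an independently planned path atomically, or back off."""
    if any(cell in claimed for cell in path):
        return False  # conflict with another unit's claim: wait and replan
    claimed.update(path)
    return True
```

A rule like this is enough to prevent collisions, but it also shows why the result feels like parallel actors rather than collaborating agents: a blocked unit can only back off, never ask the other unit to yield.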
Challenges We Faced
- Bridging language to execution: mapping vague human input into structured actions
- Policy integration: combining learned controllers with higher-level reasoning
- Coordination: avoiding conflicts without real communication
- Abstraction design: defining the boundary between reasoning and control
Results
- Natural language to structured execution pipeline
- Stable integration of agents with control policies
- Multi-system coordination without collisions
- End-to-end system from intent to action
Final Thoughts
The key insight:
The future is not policies or agents, it is both, connected through the right abstractions.
Policies handle the complexity of physics.
Agents handle reasoning and decision-making.
The real challenge is designing the interface between them.
That is what turns intelligence into action.
Built With
- as1
- fetch.ai
- mujoco
- unitree-g1