Inspiration

Machines are dumb and can't tell us what's wrong. We must test them to find faults in their logic or sensors, but doing that manually is too slow. I had an idea: what if an agentic AI could automate that for the user? The goal was to build a framework where natural language commands drive complex robotic behaviors, enabling autonomous testing and validation.

What it does

We built MERLIN (Multi-modal Embodied Robot Learning Intelligence Network), an agentic AI framework that lets an LLM become the robot's "brain." You can command it in plain English, and it executes.

  • Natural Language Control: "Pick up the red cube and move it to coordinates 2,2."
  • Computer Vision: Sees and identifies objects in real-time.
  • Autonomous Navigation: Plans paths and avoids obstacles.
  • Error Recovery: If it fails, it re-thinks the problem and tries again.
  • High Precision: Capable of sub-centimeter positioning accuracy in simulation.
  • Mission Logging: Logs all actions for validation and testing.

It all runs on a finite state machine that translates the LLM's "thoughts" into robotic actions.
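
In sketch form, that loop looks roughly like the Python below (the state names and the `llm`/`robot` objects are illustrative placeholders, not MERLIN's actual API):

```python
from enum import Enum, auto

class State(Enum):
    PLANNING = auto()    # ask the LLM for the next action
    EXECUTING = auto()   # carry the action out on the robot
    RECOVERING = auto()  # an action failed; let the LLM re-think
    DONE = auto()
    FAILED = auto()

def run_mission(goal, llm, robot, max_retries=3):
    """Alternate LLM planning with state-machine execution until done."""
    state, action, retries = State.PLANNING, None, 0
    while state not in (State.DONE, State.FAILED):
        if state is State.PLANNING:
            action = llm.plan_next_action(goal, robot.status())
            state = State.DONE if action is None else State.EXECUTING
        elif state is State.EXECUTING:
            if robot.execute(action):
                state, retries = State.PLANNING, 0
            else:
                state = State.RECOVERING  # enter the error-recovery path
        elif state is State.RECOVERING:
            retries += 1
            state = State.FAILED if retries > max_retries else State.PLANNING
    return state
```

The RECOVERING branch is what makes the "fail, re-think, retry" behavior possible: a failed action sends control back to the LLM with fresh status instead of crashing the mission.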

How we built it

We built MERLIN with a modular architecture in Python.

  • Core: A State Machine Engine acts as the nervous system.
  • Brain: An MCP (Model Context Protocol) server bridges the state machine to an LLM, such as Anthropic's Claude or a local Ollama model (a sketch of this bridge follows the list).
  • Eyes: A Vision Pipeline handles object detection.
  • Body: With no hardware, we built our own 3D simulator and mock controller using Matplotlib to test everything from basic logic to full LLM-controlled missions.
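
To make the "Brain" bullet concrete: the MCP server exposes robot capabilities as tools the LLM can call. Here is a minimal sketch using the MCP Python SDK's FastMCP helper; the tool set, server name, and MockRobotController are hypothetical stand-ins for MERLIN's real interface:

```python
from mcp.server.fastmcp import FastMCP

class MockRobotController:
    """Stand-in for the simulator-backed controller."""
    def navigate(self, x, y): print(f"navigating to ({x}, {y})")
    def grasp(self, name): print(f"grasping {name}")

robot = MockRobotController()
mcp = FastMCP("merlin-robot")

@mcp.tool()
def move_to(x: float, y: float) -> str:
    """Drive the robot to coordinates (x, y)."""
    robot.navigate(x, y)  # delegated to the state machine engine
    return f"arrived at ({x}, {y})"

@mcp.tool()
def pick_up(object_name: str) -> str:
    """Grasp an object identified by the vision pipeline."""
    robot.grasp(object_name)
    return f"holding {object_name}"

if __name__ == "__main__":
    mcp.run()  # an MCP client (e.g. Claude Desktop) connects over stdio
```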

Challenges we ran into

  • No Hardware Access: This was our biggest hurdle. It forced us to build a comprehensive 3D simulator and mock hardware controller from scratch.
  • LLM Integration: It took significant prompt engineering to make the LLM reliably emit commands the state machine could execute (one mitigation pattern is sketched after this list).
  • Simulation Accuracy: Making a 2D plotting library "behave" like a 3D robot was a major challenge.
  • State Management: Keeping the state machine and the LLM in sync, especially during errors, was complex.
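
One pattern that helps with the LLM-integration challenge (shown here as a general sketch; the schema and prompt wording are illustrative, not our exact prompts): force the model to answer in a fixed JSON action schema and validate every reply before the state machine touches it.

```python
import json

# Whitelist of actions the state machine accepts (illustrative).
ALLOWED = {"move_to": ("x", "y"), "pick_up": ("object",), "release": ()}

SYSTEM_PROMPT = """You control a robot. Reply with ONE JSON object only:
{"action": "<move_to|pick_up|release>", "args": {...}}
No prose, no markdown fences."""

def parse_action(reply: str):
    """Reject anything that is not a known action with the right arguments."""
    cmd = json.loads(reply)           # raises on malformed JSON
    params = ALLOWED[cmd["action"]]   # raises KeyError on unknown actions
    if set(cmd.get("args", {})) != set(params):
        raise ValueError(f"expected args {params}, got {cmd.get('args')}")
    return cmd["action"], cmd["args"]

action, args = parse_action('{"action": "move_to", "args": {"x": 2, "y": 2}}')
print(action, args)  # move_to {'x': 2, 'y': 2}
```

A reply that fails validation can then feed the error-recovery path, turning a badly formatted answer into a retry rather than a crash.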

Accomplishments that we're proud of

  • It Works!: We showed that an LLM can effectively control a complex robotic state machine, end to end in simulation.
  • Robust Error Recovery: Watching the robot fail, "rethink," and succeed on its own was a huge win.
  • Great Demos: We created a full suite of mission videos showing off its capabilities, from simple tasks to complex sorting.
  • Scalable Architecture: We built a modular system ready for real hardware and new features.

What we learned

  • LLMs + State Machines = A Powerful Combo: LLMs are great at high-level reasoning, and state machines are great at reliable execution. Together, they work incredibly well.
  • Simulation is Vital: A good simulation environment is non-negotiable for developing robotic systems, especially without hardware.
  • Modular Design is Key: Clear interfaces between components were essential for testing and integration.
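
As one concrete illustration of those interfaces (class and method names here are hypothetical, not MERLIN's actual code): the mock controller and any future hardware driver can implement the same abstract contract, so the state machine and LLM bridge never know which backend they are driving.

```python
from abc import ABC, abstractmethod

class RobotController(ABC):
    """Contract shared by the simulator and future hardware drivers."""

    @abstractmethod
    def move_to(self, x: float, y: float) -> bool: ...

    @abstractmethod
    def grasp(self, object_name: str) -> bool: ...

class MockController(RobotController):
    """Simulation backend: updates a pose instead of driving motors."""

    def __init__(self):
        self.pose = (0.0, 0.0)
        self.holding = None

    def move_to(self, x, y):
        self.pose = (x, y)
        return True

    def grasp(self, object_name):
        self.holding = object_name
        return True
```

Swapping in real hardware then means writing one new subclass rather than touching the planner or the LLM bridge.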

What's next for Agentic Robot Framework

This is just the beginning. Our next steps are to:

  • Test on Real Hardware: Priority #1 is validating our simulation results on a physical robot.
  • Explore Advanced AI: Move beyond state machines to planners such as GOAP (goal-oriented action planning).
  • Add "Pain" to the Network: Feed error signals from the state machine back to the agent.
  • Integrate More Frameworks: Add support for LangChain, CrewAI, and AutoGen.
  • Go Multi-Agent: Use our Letta framework integration to coordinate multi-robot "swarm" systems.
