Inspiration
Machines are dumb and can't tell us what's wrong. We have to test them to find faults in logic or sensors, and doing that manually is too slow. I had an idea: what if an agentic AI could automate that for the user? The goal was to build a framework where natural language commands could drive complex robotic behaviors, enabling autonomous testing and validation.
What it does
We built MERLIN (Multi-modal Embodied Robot Learning Intelligence Network), an agentic AI framework that lets an LLM become the robot's "brain." You can command it in plain English, and it executes.
- Natural Language Control: "Pick up the red cube and move it to coordinates 2,2."
- Computer Vision: Sees and identifies objects in real-time.
- Autonomous Navigation: Plans paths and avoids obstacles.
- Error Recovery: If it fails, it re-thinks the problem and tries again.
- High Precision: Capable of sub-centimeter accuracy.
- Mission Logging: Logs all actions for validation and testing.
It all runs on a finite state machine that translates the LLM's "thoughts" into robotic actions.
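To make the pattern concrete, here is a minimal sketch (not MERLIN's actual code; the state names and the `llm_next_action` adapter are hypothetical): the LLM proposes the next state and action, while the state machine enforces legal transitions and drives retry-based error recovery.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    PLANNING = auto()
    EXECUTING = auto()
    RECOVERING = auto()
    DONE = auto()

# Legal edges: the LLM may only "think" its way along these transitions.
TRANSITIONS = {
    State.IDLE: {State.PLANNING},
    State.PLANNING: {State.EXECUTING},
    State.EXECUTING: {State.DONE, State.RECOVERING},
    State.RECOVERING: {State.PLANNING},
}

def run_mission(command, llm_next_action, controller, max_retries=3):
    """llm_next_action(command, state, error) -> (next_state, action) wraps
    the LLM; controller.execute(action) talks to the (mock) hardware."""
    state, error, retries = State.IDLE, None, 0
    while state is not State.DONE:
        next_state, action = llm_next_action(command, state, error)
        if next_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {next_state}")
        state, error = next_state, None
        if state is State.EXECUTING:
            try:
                controller.execute(action)  # e.g. move_to, pick, place
            except RuntimeError as exc:     # failed grasp, blocked path, ...
                retries += 1
                if retries > max_retries:
                    raise
                state, error = State.RECOVERING, str(exc)  # let the LLM rethink
```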
How we built it
We built MERLIN with a modular architecture in Python.
- Core: A State Machine Engine acts as the nervous system.
- Brain: An MCP (Model Context Protocol) server bridges the state machine to an LLM such as Anthropic's Claude or a local Ollama model (see the first sketch after this list).
- Eyes: A Vision Pipeline handles object detection.
- Body: With no hardware available, we built our own 3D simulator and mock controller on top of Matplotlib to test everything from basic logic to full LLM-controlled missions (see the second sketch after this list).
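The "brain" bridge can be sketched with the MCP Python SDK's FastMCP helper. The tool set and the `merlin.controller` import are illustrative assumptions, not MERLIN's real interface; the point is that each robot primitive becomes a tool the LLM can call:

```python
from mcp.server.fastmcp import FastMCP

from merlin.controller import MockController  # hypothetical module path

mcp = FastMCP("merlin")
controller = MockController()

@mcp.tool()
def move_to(x: float, y: float, z: float) -> str:
    """Drive the robot to a target position (meters)."""
    controller.execute({"op": "move_to", "target": (x, y, z)})
    return f"arrived at ({x}, {y}, {z})"

@mcp.tool()
def pick(object_name: str) -> str:
    """Grasp a detected object by name, e.g. 'red cube'."""
    controller.execute({"op": "pick", "object": object_name})
    return f"holding {object_name}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; the LLM client connects here
```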
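The "body" side can be approximated in a few lines of Matplotlib. This stripped-down mock (assuming the same `execute(action)` contract as above) just tracks a pose and redraws it in a 3D axes after each action:

```python
import matplotlib.pyplot as plt

class MockController:
    """Minimal stand-in for real hardware: tracks a pose and redraws it."""

    def __init__(self):
        self.pos = [0.0, 0.0, 0.0]
        self.fig = plt.figure()
        self.ax = self.fig.add_subplot(projection="3d")

    def execute(self, action):
        if action["op"] == "move_to":
            self.pos = list(action["target"])
        self._draw()

    def _draw(self):
        self.ax.cla()
        self.ax.set(xlim=(0, 5), ylim=(0, 5), zlim=(0, 2),
                    xlabel="x", ylabel="y", zlabel="z")
        self.ax.scatter(*self.pos, s=80)
        plt.pause(0.05)  # redraw without blocking the mission loop

if __name__ == "__main__":
    bot = MockController()
    bot.execute({"op": "move_to", "target": (2.0, 2.0, 0.0)})
    plt.show()
```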
Challenges we ran into
- No Hardware Access: This was our biggest hurdle. It forced us to build a comprehensive 3D simulator and mock hardware controller from scratch.
- LLM Integration: It took significant prompt engineering to reliably bridge natural language commands with concrete robotic actions.
- Simulation Accuracy: Making a 2D plot "behave" like a 3D robot was a major challenge.
- State Management: Keeping the state machine and the LLM in sync, especially during errors, was complex.
Accomplishments that we're proud of
- It Works! We proved an LLM can effectively control a complex robotic state machine.
- Robust Error Recovery: Watching the robot fail, "rethink," and succeed on its own was a huge win.
- Great Demos: We created a full suite of mission videos showing off its capabilities, from simple tasks to complex sorting.
- Scalable Architecture: We built a modular system ready for real hardware and new features.
What we learned
- LLMs + State Machines = A Powerful Combo: LLMs are great at high-level reasoning, and state machines are great at reliable execution. Together, they work incredibly well.
- Simulation is Vital: A good simulation environment is non-negotiable for developing robotic systems, especially without hardware.
- Modular Design is Key: Clear interfaces between components were essential for testing and integration.
What's next for Agentic Robot Framework
This is just the beginning. Our next steps are to:
- Test on Real Hardware: Priority #1 is validating our simulation results on a physical robot.
- Explore Advanced AI: Move beyond state machines to systems like GOAP (Goal-Oriented Action Planning).
- Add Pain to the Network: Feed error signals from the state machine back to the agent.
- Integrate More Frameworks: Add support for LangChain, CrewAI, and AutoGen.
- Go Multi-Agent: Use our Letta framework integration to coordinate multi-robot "swarm" systems.
