Inspiration

Machines are dumb and can't tell us what's wrong. We must test them to find faults in their logic or sensors, but doing that manually is too slow. I had an idea: what if an agentic AI could automate that for the user? The goal was to build a framework where natural language commands drive complex robotic behaviors, enabling autonomous testing and validation.

What it does

We built MERLIN (Multi-modal Embodied Robot Learning Intelligence Network), an agentic AI framework that lets an LLM become the robot's "brain." You can command it in plain English, and it executes.

  • Natural Language Control: "Pick up the red cube and move it to coordinates 2,2."
  • Computer Vision: Sees and identifies objects in real-time.
  • Autonomous Navigation: Plans paths and avoids obstacles.
  • Error Recovery: If it fails, it re-thinks the problem and tries again.
  • High Precision: Capable of sub-centimeter positioning accuracy in simulation.
  • Mission Logging: Logs all actions for validation and testing.

It all runs on a finite state machine that translates the LLM's "thoughts" into robotic actions.
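
In sketch form, that loop looks roughly like the Python below (the state names and the `llm`/`robot` objects are illustrative placeholders, not MERLIN's actual API):

```python
from enum import Enum, auto

class State(Enum):
    PLANNING = auto()    # ask the LLM for the next action
    EXECUTING = auto()   # carry the action out on the robot
    RECOVERING = auto()  # an action failed; let the LLM re-think
    DONE = auto()
    FAILED = auto()

def run_mission(goal, llm, robot, max_retries=3):
    """Alternate LLM planning with state-machine execution until done."""
    state, action, retries = State.PLANNING, None, 0
    while state not in (State.DONE, State.FAILED):
        if state is State.PLANNING:
            action = llm.plan_next_action(goal, robot.status())
            state = State.DONE if action is None else State.EXECUTING
        elif state is State.EXECUTING:
            if robot.execute(action):
                state, retries = State.PLANNING, 0
            else:
                state = State.RECOVERING  # enter the error-recovery path
        elif state is State.RECOVERING:
            retries += 1
            state = State.FAILED if retries > max_retries else State.PLANNING
    return state
```

The RECOVERING branch is what makes the "fail, re-think, retry" behavior possible: a failed action sends control back to the LLM with fresh status instead of crashing the mission.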

How we built it

We built MERLIN with a modular architecture in Python.

  • Core: A State Machine Engine acts as the nervous system.
  • Brain: An MCP (Model Context Protocol) server bridges the state machine to an LLM, such as Anthropic's Claude or a local Ollama model (a sketch of this bridge follows the list).
  • Eyes: A Vision Pipeline handles object detection.
  • Body: With no hardware, we built our own 3D simulator and mock controller using Matplotlib to test everything from basic logic to full LLM-controlled missions.
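
To make the "Brain" bullet concrete: the MCP server exposes robot capabilities as tools the LLM can call. Here is a minimal sketch using the MCP Python SDK's FastMCP helper; the tool set, server name, and MockRobotController are hypothetical stand-ins for MERLIN's real interface:

```python
from mcp.server.fastmcp import FastMCP

class MockRobotController:
    """Stand-in for the simulator-backed controller."""
    def navigate(self, x, y): print(f"navigating to ({x}, {y})")
    def grasp(self, name): print(f"grasping {name}")

robot = MockRobotController()
mcp = FastMCP("merlin-robot")

@mcp.tool()
def move_to(x: float, y: float) -> str:
    """Drive the robot to coordinates (x, y)."""
    robot.navigate(x, y)  # delegated to the state machine engine
    return f"arrived at ({x}, {y})"

@mcp.tool()
def pick_up(object_name: str) -> str:
    """Grasp an object identified by the vision pipeline."""
    robot.grasp(object_name)
    return f"holding {object_name}"

if __name__ == "__main__":
    mcp.run()  # an MCP client (e.g. Claude Desktop) connects over stdio
```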

Challenges we ran into

  • No Hardware Access: This was our biggest hurdle. It forced us to build a comprehensive 3D simulator and mock hardware controller from scratch.
  • LLM Integration: It took significant prompt engineering to make the LLM reliably emit commands the state machine could execute (one mitigation pattern is sketched after this list).
  • Simulation Accuracy: Making a 2D plotting library "behave" like a 3D robot was a major challenge.
  • State Management: Keeping the state machine and the LLM in sync, especially during errors, was complex.
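
One pattern that helps with the LLM-integration challenge (shown here as a general sketch; the schema and prompt wording are illustrative, not our exact prompts): force the model to answer in a fixed JSON action schema and validate every reply before the state machine touches it.

```python
import json

# Whitelist of actions the state machine accepts (illustrative).
ALLOWED = {"move_to": ("x", "y"), "pick_up": ("object",), "release": ()}

SYSTEM_PROMPT = """You control a robot. Reply with ONE JSON object only:
{"action": "<move_to|pick_up|release>", "args": {...}}
No prose, no markdown fences."""

def parse_action(reply: str):
    """Reject anything that is not a known action with the right arguments."""
    cmd = json.loads(reply)           # raises on malformed JSON
    params = ALLOWED[cmd["action"]]   # raises KeyError on unknown actions
    if set(cmd.get("args", {})) != set(params):
        raise ValueError(f"expected args {params}, got {cmd.get('args')}")
    return cmd["action"], cmd["args"]

action, args = parse_action('{"action": "move_to", "args": {"x": 2, "y": 2}}')
print(action, args)  # move_to {'x': 2, 'y': 2}
```

A reply that fails validation can then feed the error-recovery path, turning a badly formatted answer into a retry rather than a crash.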

Accomplishments that we're proud of

  • It Works!: We showed that an LLM can effectively control a complex robotic state machine, end to end in simulation.
  • Robust Error Recovery: Watching the robot fail, "rethink," and succeed on its own was a huge win.
  • Great Demos: We created a full suite of mission videos showing off its capabilities, from simple tasks to complex sorting.
  • Scalable Architecture: We built a modular system ready for real hardware and new features.

What we learned

  • LLMs + State Machines = A Powerful Combo: LLMs are great at high-level reasoning, and state machines are great at reliable execution. Together, they work incredibly well.
  • Simulation is Vital: A good simulation environment is non-negotiable for developing robotic systems, especially without hardware.
  • Modular Design is Key: Clear interfaces between components were essential for testing and integration.
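
As one concrete illustration of those interfaces (class and method names here are hypothetical, not MERLIN's actual code): the mock controller and any future hardware driver can implement the same abstract contract, so the state machine and LLM bridge never know which backend they are driving.

```python
from abc import ABC, abstractmethod

class RobotController(ABC):
    """Contract shared by the simulator and future hardware drivers."""

    @abstractmethod
    def move_to(self, x: float, y: float) -> bool: ...

    @abstractmethod
    def grasp(self, object_name: str) -> bool: ...

class MockController(RobotController):
    """Simulation backend: updates a pose instead of driving motors."""

    def __init__(self):
        self.pose = (0.0, 0.0)
        self.holding = None

    def move_to(self, x, y):
        self.pose = (x, y)
        return True

    def grasp(self, object_name):
        self.holding = object_name
        return True
```

Swapping in real hardware then means writing one new subclass rather than touching the planner or the LLM bridge.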

What's next for Agentic Robot Framework

This is just the beginning. Our next steps are to:

  • Test on Real Hardware: Priority #1 is validating our simulation results on a physical robot.
  • Explore Advanced AI: Move beyond state machines to planners such as GOAP (goal-oriented action planning).
  • Add "Pain" to the Network: Feed error signals from the state machine back to the agent.
  • Integrate More Frameworks: Add support for LangChain, CrewAI, and AutoGen.
  • Go Multi-Agent: Use our Letta framework integration to coordinate multi-robot "swarm" systems.
