LLMExplorer

Project overview

Inspiration

Robot behavior has historically been controlled with well-defined algorithms, control loops, and functions that always give the same output for the same input, relying largely on numerical data and computation. Text does not fit this approach well: it carries 'meaning', and its wording varies widely and inconsistently from one input to the next. The key insight behind this project was to use LLM agents to define and decide robot behavior, allowing for language-based decision making. We thought the best way to interpret this meaning was through natural language with an LLM, which can then convert its 'intent' into real-world actions through tools that we define and provide.

What it does

Our project is a voice-controlled robot using LLM agents. With only voice input, it is capable of identifying a target object and driving towards it. Once the object has been reached, it plays a little surprise tune to celebrate!

How we built it

LLMExplorer relies on LLM agents as the 'brain', if you will, to control the robot's actions and decision-making process. This is a key innovation in the space of robotics, as LLMExplorer is able to perceive the world around it using natural language as its mode of communication.

We define tools for the core LLM agents, which are trained specifically for tool use. Tools are user-defined functions that we pass to the LLM.
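As a rough illustration of what "user-defined functions passed to the LLM" can look like, here is a minimal Python sketch of a tool registry and dispatcher. It does not reflect our actual code or any particular agent framework: the tool names `drive` and `play_tune`, the JSON shape of the tool call, and the command strings they return are all assumptions made for the example.

```python
import json
from typing import Callable, Dict

# Registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the agent can call it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def drive(direction: str, duration_s: float) -> str:
    """Hypothetical tool: build a drive command string for the robot."""
    if direction not in {"forward", "backward", "left", "right"}:
        raise ValueError(f"unknown direction: {direction}")
    return f"DRIVE {direction} {duration_s:.1f}"


@tool
def play_tune() -> str:
    """Hypothetical tool: build the command that plays the celebration tune."""
    return "TUNE"


def dispatch(tool_call_json: str) -> str:
    """Run a tool call given as JSON: {"name": ..., "arguments": {...}}.

    This JSON shape is an assumption; real LLM APIs each have their own format.
    """
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"LLM requested an unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

For example, if the LLM answered "go forward for two seconds" with a call to `drive`, `dispatch` would validate the arguments and return the command string that gets sent on to the robot. Keeping validation inside each tool means a malformed request from the LLM fails loudly instead of reaching the motors.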

The pipeline starts with OpenAI Whisper speech-to-text, whose transcript is passed to the LLM. The LLM parses the transcript, extracting meaning and intent from the text. It can then call the functions made available to it and send well-defined inputs to control the Arduino. A parser converts these control inputs into motor actuations that drive the robot.
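The last step of the pipeline, turning a control input into motor actuations, can be sketched as below. This is only an illustration in Python, not the parser that runs on our robot: the text format `DRIVE <direction> <seconds>`, the `MotorCommand` type, and the default speed of 200 on a signed -255 to 255 scale are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorCommand:
    left: int          # signed speed for the left motor, -255..255 (assumed range)
    right: int         # signed speed for the right motor, -255..255 (assumed range)
    duration_s: float  # how long to run the motors, in seconds


# Sign of each motor per direction on a two-wheel differential drive.
_DIRECTIONS = {
    "forward": (1, 1),
    "backward": (-1, -1),
    "left": (-1, 1),   # spin in place, counter-clockwise
    "right": (1, -1),  # spin in place, clockwise
}


def parse_command(line: str, speed: int = 200) -> MotorCommand:
    """Parse a hypothetical 'DRIVE <direction> <seconds>' line into motor speeds."""
    parts = line.strip().split()
    if len(parts) != 3 or parts[0] != "DRIVE":
        raise ValueError(f"malformed command: {line!r}")
    _, direction, duration = parts
    if direction not in _DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    duration_s = float(duration)  # raises ValueError if not a number
    if duration_s <= 0:
        raise ValueError(f"duration must be positive: {duration_s}")
    left_sign, right_sign = _DIRECTIONS[direction]
    return MotorCommand(left_sign * speed, right_sign * speed, duration_s)
```

Rejecting anything that does not match the expected format keeps the motor layer deterministic even though the layer above it, the LLM, is not.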

Challenges we ran into

The USB camera worked fine on our laptop, but not on the Raspberry Pi, due to its limited processing power. Even with functional code, we were only able to make the project work with the camera connected to our laptop. Calibrating the direction in which the robot drives while searching for a specific object also required numerous trials and experimentation.

Accomplishments that we're proud of

We are proud that we built a fully functioning robot. Some of its features are admittedly useless, but others are still impressive.

What we learned

This project taught us a lot about how an Arduino can be integrated with motors, buzzers, and LEDs to control a robot. We also learned how to implement object detection models and LLM agents so that we can pass commands to the robot by voice and have it autonomously find the object.

What's next for Voice Controlled Robot

Since we were not able to integrate the Raspberry Pi with the robot in the given 24 hours, the next step is to complete that integration so the robot no longer needs to stay connected to a computer for processing the entire time.