Inspiration

The inspiration for this project came from the idea of a beach-cleaning robot.
The robot’s main task is to move across the beach and collect litter.
Since it can also communicate, it doubles as a helpful advisor, answering common questions such as those about the weather.

What it does

Face Detection and Interaction

  • The robot remains in a waiting position until it recognizes a human face.
  • Once detected, it greets the person by saying:
    “Hello, my human friend.”
  • You can then speak to the robot and ask questions.
  • Questions are answered by the GPT-OSS 20B model, which is:
    • Configured with agentic capabilities
    • Equipped with tools such as Wikipedia and DuckDuckGo search
  • The response is delivered through the robot’s text-to-speech system.
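The wait-greet-listen flow above can be sketched as a small state machine. This is a minimal illustration, not the project's actual code: `next_action` and the boolean `face_present` input are hypothetical stand-ins (in the real robot, Mediapipe's face detection and the text-to-speech system fill these roles).

```python
# Minimal sketch of the wait -> greet -> listen flow described above.
# face_present stands in for Mediapipe face detection; the returned
# utterance would be spoken through the robot's text-to-speech system.

GREETING = "Hello, my human friend."

def next_action(state, face_present):
    """Return (new_state, utterance) for one tick of the interaction loop."""
    if state == "waiting" and face_present:
        return "listening", GREETING   # greet once when a face appears
    if state == "listening" and not face_present:
        return "waiting", None         # person left; go back to waiting
    return state, None                 # otherwise keep the current state

# Example: a face appears while the robot is waiting.
state, utterance = next_action("waiting", face_present=True)
```

Keeping the loop this explicit makes it easy to bolt the speech-recognition step onto the "listening" state later.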

Scene Understanding and Cleanup

  • The GPT-OSS model has access to a specialized tool that detects whether the surroundings are messy.
  • When called:
    1. A vision-language model (VLM) interprets the scene.
    2. It returns an assessment of cleanliness.
    3. GPT-OSS evaluates the result.
    4. If the environment is messy, the robot initiates cleanup.
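One way to expose such a check to an agentic model is as a callable tool whose result the model then reasons over. The sketch below is an assumption about how this could look, using an OpenAI-style tool schema; the tool name `check_messy` and the keyword-based `interpret_assessment` helper are illustrative, not the project's actual implementation.

```python
# Sketch of a "check_messy" tool for the agent. In the real system, a
# vision-language model interprets the camera frame and returns a free-text
# cleanliness assessment; the agent then decides whether to start cleanup.

CHECK_MESSY_TOOL = {
    "type": "function",
    "function": {
        "name": "check_messy",
        "description": "Look at the surroundings and report whether they are messy.",
        "parameters": {"type": "object", "properties": {}},
    },
}

def interpret_assessment(text):
    """Map the VLM's free-text assessment to a cleanup decision."""
    messy_words = ("messy", "litter", "trash", "dirty", "cluttered")
    return any(word in text.lower() for word in messy_words)

# Example: the VLM's answer triggers cleanup.
should_clean = interpret_assessment("The beach is cluttered with plastic trash.")
```

In practice the agent itself can judge the VLM's text directly; a deterministic keyword check like this is just a cheap guardrail.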

Object Detection and Manipulation

  • YOLOv5 is used to localize trash items (e.g., a toothbrush).
  • The robot calculates their relative positions.
  • These coordinates are sent via TCP socket to the myCobot280 6-DOF robotic arm.
  • Running on the NVIDIA Jetson platform, the robot uses the ROS2 MoveIt inverse kinematics solver to compute a valid trajectory that guides the gripper toward the target.
  • Once the target is reached, the end effector grasps the object and removes it, effectively cleaning the environment.
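The detection-to-arm handoff above can be sketched as two small helpers: one converts a YOLOv5 bounding box into a relative position, the other packs it into a TCP message. The newline-delimited JSON wire format is an assumption for illustration; in the real system the detections come from a YOLOv5 model (e.g. one loaded via `torch.hub.load("ultralytics/yolov5", "yolov5s")`).

```python
import json

# Sketch: turn a YOLOv5 bounding box into a relative position and a TCP
# message for the arm. The JSON-lines wire format here is an assumption.

def bbox_center_offset(bbox, frame_w, frame_h):
    """Return the box center as offsets in [-1, 1] from the frame center."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (2 * cx / frame_w - 1, 2 * cy / frame_h - 1)

def encode_target(label, offset):
    """Pack one detection as a newline-delimited JSON message for the arm."""
    dx, dy = offset
    return (json.dumps({"label": label, "dx": dx, "dy": dy}) + "\n").encode()

# Example: a toothbrush detected dead-center in a 640x480 frame.
msg = encode_target("toothbrush", bbox_center_offset((300, 220, 340, 260), 640, 480))
# msg would then be sent with socket.sendall(msg) to the myCobot280 host.
```

Normalizing to the frame center keeps the message independent of camera resolution, which simplifies the calibration math on the arm side.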

How I built it

I started by building the robot's TCP/IP interface and connecting it to the ROS2 inverse kinematics solver. Next, I integrated Mediapipe so the robot could detect nearby human faces, and added speech recognition and text-to-speech output. The robot then learned to detect objects with YOLOv5 and to interpret scenes with the multimodal VLM. Finally, I added the powerful agentic capabilities of gpt-oss together with the tools the model has access to.

Challenges I ran into

One of the biggest challenges I faced was fine-tuning the inverse kinematics solver.
The difficulty lies in ensuring that the gripper can move correctly in every possible pose within 3D space.
Accurate camera-to-end-effector calibration is essential here.
It has been both challenging and fun to experiment with different solutions for the wide variety of scenarios.
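The calibration step boils down to expressing a point seen in the camera frame in the arm's base frame via a fixed homogeneous transform. A minimal numpy sketch, with a made-up calibration matrix standing in for the robot's actual one:

```python
import numpy as np

# Sketch of camera-to-base calibration: a point detected in the camera frame
# is mapped into the arm's base frame by a fixed 4x4 homogeneous transform.
# This transform is illustrative (camera rotated 180 degrees about Z and
# offset 0.2 m along base X), not the robot's actual calibration.

T_base_cam = np.array([
    [-1.0,  0.0, 0.0, 0.2],
    [ 0.0, -1.0, 0.0, 0.0],
    [ 0.0,  0.0, 1.0, 0.0],
    [ 0.0,  0.0, 0.0, 1.0],
])

def camera_to_base(point_cam, T=T_base_cam):
    """Transform a 3D point from camera coordinates to base coordinates."""
    p = np.append(np.asarray(point_cam, dtype=float), 1.0)  # homogeneous form
    return (T @ p)[:3]

# A target 0.1 m right, 0.05 m down, 0.3 m ahead of the camera
# lands at (0.1, -0.05, 0.3) in the base frame under this transform.
target_base = camera_to_base([0.1, 0.05, 0.3])
```

With the target expressed in the base frame, it can be handed straight to the MoveIt solver as the goal pose for the gripper.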

Accomplishments that I am proud of

I am proud of what I’ve learned about inverse kinematics and how to control a robotic arm in three-dimensional space.

What I learned

Through this project, I learned that gpt-oss offers strong agentic tool-calling capabilities while still being able to run on relatively old hardware, like an RTX 3090, alongside other models required for robotic movement.

What's next for gpt-oss 20b powered agentic 6-DoF robot

If I win the robotics challenge, I would dedicate my time to implementing a larger Behavior Model on a more powerful GPU, enabling more advanced and complex robotic movements.
