Inspiration

This started with my TA in robotics. They’re doing research on natural language interfaces for robots: actually making robots respond to plain-English commands. Seeing that work up close was the spark. Could I build a practical, demo-ready version that anybody could try in a simulator?

What it does

A tiny web app where you type things like “forward one second,” “back up,” or “small circle left,” and the robot actually moves. The LLM (Google Gemini) translates everyday language into a structured command; the server clamps it for safety, and we publish to /cmd_vel in ROS 2 from inside the simulation container. Forward/back works great; “small circle” uses constant linear and angular velocity to trace an arc. Simple, reliable, and demo-friendly.
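The safety clamp is the boring-but-important part. A minimal sketch of the idea; the specific limits below are hypothetical, not the exact ones we shipped:

```python
from dataclasses import dataclass

# Hypothetical safety limits (tune per robot)
MAX_LINEAR = 0.5   # m/s
MAX_ANGULAR = 1.0  # rad/s
MAX_SECONDS = 5.0  # longest allowed motion

@dataclass
class Command:
    linear_x: float
    angular_z: float
    seconds: float

def clamp(value: float, limit: float) -> float:
    """Bound a value to [-limit, +limit]."""
    return max(-limit, min(limit, value))

def sanitize(cmd: Command) -> Command:
    """Bound whatever the LLM returned to safe ranges before publishing."""
    return Command(
        linear_x=clamp(cmd.linear_x, MAX_LINEAR),
        angular_z=clamp(cmd.angular_z, MAX_ANGULAR),
        seconds=min(max(cmd.seconds, 0.0), MAX_SECONDS),  # never negative
    )
```

Whatever Gemini hallucinates, the robot never moves faster or longer than the limits allow.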

How we built it

I started with a simple promise: type plain English, watch a robot move, no drama. The web page sends a sentence to a tiny FastAPI service. Gemini turns that sentence into a clean JSON blob, {linear_x, angular_z, seconds}, and our server then puts guardrails on it (speed, turn rate, time). Instead of fighting ROS networking, we jump into the running sim container with docker exec, source the ROS environment, and publish a Twist to the right /cmd_vel topic. Forward and backward were instant wins. For a small circle, we just hold constant v and ω so the turtle traces a neat arc. It’s simple, sturdy, and demo-friendly.
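The publish step boils down to running one shell command inside the sim container. A sketch of how that command can be assembled (the container name and topic here are illustrative; in practice we look the container up by image and read the topic from .env):

```python
import shlex

def build_publish_cmd(container: str, topic: str,
                      linear_x: float, angular_z: float) -> list[str]:
    """Build the `docker exec` argv that publishes one Twist message."""
    twist_yaml = (
        "{linear: {x: %s, y: 0.0, z: 0.0}, "
        "angular: {x: 0.0, y: 0.0, z: %s}}" % (linear_x, angular_z)
    )
    # Source ROS first: `ros2` is not on PATH in a bare `docker exec` shell.
    inner = (
        "source /opt/ros/humble/setup.bash && "
        f"ros2 topic pub --once {topic} geometry_msgs/msg/Twist "
        f"{shlex.quote(twist_yaml)}"
    )
    return ["docker", "exec", container, "bash", "-c", inner]

# e.g. drive forward on turtlesim's velocity topic
cmd = build_publish_cmd("sim", "/turtle1/cmd_vel", 0.2, 0.0)
```

The returned list can be handed straight to `subprocess.run`, which avoids any host-side ROS networking setup.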

Challenges we ran into

ROS gave us a few classic headaches, and we fixed them one by one. Inside docker exec the ros2 command wasn’t on PATH, so we always source /opt/ros/humble/setup.bash (and overlays) first. The sim container kept getting random names, so instead of chasing names we find it by its image. Turtlesim and TurtleBot use different velocity topics, so we moved the topic into a .env setting we can switch in one line. The browser initially blocked our API calls (CORS), so we whitelisted http://localhost:3000. And when Gemini occasionally returned messy text, we asked for real JSON and wrote a simple parser that grabs the first {...} block. Simple fixes, reliable system.
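The fallback parser is nothing fancy. A sketch of the “grab the first {...} block” idea, assuming the model’s JSON has balanced braces and no braces inside string values:

```python
import json

def extract_first_json(text: str) -> dict:
    """Pull the first balanced {...} block out of messy LLM output."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # first block closed; parse just that slice
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced braces in model output")

messy = 'Sure! Here you go: {"linear_x": 0.2, "angular_z": 0.0, "seconds": 1} Hope that helps.'
cmd = extract_first_json(messy)
```

Asking the model for “real JSON” in the prompt plus this extractor covered every messy response we saw during the demo.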

Accomplishments that we're proud of

I accomplished the goal of natural-language control: while it isn’t super accurate due to the rushed time frame, it still works. And I didn’t think that would be possible given all the challenges we faced.

What we learned

LLMs are great at intent; robots still deserve deterministic control. Keep the model focused on “what to do,” and let code handle “how to do it” with rates, timing, and bounds.

What's next for Natural Language Processing for Robotics

More complex shapes, and maybe even a working 3D model in the near future.
