Inspiration
Robots can’t talk to each other. Most robots use their own command format, so researchers and companies spend hours building brittle bridges before they can even prototype a task. We wanted one pipeline that starts with a multimodal prompt (text, images, or raw sensor frames), runs it through Gemini’s Live API, and comes out the other side as a single motion plan every robot can understand. The result is a shared JSON message schema, a MuJoCo-validated planner, and Weave as the observability backbone. For engineers, it provides full-depth analysis of an embodied agent’s output and process while planning tasks.
What it does
- Prompt / Multimodal Input – RoboWeave accepts text, live webcam frames, depth maps, and IMU streams (whatever the robot “sees”).
- LLM reasoning – The Gemini Live API ingests that multimodal bundle and returns compact JSON that feeds both Weave and the planner.
- Motion planning – The system extracts the chosen function and passes it on to MuJoCo, which knows how to handle simple instructions like `forward()`, `rotate()`, or even `flip()` with set parameters.
- Weave Integration – Full observability on every action and task of the embodied agent.
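The steps above can be sketched end to end. This is a hypothetical illustration of what a shared JSON motion plan might look like and how a planner could dispatch it onto simple motion primitives; the field names, schema, and primitive signatures are assumptions, not RoboWeave’s actual format.

```python
import json

# Hypothetical plan in the shared JSON schema (illustrative fields only).
plan_json = '''
{
  "task": "patrol",
  "steps": [
    {"action": "forward", "params": {"distance_m": 1.5}},
    {"action": "rotate",  "params": {"angle_deg": 90}},
    {"action": "flip",    "params": {}}
  ]
}
'''

# Stand-ins for the motion primitives MuJoCo would execute.
def forward(distance_m):
    return f"forward {distance_m} m"

def rotate(angle_deg):
    return f"rotate {angle_deg} deg"

def flip():
    return "flip"

PRIMITIVES = {"forward": forward, "rotate": rotate, "flip": flip}

def execute(plan: dict) -> list[str]:
    """Map each schema step onto its motion primitive, in order."""
    return [PRIMITIVES[step["action"]](**step["params"]) for step in plan["steps"]]

log = execute(json.loads(plan_json))
```

Because every robot backend only needs to implement the same small primitive table, the plan itself stays robot-agnostic.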
How we built it
- Started with an unstructured command. Gemini is the brains: every high-level decision relies on its reasoning.
- Forced the Gemini 1.5 Pro Live API to output only valid JSON that both our program and Weave can parse, so no post-processing is needed.
- Used MuJoCo as the last check before real robots; we worked entirely in simulation since we didn't have a robot to test RoboWeave on.
- Built a React Flow interface that shows prompts, nodes, and motions end-to-end.
- Connected Weave for traceability, so every step (prompt, LLM call, plan, function call to MuJoCo) shows up in one timeline.
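In the real project this per-step timeline comes from W&B Weave; as a minimal stand-in, the same idea can be sketched with a decorator that records each pipeline stage (LLM call, plan parse, MuJoCo dispatch) into one ordered trace. All function names and the fake Gemini response below are illustrative.

```python
import functools
import json
import time

TIMELINE = []  # one ordered record per traced call

def traced(stage):
    """Record each call's stage name and wall-clock duration."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            out = fn(*args, **kwargs)
            TIMELINE.append({"stage": stage, "fn": fn.__name__,
                             "elapsed_s": time.perf_counter() - start})
            return out
        return inner
    return wrap

@traced("llm")
def call_gemini(prompt):  # placeholder for the Live API call
    return '{"steps": [{"action": "forward", "params": {"distance_m": 1}}]}'

@traced("plan")
def parse_plan(raw):
    return json.loads(raw)

@traced("sim")
def send_to_mujoco(plan):  # placeholder for the MuJoCo bridge
    return len(plan["steps"])

n_steps = send_to_mujoco(parse_plan(call_gemini("walk forward one meter")))
stages = [entry["stage"] for entry in TIMELINE]
```

With Weave, each of these would be an op, and the timeline view comes for free in the dashboard.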
Challenges we ran into
- Gemini sometimes drifted from the schema or was cut off mid-response, confusing Weave's traces. Inline validation and prompt tweaks fixed that.
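The inline validation can be sketched as a parse-check-retry loop: reject anything that fails to parse (e.g. a truncated response) or is missing required keys, then re-prompt with a corrective instruction. The required key set and the retry wording are assumptions for illustration.

```python
import json

REQUIRED_KEYS = {"task", "steps"}  # assumed minimal schema

def validate_plan(raw: str):
    """Return the parsed plan, or None if it drifted from the schema."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError:  # e.g. response cut off mid-object
        return None
    if not isinstance(plan, dict) or not REQUIRED_KEYS <= plan.keys():
        return None
    return plan

def plan_with_retry(ask_llm, prompt, max_tries=3):
    """Re-prompt the model until it emits a schema-valid JSON plan."""
    for _ in range(max_tries):
        plan = validate_plan(ask_llm(prompt))
        if plan is not None:
            return plan
        prompt += "\nReturn ONLY valid JSON with keys: task, steps."
    raise ValueError("model never produced a valid plan")
```

Catching bad output before it reaches Weave keeps the trace timeline clean: only validated plans get logged as real steps.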
Accomplishments we’re proud of
- Google Gemini's Live API controlled a simulated GO2 in MuJoCo.
- End-to-end latency sits at 1.9 s using just the free Gemini tier.
- Entire repo is compact and optimized.
What we learned
- Full-path logs through Weave are super useful since many “AI” bugs are really visibility bugs.
- Make your MCP tools as transparent as possible for better results when working with agents.
What’s next for RoboWeave
- Replace MuJoCo with live Unitree GO2s on ROS 2.
- Feed RGB-D video through the same loop for instant obstacle re-planning.
- Open-source the Gemini spec plus a client so other builders can use our system.
- Stress-test a multi-robot fleet performing a complicated task.
One unstructured prompt → one JSON plan → many robots, no in-between scripts. That’s the goal.
Built With
- bezier-spline-motion-library
- docker
- eslint-+-prettier
- github-actions
- google-gemini-1.5-pro-live-api
- json-taskgraph-schema
- model-context-protocol-adapter
- mujoco-3d-engine
- pyside6
- python-3.10
- react-18
- react-flow
- tanstack-router
- typescript-4
- unitree-go2-urdf
- unocss
- vite
- weights-&-biases-weave
- zustand
