Inspiration

One of us got a robotic arm for a class project a few weeks ago and ran into the wall everyone hits with robotics: if you want to change what it does, you write Python and re-flash it over and over, making tiny tweaks until it does roughly what you want. Fine for us. Not fine for anyone else. Reasoning models are cheap enough now to run on every prompt, and people already talk to their phones, so why does "I want my robot to wave when I say hi" still take a tutorial and an afternoon? We also kept thinking about Rocky from Project Hail Mary, the alien who became a real friend with nothing but patient back-and-forth. None of our robots are aliens, but most of them might as well be.

What it does

You say or type what you want ("wave when you hear hello," "look sad when I type 😢") and ReWire turns it into a routine that actually runs on the hardware. The reasoning model writes the plan from the robot's real capabilities, walks you through it before anything moves, and stays in the loop so you can keep refining it ("also tilt up at the end so you look proud") instead of starting over. Voice, keys, and sensor events all share the same trigger system, so adding a new behavior is a sentence, not an afternoon of wiring.
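To make that concrete, here's a rough sketch of the kind of routine a prompt like "wave when you hear hello" compiles into. The field names are illustrative, not our exact schema:

    # Illustrative shape only: field names are hypothetical, not the real ReWire schema.
    routine = {
        "trigger": {"type": "voice", "phrase": "hello"},
        "steps": [
            {"skill": "go_home", "args": {}},           # start from a known pose
            {"skill": "wave", "args": {"repeats": 2}},
            {"skill": "go_home", "args": {}},           # reset so drift doesn't accumulate
        ],
    }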

How we built it

The companion is a FastAPI service that handles planning, validation, and execution. A Pydantic manifest describes everything the robot can do (skills, joint ranges, argument bounds), and the planner can only compose from what's in there. K2-Think does most of the reasoning with Claude as a backup, and we thread prior turns into the system prompt so refinement reads like a real conversation.

Inputs come through a small adapter layer: the browser's SpeechRecognition API for voice, pynput for keys, plus a mock for dev. Everything fans out over SSE to a React/Vite/Zustand frontend split into three panels (compose, verify, run) with a live transcript and a trigger activity log.

The reference hardware is an Adeept 5-DOF arm with skills like set_joint_angle, wave, grip_open/close, pan/tilt, and oled_text, and we bracket every routine with go_home so open-loop drift doesn't snowball across a chain.
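As a simplified sketch of the manifest idea (our real models carry more fields than this), the Pydantic side looks roughly like a skill list with per-argument bounds, plus a check that refuses anything the planner invents or pushes out of range:

    from pydantic import BaseModel, Field

    class ArgSpec(BaseModel):
        name: str
        min: float
        max: float

    class Skill(BaseModel):
        name: str                                   # e.g. "set_joint_angle", "wave"
        args: list[ArgSpec] = Field(default_factory=list)

    class Manifest(BaseModel):
        robot: str                                  # e.g. "adeept-5dof-arm"
        skills: list[Skill]

    def validate_step(manifest: Manifest, skill_name: str, args: dict) -> bool:
        """Reject steps that use skills the robot doesn't have, or argument
        values outside the bounds declared in the manifest."""
        skill = next((s for s in manifest.skills if s.name == skill_name), None)
        if skill is None:
            return False                            # hallucinated skill
        for spec in skill.args:
            value = args.get(spec.name)
            if value is None or not (spec.min <= value <= spec.max):
                return False                        # missing or out-of-range argument
        return True

Every planned step passes through a check like this before it ever reaches the executor.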

Challenges we ran into

Most of our pain was keeping the model honest. LLMs love to invent skills the robot doesn't have, so we rewrote the system prompt at least six times and tightened the validator until requests like a full 360-degree turn get rejected before they ever leave the planner. Open-loop control was its own mess: chain multiple motions together and the gripper ends up somewhere it shouldn't, which is why every routine now resets to home between sub-steps. Our keyboard listener kept dying mid-session because pynput would silently exit on a stray modifier key, so we wrapped it in a supervisor that relaunches the listener whenever it crashes. It also took a few iterations to realize that vague affective prompts ("look sad") are actually easier for the model than precise ones, as long as you let it be expressive instead of making it ask for clarification first.
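The supervisor is nothing fancy; conceptually it's just this loop (on_press is a stub here, while the real handler forwards the key event into the trigger system):

    import logging
    import time

    from pynput import keyboard

    def on_press(key):
        # Stub: the real handler forwards the key event to the trigger system.
        logging.info("key pressed: %r", key)

    def keyboard_listener_supervisor():
        """Keep a pynput listener alive: if a callback raises (e.g. on a stray
        modifier key) the listener thread stops, so log it and relaunch."""
        while True:
            try:
                with keyboard.Listener(on_press=on_press) as listener:
                    listener.join()                 # blocks until the listener stops or raises
            except Exception:
                logging.exception("keyboard listener crashed; restarting")
            time.sleep(1)                           # small backoff before relaunching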

Accomplishments that we're proud of

We got a real end-to-end loop running on real hardware in a hackathon weekend, not a slideware demo. The conversational refinement works: saying "also tilt up at the end" edits the existing routine instead of nuking it, and that one feature is what made us think this is a product and not a script. The planner reliably composes 5- to 7-step expressive routines from two-word prompts, and the home-bracketing trick keeps them repeatable enough that we don't hold our breath every time we hit run. The UI also ended up looking like an actual product, which matters more than we expected; it's the difference between "school project" and "integration layer." And we found a market gap that holds up under questioning: OEMs won't build this because their margins are in hardware, foundation-model robotics companies are busy selling to OEMs, workflow tools can't talk to motors, and raw LLMs have no manifest to ground against. We're alone in that quadrant.

What we learned

The hard part of LLM-anything is no longer the LLM; it's the contract around it: the manifest, the validator, the execution trace, the input wiring. We learned to spend more of our time there. We also learned that vocabulary is positioning: using "trigger" and "workflow" instead of inventing new words puts ReWire in the same mental bucket as Zapier and IFTTT, which is exactly where we want it. On the demo side, vague prompts impress people more than precise ones; precise ones look like keyword matching and vague ones look like reasoning. And writing the Pydantic schema before anything else meant four of us could work in parallel for two days without ever blocking each other, which was probably the single best decision we made.

What's next for ReWire

  • Visual input: gesture classification via camera so hands, body movement, and images can all act as inputs
  • World awareness: camera-based object detection and scene understanding so the robot can react to what's actually happening around it, not just explicit triggers
  • Complex task interpretation: ReWire currently handles simple commands; next is chaining, where "clean up the desk" becomes a multi-step routine the model figures out on its own, from observation to execution
  • Universal robot onboarding: upload your SDK docs and ReWire auto-generates the manifest and adapter for any hardware, with no integration work required
  • Mobile app with on-device speech recognition as the primary input
  • Accessibility-first presets: voice-only mode, switch scanning, and eye gaze as input modalities

Built With

  • claude
  • css
  • fastapi
  • framer
  • json
  • k2-think-v2
  • pydantic
  • pynput
  • pyserial
  • python
  • react
  • tanstack
  • uvicorn
  • vite
  • zustand