We built Sisyphus: an automatic, fully mutable, interactive 3D world generator that serves as an RL environment for robots.

Through a three-pronged approach, we demonstrate how we preserve all physics and how we developed an architecture that can be used to train or evaluate any vision-language-action (VLA) model with a fraction of the compute and teleoperator data.

(1) An RL agent environment. We developed our own VLA model that can navigate the world. Here, we specifically use it to clean and reorganize tables across the Neo office.

(2) An evaluation guide showing how VLAs can be evaluated across a diverse set of OOB constraints, tasks, and environments.

(3) An interactive 3D world that lets humans explore any 2D image using Meta VR and Apple iPhones.
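
To make prong (2) more concrete, here is a minimal sketch of what sweeping a VLA over constraints, tasks, and environments could look like. It is not the Sisyphus evaluation code: `EvalCase`, `build_cases`, `success_rates`, and the `run_episode` callback are hypothetical names, and the environment, task, and constraint labels are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One (environment, task, constraint) combination. Hypothetical structure."""
    environment: str
    task: str
    constraint: str


def build_cases(
    environments: Sequence[str],
    tasks: Sequence[str],
    constraints: Sequence[str],
) -> List[EvalCase]:
    """Enumerate every combination so no environment/task/constraint pairing is skipped."""
    return [EvalCase(e, t, c) for e, t, c in product(environments, tasks, constraints)]


def success_rates(
    cases: Sequence[EvalCase],
    run_episode: Callable[[EvalCase], bool],
    episodes_per_case: int = 1,
) -> Dict[str, float]:
    """Run each case and return the success rate per environment.

    `run_episode` is an assumed hook: it should roll out the VLA on the case
    and return True on success.
    """
    totals: Dict[str, Tuple[int, int]] = {}
    for case in cases:
        wins, runs = totals.get(case.environment, (0, 0))
        for _ in range(episodes_per_case):
            wins += int(run_episode(case))
            runs += 1
        totals[case.environment] = (wins, runs)
    return {env: wins / runs for env, (wins, runs) in totals.items()}
```

For example, with a stub policy that fails only under a made-up `"low_light"` constraint, `success_rates(build_cases(["office", "kitchen"], ["clean_table"], ["none", "low_light"]), lambda c: c.constraint != "low_light")` reports 0.5 for each environment.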

Some challenges we ran into: (1) physics is hard sometimes; (2) VLAs are annoying and fragile; (3) sometimes the agent's idea of cleaning was throwing everything off the desk :(
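
Challenge (3) is a classic case of an agent exploiting a loosely specified reward. One common way to discourage it is to penalize objects that leave the table, sketched below. This is an illustration under stated assumptions, not the reward we actually used: `cleaning_reward`, `fall_penalty`, and the 0.05 height tolerance are all hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[float, float, float]  # (x, y, z), with z as height


def cleaning_reward(
    positions: Dict[str, Position],
    targets: Dict[str, Position],
    table_height: float,
    fall_penalty: float = 10.0,
) -> float:
    """Score a table state: closer to targets is better, knocked-off objects are punished.

    All parameters are assumptions for this sketch; units are arbitrary.
    """
    reward = 0.0
    for name, (x, y, z) in positions.items():
        # An object noticeably below the tabletop is treated as thrown off the desk.
        if z < table_height - 0.05:
            reward -= fall_penalty
            continue
        tx, ty, tz = targets[name]
        # Otherwise, penalize by Euclidean distance to the object's target spot.
        reward -= ((x - tx) ** 2 + (y - ty) ** 2 + (z - tz) ** 2) ** 0.5
    return reward
```

With this shaping, an empty desk produced by sweeping everything onto the floor scores worse than a desk where objects are merely misplaced, as long as `fall_penalty` exceeds the largest possible on-table distance.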

Future plans include making RL in these environments more robust for the agent and evaluating other VLAs.
