Inspiration

Robotics feels locked behind steep learning curves: kinematics jargon, complex tooling, and opaque AI workflows. We wanted a space where anyone could type “build a 6‑joint arm and scan the table” and watch it happen, combining generative vision (Imagen 3 textures) with an agentic reasoning loop inside a familiar Unity playground. Virturoid exists to flatten that barrier through natural language, autonomous robot behavior, and instant visual feedback, which in turn makes robotics more accessible. The same software also answers another need within the field: the need for more data. Through Virturoid, large companies can plug in their own models for decision making, computer vision, and more through the Robot Operating System (ROS), an open-source robotics middleware suite that is the industry standard, and train them on Virturoid's synthetic and simulated data.

What it does

Virturoid lets users:

  • Build custom robots from scratch (including basic 3D models, joints and even sensors).
  • Generate custom robot textures with Google Imagen 3 (procedural styling).
  • Build and command robots via a chat interface (natural language → structured tasks).
  • Run an autonomous ADK agent that plans multi-step goals (Perceive → Plan → Act → Monitor → Learn).
  • Use pre-built models for decision making, movement, and computer vision, allowing users to go from just an idea to a robot complete with trained models and 3D models.
  • Train their own models on synthetic and simulated data.
  • Navigate scenes fluidly (WASD free camera).
  • Toggle UI cleanly to focus on simulation.

It turns “simulate a scan and move to the pallet” into an executed sequence, with no manual scripting required.
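
As a rough illustration of the "natural language → structured tasks" step, here is a minimal keyword-based sketch in Python. It is not Virturoid's actual parser (the project routes intents through the Gemini API); the `Task` type, the `KNOWN_ACTIONS` list, and `parse_command` are hypothetical names chosen for this example. It only shows how connectors such as "then" and "while" could separate sequential steps from overlapping ones.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical vocabulary; a real system would rely on an LLM or a richer grammar.
KNOWN_ACTIONS = ("scan", "move", "build", "inspect", "return")
FILLER = {"to", "the", "a", "an"}


@dataclass
class Task:
    action: str     # normalized verb, e.g. "scan"
    target: str     # remaining words of the clause, may be empty
    parallel: bool  # True if the task should overlap with the previous one


def parse_command(text: str) -> list[Task]:
    # Split on connectors but keep them, so we know how clauses relate.
    parts = re.split(r"\b(and then|then|while|and)\b", text.lower())
    tasks: list[Task] = []
    parallel = False
    for part in parts:
        part = part.strip(" ,.")
        if part in ("and then", "then", "and"):
            parallel = False  # next clause runs after the previous one
        elif part == "while":
            parallel = True   # next clause runs alongside the previous one
        elif part:
            words = part.split()
            for i, word in enumerate(words):
                # Prefix match so "scanning" maps to "scan".
                action = next((a for a in KNOWN_ACTIONS if word.startswith(a)), None)
                if action:
                    target = " ".join(w for w in words[i + 1:] if w not in FILLER)
                    tasks.append(Task(action, target, parallel))
                    break
    return tasks
```

Calling `parse_command("move back while scanning")` yields a `move` task followed by a `scan` task flagged as parallel, whereas "scan and then move back" yields two sequential tasks. Clauses without a recognized verb are silently dropped in this sketch.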

How we built it

  • Used the Unity Robotics Pick & Place tutorial as the mechanical baseline, which made integration with ROS easier.
  • Added AIChatManager to parse intents (build vs command) through the Gemini API.
  • Implemented RobotAgent with a finite state machine (Idle, Planning, Executing, Monitoring, Learning), through Google's agentic services.
  • Integrated Google Imagen 3 using OAuth token flow (PowerShell helper + fallback env variable).
  • Abstracted texture generation via ImagenClient + ImagenConfig (prompt → base64 → runtime material updates).
  • Utilized Microsoft Azure's OpenAI in conjunction with Gemini to build simulated scenarios for robots to train within.
  • Created a simplified UI toggle component (ChatUIEnhancer) for distraction-free operation.
  • Added a WASD free-look CameraController for exploration and debugging.
  • Documentation + guides (IMAGEN_SETUP.md, AGENT_USAGE_GUIDE.md) to reduce onboarding friction.
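
The finite state machine behind RobotAgent can be pictured with a small Python sketch. The five state names come from the list above; the transition table, the `RobotAgentSketch` class, and its `history` field are assumptions made for illustration, not the project's Unity implementation.

```python
from enum import Enum, auto


class AgentState(Enum):
    IDLE = auto()
    PLANNING = auto()
    EXECUTING = auto()
    MONITORING = auto()
    LEARNING = auto()


# Assumed transition table: which states may follow which.
TRANSITIONS = {
    AgentState.IDLE: {AgentState.PLANNING},
    AgentState.PLANNING: {AgentState.EXECUTING, AgentState.IDLE},
    AgentState.EXECUTING: {AgentState.MONITORING},
    AgentState.MONITORING: {
        AgentState.EXECUTING,  # continue with the next step
        AgentState.PLANNING,   # replan after a failed step
        AgentState.LEARNING,   # goal finished
    },
    AgentState.LEARNING: {AgentState.IDLE},
}


class RobotAgentSketch:
    """Tracks the current state and rejects transitions not in the table."""

    def __init__(self):
        self.state = AgentState.IDLE
        self.history = [AgentState.IDLE]

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)
```

Rejecting transitions that are not in the table makes a bug surface at the exact step where the agent tried to skip a phase, rather than later as odd robot behavior.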

Challenges we ran into

  • Converting AI-generated robots into ones compatible with ROS.
  • Switching from API key assumptions to OAuth access tokens for Imagen 3 (handling token refresh and prioritization).
  • Preventing layout thrash in Unity’s UI system when styling the chat (ultimately simplified to pure visibility control).
  • Parsing ambiguous natural language commands (“scan and then move back” vs “move back while scanning”).
  • Maintaining agent memory compactly without bloating state (bounded queue design).
  • Keeping generated textures consistent across robot parts (prompt engineering + planned future Gemini Flash editing).
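
A bounded queue for agent memory can be sketched in a few lines of Python. The `AgentMemory` class, its default capacity, and the event strings are hypothetical; the only point is that a fixed-size `collections.deque` discards the oldest entry automatically once it is full.

```python
from collections import deque


class AgentMemory:
    """Keeps only the most recent `capacity` events."""

    def __init__(self, capacity=5):
        # deque with maxlen drops items from the left when it overflows.
        self._events = deque(maxlen=capacity)

    def remember(self, event):
        self._events.append(event)

    def recent(self, n=None):
        """Return the last n events, oldest first (all events if n is None)."""
        items = list(self._events)
        if n is None:
            return items
        if n <= 0:
            return []
        return items[-n:]

    def __len__(self):
        return len(self._events)
```

With `capacity=3`, remembering five events leaves only the last three in memory, so state size stays constant no matter how long a session runs.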

Accomplishments that we're proud of

  • Fully operational autonomous agent loop inside a standard tutorial scene.
  • Zero manual shader authoring: procedurally generated robot aesthetics via text prompts.
  • Fast onboarding: a newcomer can spawn and command a robot in under two minutes.
  • Clean separation of concerns (chat parsing, agent reasoning, texture generation, UI controls).
  • Extensible architecture ready for multi-agent scenarios and persistent cloud memory.

What we learned

  • Prompt engineering for robotics visuals differs from generic image generation—clarity about material, finish, and lighting matters.
  • Simulated autonomy benefits from explicit state transitions (easier debugging than free-form coroutines).
  • Minimizing UI automation sometimes improves reliability; over-styled dynamic layouts can fight Unity’s lifecycle.
  • OAuth ergonomics: dev scripts (like token fetchers) dramatically reduce integration friction.
  • Users prefer “goal statements” (“inspect zone then return”) over low-level step lists—so intent interpretation needs forgiving heuristics.
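
The token handling mentioned above (a helper script first, an environment variable as fallback) might be prioritized along these lines. This is a sketch under assumptions: `resolve_access_token`, the `fetch_from_helper` callable, and the `IMAGEN_ACCESS_TOKEN` variable name are all hypothetical and do not reflect the project's actual scripts.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_access_token(
    fetch_from_helper: Callable[[], Optional[str]],
    env: Mapping[str, str] = os.environ,
    env_var: str = "IMAGEN_ACCESS_TOKEN",  # hypothetical variable name
) -> str:
    """Prefer a freshly fetched token; fall back to the environment."""
    try:
        token = fetch_from_helper()
    except Exception:
        # Helper missing or failing: fall through to the environment variable.
        token = None
    if token and token.strip():
        return token.strip()

    fallback = env.get(env_var, "").strip()
    if fallback:
        return fallback
    raise RuntimeError(f"no access token from helper or ${env_var}")
```

Passing the helper in as a callable keeps the sketch independent of how the token is actually fetched, for example by running a script with `subprocess.run` and reading its standard output.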

What's next for Virturoid4

  1. Media Mastery expansion: Gemini 2.5 Flash for image editing & texture refinement (consistency across parts).
  2. Agentic Intelligence upgrade: Vertex AI Agent Engine for persistent, cross-session memory and semantic recall.
  3. Multi-agent coordination: task negotiation (e.g., one robot scans while another sorts).
  4. Scenario marketplace: share prompt + agent goal “recipes” via a lightweight cloud backend.
  5. Curriculum mode: guided learning modules that gradually expose kinematics, sensing, and path planning concepts with AI assistance.
  6. Web streaming / thin client: remote access to Virturoid simulations for classrooms.
  7. Hardware bridge (future): exporting planned sequences to ROS for real-world execution.

