Inspiration

The world is on the cusp of a physical AI revolution. Autonomous vehicles like those built by Waymo require millions of hours of real-world driving data — yet the most safety-critical edge cases (a child chasing a ball into the street at dusk, an unexpected rockslide on a mountain highway) are precisely the scenarios hardest to capture by simply driving more miles. The cost of collecting diverse, high-quality, real-world data at the scale needed to train the next generation of robots, autonomous vehicles, and embodied AI agents is becoming one of the fundamental bottlenecks of the field.

We were also inspired by World Labs' Marble Model — the vision of large world models that can synthesize rich, physically plausible 3D environments from a single image. That capability, if made accessible and composable, could fundamentally change how we gather training data for physical AI, how filmmakers design sets, and how game studios world-build.

MarbleOS is our answer to that challenge: a spatial computing platform that democratizes the creation of navigable 3D worlds from any input — a text prompt, a photograph, or even a real-world address — and packages them in a way that is immediately useful to robotics engineers, creative professionals, and everyday users alike. We challenged ourselves to build an open-source version of Marble in 24 hours, running on consumer hardware.

What it does

MarbleOS is an agentic Multimodal World Model Ecosystem — a visionOS-inspired spatial operating system where the fundamental unit of content is not a file or a webpage, but an explorable 3D world.

From a single image — a family vacation photo, a product shot, a satellite thumbnail — MarbleOS reconstructs a full navigable scene as a 3D Gaussian Splat. Users can:

  • Upload any image: receive a photorealistic, navigable 3D world in seconds
  • Enter a text prompt: generate an environment from pure imagination
  • Drop in a real-world address: step inside a 3D reconstruction of that location via Street View integration
  • Browse a personal Gallery: revisit every generated world, each with a preview video and a live 3D viewer
  • Export worlds: download industry-standard .ply files for use in downstream workflows
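Worlds export as .ply Gaussian Splat files. As a minimal sketch of what that container looks like (real exports carry many more per-splat attributes — scales, rotations, spherical-harmonic colour coefficients — this hypothetical example keeps only position and opacity to illustrate the format):

```python
import struct

def write_minimal_splat_ply(path, splats):
    """Write a toy Gaussian-splat-style binary PLY.

    splats: list of (x, y, z, opacity) tuples.
    """
    header = (
        "ply\n"
        "format binary_little_endian 1.0\n"
        f"element vertex {len(splats)}\n"
        "property float x\n"
        "property float y\n"
        "property float z\n"
        "property float opacity\n"
        "end_header\n"
    )
    with open(path, "wb") as f:
        f.write(header.encode("ascii"))
        for x, y, z, opacity in splats:
            # one little-endian float per declared property
            f.write(struct.pack("<4f", x, y, z, opacity))

def read_vertex_count(path):
    """Parse the splat count back out of the PLY header."""
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("element vertex"):
                return int(line.split()[-1])
            if line == "end_header":
                break
    return 0
```

Because PLY is self-describing, downstream tools (splat viewers, DCC importers) can discover the per-vertex layout from the header alone.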

For robotics companies, this means synthetic training environments on demand — edge-case scenarios that would take months of careful real-world driving to capture can now be generated in minutes. For filmmakers and game studios, it means virtual location scouting and environment concepting at unprecedented speed. For individuals, it means turning a favourite memory into a place you can walk around in.

How we built it

MarbleOS is organized into two tightly integrated layers:

Frontend — Next.js + visionOS design system. The interface is built on Next.js 16 and mimics the visionOS design language: glassmorphism, depth layering, and spring-based animations.

Backend — FastAPI + Apple SHARP. The inference layer is a FastAPI server that accepts image uploads, routes them to Apple's SHARP model (Spatial High-fidelity Adaptive Rendering Pipeline) via the Gradio client interface, and returns a .ply Gaussian Splat file alongside a preview .mp4 flythrough.
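The routing logic around the model call can be sketched independently of the web framework. This is a hypothetical helper (the actual FastAPI endpoint, Gradio client call, and allowed-type list are assumptions, not the real API): it validates the upload and derives the paired output artifact paths that the endpoint would return.

```python
from pathlib import Path
import uuid

# Hypothetical: the real server may accept other formats.
ALLOWED_IMAGE_TYPES = {".jpg", ".jpeg", ".png", ".webp"}

def plan_generation_job(upload_name, output_dir="outputs"):
    """Validate an uploaded image name and derive output paths.

    Returns a dict describing where the splat (.ply) and preview
    flythrough (.mp4) for this job would be written.
    """
    ext = Path(upload_name).suffix.lower()
    if ext not in ALLOWED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {ext or '(none)'}")
    job_id = uuid.uuid4().hex[:12]  # unique id for this world
    base = Path(output_dir) / job_id
    return {
        "job_id": job_id,
        "splat_path": str(base.with_suffix(".ply")),    # Gaussian Splat
        "preview_path": str(base.with_suffix(".mp4")),  # flythrough video
    }
```

In the real server, the endpoint would call SHARP via the Gradio client between validation and response, writing its outputs to the two derived paths.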

Challenges we ran into

  1. World quality out of the box: Single-image 3D reconstruction is a hard problem. The raw output of the SHARP model is impressive but imperfect. Turning a noisy Gaussian Splat into a clean, usable world requires post-processing steps we are still developing.

  2. Implementing a 3D world viewer in the browser: Rendering Gaussian Splats in real time is computationally intensive and, until very recently, had no mature browser-native solution.
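Part of what makes real-time splat rendering expensive is that Gaussian splats are alpha-blended, so they must be re-ordered back-to-front relative to the camera every frame. Browser viewers offload this to the GPU or a web worker; the ordering itself is just a distance sort, sketched here in Python:

```python
import math

def back_to_front_order(centers, camera):
    """Return splat indices sorted farthest-first from the camera.

    centers: list of (x, y, z) splat positions
    camera:  (x, y, z) camera position

    Alpha blending composites correctly only if distant splats are
    drawn before nearer ones, so we sort by descending distance.
    """
    return sorted(
        range(len(centers)),
        key=lambda i: math.dist(centers[i], camera),
        reverse=True,
    )
```

For millions of splats this sort runs every time the camera moves, which is why naive CPU implementations stutter and why mature viewers use GPU radix sorts or incremental re-sorting.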

Accomplishments that we're proud of

  1. Any image becomes a world. We can take virtually any photograph — a family portrait, a travel snapshot, a concept sketch — and reconstruct it as a navigable 3D Gaussian Splat with a flythrough video, all in a single pipeline invocation.

  2. Multimodal entry points: Text prompts, images, and real-world geographic addresses all flow into the same world-generation pipeline, unified under one spatial OS shell.

  3. A spatial OS aesthetic that feels native: The visionOS-inspired interface — glassmorphism panels, spring animations, depth layering, Ornament tab bars — makes interacting with 3D worlds feel as natural as browsing a photo library.

  4. End-to-end open source: The entire stack — from the FastAPI backend to the Next.js frontend — is open source under AGPL-3.0.

What we learned

  • Gaussian Splats are the right primitive for world models. They are compact, photorealistic, and renderable in real time in the browser — a uniquely powerful combination for a platform meant to generate worlds at scale.

  • The bottleneck for physical AI is not models, it is data infrastructure: The harder problem is not "can we generate a world?" but "can we generate the right world, at scale, in the right format, for the right downstream consumer?" MarbleOS is an early exploration of that infrastructure layer.

What's next for MarbleOS

  1. Higher-fidelity world generation: We want to integrate more powerful world models as they become available, and add post-processing pipelines (mesh cleaning, inpainting, scene completion) to close the gap between raw model output and production-ready environments.

  2. Interactive world editing via prompts: Users should be able to step inside a generated world and say "add snow", "make it night", or "place a robot in the corner" — iterative, language-driven world editing on top of the base generation.

  3. Native plugins for 3D engines: Export connectors to Unreal Engine and Unity so that generated worlds can flow directly into the tools game studios and filmmakers already use.

  4. Robotics simulation integrations: First-class connectors to NVIDIA Isaac Sim and MuJoCo so that robotics companies can generate synthetic training environments and immediately run physics simulations inside them.

  5. Enhanced physics: Generated worlds today are visually rich but physically inert. We want to layer in physics simulation so that objects in the scene behave realistically — a prerequisite for using MarbleOS-generated worlds as robot training environments.

  6. Agentic world pipelines: Long-term, we envision MarbleOS as a platform where AI agents autonomously generate, curate, and label synthetic worlds to order — a data flywheel for physical AI that can operate at a scale no human-driven data collection effort could match.
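The post-processing pipeline in item 1 above could begin with a pass like the following — a hypothetical cleaner (thresholds and data layout are illustrative assumptions) that drops near-transparent "ghost" splats and distance outliers before export. A production version would also merge duplicates and complete occluded geometry:

```python
import math

def clean_splats(splats, min_opacity=0.05, max_dist_sigma=3.0):
    """Filter a noisy splat set.

    splats: list of dicts with 'pos' (x, y, z) and 'opacity'.
    Removes splats below min_opacity, then any splat whose distance
    from the centroid exceeds max_dist_sigma standard deviations.
    """
    visible = [s for s in splats if s["opacity"] >= min_opacity]
    if not visible:
        return []
    n = len(visible)
    # centroid of the remaining splats
    cx = sum(s["pos"][0] for s in visible) / n
    cy = sum(s["pos"][1] for s in visible) / n
    cz = sum(s["pos"][2] for s in visible) / n
    dists = [math.dist(s["pos"], (cx, cy, cz)) for s in visible]
    mean = sum(dists) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / n) or 1e-9
    # keep splats within max_dist_sigma standard deviations of the centroid
    return [s for s, d in zip(visible, dists) if (d - mean) / std <= max_dist_sigma]
```

Even this simple opacity-plus-outlier filter removes much of the "floater" noise typical of single-image reconstruction, at the cost of occasionally trimming thin distant geometry.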
