Inspiration
Many people have had this experience: you hear a beautiful piece of music and wish you could play it, but struggle to find the sheet music online.
Music is deeply emotional and immersive, yet most of the time we only experience it through sound. We started wondering: what if music could become a world you can enter?
With the rise of World Models and generative 3D environments, we realized that music could be more than audio: it could become a spatial experience. Our idea was to transform music into a living world where rhythm, melody, and emotion shape the environment itself.
This project explores how music can become a place, not just something you hear.
What it does
Our project allows users (music lovers, independent creators, or musicians) to generate a 3D world from music.
Using audio as input, the system analyzes elements such as rhythm, tempo, frequency, and emotional tone.
These features are translated into parameters that generate a World Model environment using World Labs.
The result is an immersive world that visually represents the music. Users can explore the environment and experience the music spatially, almost like stepping inside the song.
How we built it
We built an end‑to‑end pipeline in Unity:
Music input – User provides an audio file (path or file under StreamingAssets).
OpenAI analysis – The audio is sent to OpenAI: first transcription (lyrics/speech), then a chat step that turns that (and the overall “vibe”) into a single, vivid world prompt (spatial description, atmosphere, lighting, objects).
World generation – That prompt is sent to World Labs (Marble 0.1-mini), which generates a 3D world; we poll until it’s ready and then download the asset (e.g. .spz).
Experience in Unity – The generated world is saved (e.g. under GaussianAssets) and can be viewed/explored with Gaussian Splatting in Unity.
So: music file → OpenAI (transcribe + analyse) → one text prompt → World Labs → 3D world in Unity. We implemented this in C# (e.g. MusicToWorldPipeline.cs) so everything runs inside the game engine without external scripts at runtime.
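The "poll until it's ready" step can be sketched as a small helper. This is a minimal illustration in Python, not the project's C# code: the function name, the "done" field, and the interval and timeout values are all assumptions made for the example, and the real World Labs status response may look different.

```python
import time
from typing import Callable


def poll_until_ready(
    fetch_status: Callable[[], dict],
    is_ready: Callable[[dict], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until is_ready(result) is true.

    fetch_status stands in for whatever HTTP call returns the generation
    status (hypothetical here). Raises TimeoutError once the total time
    spent sleeping reaches timeout_s.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        if is_ready(result):
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"world not ready after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Simulated status responses; "done" is an assumed field name.
    fake = iter([{"done": False}, {"done": False}, {"done": True}])
    final = poll_until_ready(
        lambda: next(fake),
        lambda r: r.get("done", False),
        interval_s=0.01,
    )
    print(final)
```

Passing the sleep function in as a parameter keeps the loop easy to test; in Unity the same idea would more likely be written as a coroutine or an async method that yields between checks.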
Challenges we ran into
API response shape – World Labs doesn’t return a simple “download link”; assets live under response.assets.splats.spz_urls (and similar). We had to match the real API response and support multiple asset types (e.g. .spz, imagery).
Transcription vs. “music understanding” – OpenAI’s transcription API is built for speech (lyrics, dialogue). For instrumental or unclear audio, the transcript can be empty. We added a fallback: when there’s no text, we still send a short description (“instrumental / music-only; infer mood”) so the system can generate a world from the idea of the music.
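Reading a nested field like response.assets.splats.spz_urls defensively might look like the sketch below. It is written in Python for illustration, not taken from the project's C# code. Only the key path comes from the description above; whether spz_urls is a mapping, a list, or a single string is an assumption, so the helper accepts all three, and the sample payload is invented.

```python
from typing import Any, List


def extract_spz_urls(payload: dict) -> List[str]:
    """Walk response -> assets -> splats -> spz_urls without raising.

    Returns an empty list if any level is missing or has an unexpected type.
    """
    node: Any = payload
    for key in ("response", "assets", "splats", "spz_urls"):
        if not isinstance(node, dict):
            return []
        node = node.get(key)

    # Assumption: the leaf may be a dict of named URLs, a list, or one string.
    if isinstance(node, dict):
        return [u for u in node.values() if isinstance(u, str)]
    if isinstance(node, list):
        return [u for u in node if isinstance(u, str)]
    if isinstance(node, str):
        return [node]
    return []


if __name__ == "__main__":
    # Invented example payload; the URL and the "full_res" key are hypothetical.
    sample = {
        "response": {
            "assets": {
                "splats": {"spz_urls": {"full_res": "https://example.com/world.spz"}}
            }
        }
    }
    print(extract_spz_urls(sample))
    print(extract_spz_urls({"response": {}}))
```

Returning an empty list instead of raising lets the caller decide whether a missing splat is an error or whether another asset type (such as imagery) should be used.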
Accomplishments that we're proud of
We successfully built a working pipeline that turns music into a navigable world.
The project demonstrates that music can be transformed into an immersive spatial experience, allowing users to literally "enter" the atmosphere of a song.
We're especially proud that the system is useful not only to developers but also, potentially, to musicians, artists, and independent creators who want new ways to express music visually.
What we learned
World models. Transcription APIs are speech‑centric; for music we had to treat “no text” as a valid case and still produce a world from high‑level mood.
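Treating "no text" as a valid case can be as small as the following sketch. It is in Python for illustration, not the project's C# code; the function and constant names are hypothetical, and the fallback string is the one quoted in the challenges section.

```python
from typing import Optional

# Fallback description sent to the chat step when transcription yields nothing.
FALLBACK_DESCRIPTION = "instrumental / music-only; infer mood"


def describe_audio_for_prompt(transcript: Optional[str]) -> str:
    """Return the transcript if it has content, otherwise the fallback text."""
    text = (transcript or "").strip()
    return text if text else FALLBACK_DESCRIPTION


if __name__ == "__main__":
    print(describe_audio_for_prompt("  la la la  "))
    print(describe_audio_for_prompt(""))
    print(describe_audio_for_prompt(None))
```

Whitespace-only transcripts are treated the same as empty ones, so the chat step always receives something it can build a world prompt from.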