Inspiration

Many people have had this experience: you hear a beautiful piece of music and wish you could play it, but struggle to find the sheet music online.

Music is deeply emotional and immersive, yet most of the time we only experience it through sound. We started wondering: what if music could become a world you can enter?

With the rise of World Models and generative 3D environments, we realized that music could be more than audio: it could become a spatial experience. Our idea was to transform music into a living world where rhythm, melody, and emotion shape the environment itself.

This project explores how music can become a place, not just something you hear.

What it does

Our project allows users, whether they are music lovers, independent creators, or musicians, to generate a 3D world from music.

Using audio as input, the system analyzes elements such as rhythm, tempo, frequency, and emotional tone.

These features are translated into parameters that generate a World Model environment using World Labs.

The result is an immersive world that visually represents the music. Users can explore the environment and experience the music spatially, almost like stepping inside the song.

How we built it

We built an end-to-end pipeline in Unity:

1. Music input: The user provides an audio file (a path, or a file under StreamingAssets).
2. OpenAI analysis: The audio is sent to OpenAI, first for transcription (lyrics/speech), then through a chat step that turns the transcript (and the overall “vibe”) into a single, vivid world prompt (spatial description, atmosphere, lighting, objects).
3. World generation: That prompt is sent to World Labs (Marble 0.1-mini), which generates a 3D world; we poll until it’s ready and then download the asset (e.g. .spz).
4. Experience in Unity: The generated world is saved (e.g. under GaussianAssets) and can be viewed and explored with Gaussian Splatting in Unity.

So: music file → OpenAI (transcribe + analyze) → one text prompt → World Labs → 3D world in Unity.

We implemented this in C# (e.g. MusicToWorldPipeline.cs) so everything runs inside the game engine without external scripts at runtime.
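The "poll until it's ready" step in the pipeline above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the project's C# code. The function name `poll_until_ready`, the `status` key, and the `"ready"` / `"failed"` values are assumptions made for the example and are not taken from the World Labs API.

```python
import time
from typing import Any, Callable, Dict


def poll_until_ready(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() repeatedly until the job reports it is ready.

    fetch_status is a hypothetical callable returning the latest job payload
    as a dict. The "status" key and its values are assumptions, not a
    documented response format.
    """
    waited = 0.0
    while True:
        payload = fetch_status()
        status = payload.get("status")
        if status == "ready":
            return payload
        if status == "failed":
            raise RuntimeError("world generation failed")
        if waited >= timeout_s:
            raise TimeoutError(f"world not ready after {timeout_s} seconds")
        # Wait before asking again so the service is not flooded with requests.
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. Inside Unity the same idea would more likely be expressed as a coroutine or an async method that yields between requests.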

Challenges we ran into

API response shape: World Labs doesn’t return a simple “download link”; assets live under response.assets.splats.spz_urls (and similar). We had to match the real API response and support multiple asset types (e.g. .spz, imagery).

Transcription vs. “music understanding”: OpenAI’s API is built for speech (lyrics, dialogue). For instrumental or unclear audio, transcription can be empty. We added a fallback: when there’s no text, we still send a short description (“instrumental / music-only; infer mood”) so the system can generate a world from the idea of the music.
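For the response-shape challenge, a defensive lookup along the path assets → splats → spz_urls might look like the sketch below (Python used for illustration; the project itself is in C#). Only the key path comes from the write-up. The helper name `find_spz_urls` is hypothetical, and because the exact type of `spz_urls` is not stated here, the code accepts a single string, a list, or a dict of URLs as an assumption.

```python
from typing import Any, List


def find_spz_urls(response: dict) -> List[str]:
    """Return every .spz URL found under assets -> splats -> spz_urls.

    Returns an empty list when any level of the path is missing, so the
    caller can fall back to other asset types instead of crashing.
    """
    node: Any = response
    for key in ("assets", "splats", "spz_urls"):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]

    # Assumption: spz_urls may be one URL, a list of URLs, or a mapping
    # from some label to a URL. All three are normalized to a list.
    if isinstance(node, str):
        return [node] if node else []
    if isinstance(node, dict):
        candidates = list(node.values())
    elif isinstance(node, list):
        candidates = node
    else:
        return []
    return [url for url in candidates if isinstance(url, str) and url]
```

Returning an empty list instead of raising lets the caller decide whether a missing splat is an error or simply a cue to look for another asset type, such as imagery.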

Accomplishments that we're proud of

We successfully built a working pipeline that turns music into a navigable world.

The project demonstrates that music can be transformed into an immersive spatial experience, allowing users to literally "enter" the atmosphere of a song.

We're especially proud that the system works not only for developers, but also has potential for musicians, artists, and independent creators who want new ways to express music visually.

What we learned

We learned how to work with world models. We also learned that transcription APIs are speech-centric; for music we had to treat “no text” as a valid case and still produce a world from high-level mood.
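Treating an empty transcript as a valid case can be as small as the following sketch (Python for illustration, not the project's C# code). The fallback string is the one quoted earlier in this write-up; the names `FALLBACK_DESCRIPTION` and `describe_audio_for_prompt` are hypothetical.

```python
from typing import Optional

# Short description sent to the chat step when transcription yields no text.
FALLBACK_DESCRIPTION = "instrumental / music-only; infer mood"


def describe_audio_for_prompt(transcript: Optional[str]) -> str:
    """Return the transcript if it has content, otherwise the fallback."""
    text = (transcript or "").strip()
    if text:
        return text
    return FALLBACK_DESCRIPTION
```

With this in place the chat step always receives some text, so instrumental tracks follow the same path as songs with lyrics.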

What's next for The World of Music
