
Nebula — Interactive Multimodal Storytelling

Inspiration

We started with a simple question: what if a story could react to you?

Modern AI models can write text, generate images, and create video, but most systems treat these capabilities separately. We wanted to explore what happens when they are combined into a single interactive loop.

We took inspiration from classic interactive fiction and branching narrative games like Zork and 80 Days, where each player choice pushes the story in a new direction. Our goal was to recreate that experience with generative AI so that every branch could produce new scenes, visuals, and cinematic moments in real time.

The result is Nebula, an AI-powered storytelling engine where every decision creates a unique world.

What We Built

Nebula is a branching story system where each player choice generates a new scene dynamically.

Every scene includes:

A narrative written by Gemini
A set of player choices that expand the story
Scene illustrations generated with Imagen
Optional cinematic clips generated with Veo

As players move through the story, Nebula builds a growing tree of scenes. Each path is unique to that player’s choices, creating a personalized narrative experience.
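The scene tree could be modeled roughly as below. This is a minimal Python sketch, not Nebula's actual code: the Scene class, its field names, and add_child are hypothetical, chosen to mirror the scene components listed above. Each choice maps to the child scene it produced, so every root-to-leaf path is one player's story.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: two scenes are never "equal" just by content
class Scene:
    """One node in the story tree (hypothetical schema)."""
    scene_id: str
    narrative: str                  # scene text from the language model
    summary: str                    # short recap of the narrative state
    choices: list[str]              # options offered to the player
    parent: Optional[Scene] = field(default=None, repr=False)  # None for the opening scene
    children: dict[str, Scene] = field(default_factory=dict, repr=False)  # choice -> next scene
    image_url: Optional[str] = None  # filled in later by image generation
    video_url: Optional[str] = None  # optional cinematic clip

    def add_child(self, choice: str, child: Scene) -> Scene:
        """Attach the scene produced by picking `choice` and return it."""
        if choice not in self.choices:
            raise ValueError(f"unknown choice: {choice!r}")
        child.parent = self
        self.children[choice] = child
        return child


# Usage: the opening scene, then the branch created by one player decision.
root = Scene("s0", "You wake aboard a drifting ship.", "Player wakes on a ship.",
             ["Open the door", "Walk away"])
nxt = root.add_child("Open the door",
                     Scene("s1", "The corridor hums.", "Player enters the corridor.",
                           ["Go left", "Go right"]))
```

Storing the parent pointer on each node makes it cheap to walk back to the opening scene, which is what the continuity step below relies on.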

Key Ideas

Multimodal storytelling

Nebula brings together three types of generation in a single gameplay loop:

Text for narrative scenes and story progression
Images for visualizing scenes and player choices
Video for cinematic moments in the story

Narrative content is generated first so the player can begin reading immediately, while images and video render in the background.

Maintaining story continuity

To keep the story coherent as it branches, each scene stores a short summary describing its narrative state.

When a new scene is generated, the model receives the chain of summaries from the beginning of the story to the current point. This preserves continuity while keeping prompts small, because the model sees one compact summary per earlier scene rather than the full text of every scene on the path.
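A minimal sketch of how the summary chain might be collected and turned into a prompt, assuming each scene keeps a pointer to its parent. SceneNode, summary_chain, build_prompt, and the prompt wording are hypothetical illustrations, not Nebula's actual prompt or schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class SceneNode:
    summary: str
    parent: Optional[SceneNode] = None


def summary_chain(scene: SceneNode) -> list[str]:
    """Collect summaries from the opening scene down to `scene`."""
    chain: list[str] = []
    node: Optional[SceneNode] = scene
    while node is not None:
        chain.append(node.summary)
        node = node.parent
    chain.reverse()  # we walked leaf -> root, so flip to root -> leaf
    return chain


def build_prompt(scene: SceneNode, choice: str) -> str:
    """Assemble a compact prompt for the next scene (hypothetical format)."""
    story_so_far = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(summary_chain(scene)))
    return (
        "Story so far:\n"
        f"{story_so_far}\n\n"
        f"The player chose: {choice}\n"
        "Write the next scene, a one-sentence summary of it, and new choices."
    )


root = SceneNode("The crew wakes on a derelict station.")
hall = SceneNode("They find a sealed hallway.", parent=root)
prompt = build_prompt(hall, "Force the door open")
```

The prompt still grows with story depth, but only by one short line per scene, and sibling branches never leak into it because the walk follows parent pointers only.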

Parallel media generation

Visual assets are generated asynchronously after the scene text is returned. Players can keep reading while images and video appear progressively, which reduces perceived latency.
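The text-first, media-later flow can be sketched with Python's asyncio, assuming the model calls are exposed as async functions. generate_text, generate_image, and generate_video below are hypothetical stand-ins that just sleep; they are not the Gemini, Imagen, or Veo client APIs. SceneView, attach_media, and play_scene are likewise illustrative names.

```python
from __future__ import annotations

import asyncio
from typing import Optional


# Hypothetical stand-ins for the real text, image and video model calls.
async def generate_text(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return f"Scene for: {prompt}"


async def generate_image(narrative: str) -> str:
    await asyncio.sleep(0.05)
    return "image://scene.png"


async def generate_video(narrative: str) -> str:
    await asyncio.sleep(0.1)
    return "video://clip.mp4"


class SceneView:
    """What the player currently sees; media fields fill in over time."""
    def __init__(self, narrative: str) -> None:
        self.narrative = narrative
        self.image_url: Optional[str] = None
        self.video_url: Optional[str] = None


async def attach_media(view: SceneView, want_video: bool) -> None:
    # Image and optional video render concurrently; each updates the view when done.
    async def image() -> None:
        view.image_url = await generate_image(view.narrative)

    async def video() -> None:
        view.video_url = await generate_video(view.narrative)

    jobs = [image()] + ([video()] if want_video else [])
    await asyncio.gather(*jobs)


async def play_scene(prompt: str, want_video: bool = False) -> tuple[SceneView, asyncio.Task]:
    narrative = await generate_text(prompt)  # text first: the player can start reading
    view = SceneView(narrative)
    media_task = asyncio.create_task(attach_media(view, want_video))  # runs in background
    return view, media_task


async def main() -> None:
    view, media_task = await play_scene("Open the airlock", want_video=True)
    print(view.narrative, view.image_url)  # image_url is still None at this point
    await media_task
    print(view.image_url, view.video_url)


if __name__ == "__main__":
    asyncio.run(main())
```

Returning the task handle instead of awaiting it lets the caller render the narrative right away and refresh the view as each asset lands; a real client would push those updates to the UI rather than awaiting the task.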