Nebula — Interactive Multimodal Storytelling
Inspiration
We started with a simple question: what if a story could react to you?
Modern AI models can write text, generate images, and create video, but most systems treat these capabilities separately. We wanted to explore what happens when they are combined into a single interactive loop.
We took inspiration from classic branching narrative games like Zork and 80 Days, where each player choice pushes the story in a new direction. Our goal was to recreate that experience using generative AI so that every branch could produce new scenes, visuals, and cinematic moments in real time.
The result is Nebula, an AI-powered storytelling engine where every decision creates a unique world.
What We Built
Nebula is a branching story system where each player choice generates a new scene dynamically.
Every scene includes:
- A narrative written by Gemini
- A set of player choices that expand the story
- Illustrated scene images generated with Imagen
- Optional cinematic clips generated with Veo
As players move through the story, Nebula builds a growing tree of scenes. Each path is unique to that player’s choices, creating a personalized narrative experience.
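A minimal sketch of how such a scene tree could be represented is shown here. The `Scene` class, its field names, and the `add_child` and `depth` helpers are illustrative assumptions, not Nebula's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# eq=False keeps identity-based comparison, which avoids recursing
# through parent/children links when two scenes are compared.
@dataclass(eq=False)
class Scene:
    """One node in the story tree (hypothetical schema)."""

    scene_id: str
    narrative: str                      # text written by the narrative model
    choices: list[str]                  # options offered to the player
    image_urls: list[str] = field(default_factory=list)
    video_url: Optional[str] = None     # cinematic clips are optional
    parent: Optional[Scene] = None
    # Maps the index of a chosen option to the scene it produced.
    children: dict[int, Scene] = field(default_factory=dict)

    def add_child(self, choice_index: int, child: Scene) -> Scene:
        """Attach the scene generated for one of this scene's choices."""
        if not 0 <= choice_index < len(self.choices):
            raise IndexError("choice_index does not match any choice")
        child.parent = self
        self.children[choice_index] = child
        return child

    def depth(self) -> int:
        """Number of choices made between the root scene and this one."""
        steps, node = 0, self
        while node.parent is not None:
            steps += 1
            node = node.parent
        return steps
```

Under these assumptions, each player choice adds one child node, so a playthrough corresponds to a single path from the root to a leaf.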
Key Ideas
Multimodal storytelling
Nebula brings together three types of generation in a single gameplay loop:
- Text for narrative scenes and story progression
- Images for visualizing scenes and player choices
- Video for cinematic moments in the story
Narrative content is generated first so the player can begin reading immediately, while images and video render in the background.
Maintaining story continuity
To keep the story coherent as it branches, each scene stores a short summary describing its narrative state.
When a new scene is generated, the model receives the chain of summaries from the start of the story to the current scene instead of the full text of earlier scenes. This keeps the narrative consistent along the player's branch while keeping prompts small.
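The summary chain can be sketched as a walk from the current scene back to the root. This assumes each stored scene holds a `summary` and a `parent_id`. Those field names, the in-memory `scenes` dictionary standing in for the story graph, and the prompt wording in `build_prompt` are hypothetical.

```python
from typing import Optional, TypedDict


class SceneDoc(TypedDict):
    """Assumed shape of a stored scene record."""

    summary: str
    parent_id: Optional[str]  # None for the opening scene


def summary_chain(scenes: dict[str, SceneDoc], scene_id: str) -> list[str]:
    """Collect summaries from the story's first scene down to scene_id."""
    chain: list[str] = []
    current: Optional[str] = scene_id
    while current is not None:
        doc = scenes[current]
        chain.append(doc["summary"])
        current = doc["parent_id"]
    chain.reverse()  # the walk went leaf-to-root; prompts read root-to-leaf
    return chain


def build_prompt(scenes: dict[str, SceneDoc], scene_id: str, choice: str) -> str:
    """Assemble a compact prompt from the summary chain and the new choice."""
    lines = [
        f"{i}. {summary}"
        for i, summary in enumerate(summary_chain(scenes, scene_id), start=1)
    ]
    return (
        "Story so far:\n"
        + "\n".join(lines)
        + f"\n\nThe player chose: {choice}\nWrite the next scene."
    )
```

Only scenes on the path to the current one are included, so summaries from sibling branches never enter the prompt, and prompt length grows with story depth instead of with the size of the whole tree.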
Parallel media generation
Visual assets generate asynchronously after the scene text is returned. This allows players to continue reading while images and video appear progressively, reducing perceived latency.
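One way to get this behavior in a Python backend is with `asyncio` tasks, sketched here. The three `generate_*` coroutines are placeholders that only sleep in place of real model calls, and `play_scene`, `labelled`, and the returned event list are hypothetical names used to show the ordering.

```python
import asyncio


async def generate_text(prompt: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for the narrative model call
    return f"Scene for: {prompt}"


async def generate_image(narrative: str) -> str:
    await asyncio.sleep(0.02)  # placeholder for image generation
    return "scene.png"


async def generate_video(narrative: str) -> str:
    await asyncio.sleep(0.2)  # placeholder for the slower video generation
    return "clip.mp4"


async def labelled(kind: str, job) -> tuple[str, str]:
    """Await a media job and tag its result with the asset kind."""
    return kind, await job


async def play_scene(prompt: str) -> list[str]:
    """Return the order in which parts of a scene become available."""
    events: list[str] = []

    # The narrative is awaited first so the player can start reading.
    narrative = await generate_text(prompt)
    events.append("text")

    # Media jobs start together and run concurrently in the background.
    jobs = [
        asyncio.create_task(labelled("image", generate_image(narrative))),
        asyncio.create_task(labelled("video", generate_video(narrative))),
    ]
    # as_completed yields each job as soon as it finishes, so assets
    # can be shown progressively instead of waiting for the slowest.
    for next_done in asyncio.as_completed(jobs):
        kind, _url = await next_done
        events.append(kind)
    return events


if __name__ == "__main__":
    print(asyncio.run(play_scene("The airlock opens")))
```

In a real service the text would more likely be returned to the client immediately, with media URLs delivered later, but the ordering principle is the same: text first, then each asset as it completes.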
Tech Stack
Models
- Gemini 2.5 Flash for narrative generation
- Imagen 4.0 for scene and choice images
- Veo 2.0 for cinematic video
Infrastructure
- FastAPI backend deployed on Cloud Run
- Firestore for storing the story graph
- Google Cloud Storage for generated media
- Firebase Auth for user authentication
Frontend
- React with Zustand for story state
- React Flow for visualizing story branches
- Framer Motion for scene transitions
- Vite for development and bundling
Built With
- cloud-run
- fastapi
- firebase-auth
- firestore
- framer-motion
- gemini-2.5-flash
- google-cloud
- imagen-4.0
- python
- react
- tailwind-css
- typescript
- veo
- vite
- zustand