Inspiration

I'm a digital artist, and I use Gemini's image generation models - particularly Nano Banana 2 and Pro - regularly in my own creative workflows. What struck me was how genuinely good they are at producing impressionist paintings: visible brushstrokes, atmospheric light, broken color that feels optical rather than blended. The results didn't have the usual "AI art" aesthetic; they actually looked like something out of Procreate or Photoshop.

That got me thinking: what if I could build an interactive experience around that capability? Not just a chatbot that happens to generate images, but a storybook where the art and the interactive narrative are the entire point. Something you'd actually want to sit with and explore, choosing where the story goes next and wondering where Luna will take you.

What it does

Luna is an interactive visual novel storybook. You give it a theme, something like "a shallow pool among jagged rocks at sunset", and it produces a 5-page illustrated story with impressionist paintings, narrative prose, and spoken audio. Each page streams to your browser as it's generated, painting by painting. After page 5, you choose a direction (or write your own), and the story branches into a new chapter with full continuity of title, mood, palette, and narrative.

The experience is deliberately minimalist - Cormorant Garamond serif, justified text, vignette overlays, a silvery book-page palette - so the art and the words stay in focus. Once your story is complete, you can export it as a formatted PDF, a styled HTML file, or a ZIP with separate image files.

How I built it

Luna orchestrates three Gemini models through a single SSE stream:

  1. Gemini 3.1 Pro produces a structured story plan — title, mood, color palette, character descriptions, 5 scene outlines, and branching choices — returned as typed JSON.
  2. Gemini 3.1 Flash Image generates each page one at a time using interleaved text + image output: 1–2 sentences of narration paired with an impressionist painting. The first page's painting is fed back as a style reference for pages 2–5 to maintain visual coherence.
  3. Gemini 2.5 Flash TTS (Aoede voice) narrates the full section aloud, with PCM-to-WAV conversion server-side.
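The PCM-to-WAV step in (3) is a small but easy-to-get-wrong detail: the TTS model returns raw PCM samples with no container, so the server has to prepend a 44-byte RIFF header before a browser can play the audio. A minimal sketch in Node (assuming 24 kHz, 16-bit, mono output; check the response's MIME type for the actual format in your setup):

```typescript
// Wrap raw PCM bytes in a standard 44-byte WAV (RIFF) header.
// Defaults assume 24 kHz / 16-bit / mono; adjust to match the
// format the TTS response actually reports.
function pcmToWav(
  pcm: Buffer,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Buffer {
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);

  header.write("RIFF", 0);                       // chunk ID
  header.writeUInt32LE(36 + pcm.length, 4);      // total chunk size
  header.write("WAVE", 8);                       // format
  header.write("fmt ", 12);                      // subchunk 1 ID
  header.writeUInt32LE(16, 16);                  // subchunk 1 size (PCM)
  header.writeUInt16LE(1, 20);                   // audio format: 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36);                      // subchunk 2 ID
  header.writeUInt32LE(pcm.length, 40);          // subchunk 2 size

  return Buffer.concat([header, pcm]);
}
```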

The frontend is Next.js 16 with React 19 and Tailwind CSS 4. Pages render incrementally via SSE — you see each painting appear as it's ready, not all at once. The backend runs on Google Cloud Run with automated deployment via a shell script (deploy.sh).
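The SSE wire format behind that incremental rendering is simple: each message is an `event:` line plus a `data:` line, terminated by a blank line. A sketch of how a server might frame a finished page (the event name and payload shape here are illustrative, not Luna's exact protocol):

```typescript
// One SSE message per finished page. JSON.stringify never emits raw
// newlines, so a single data: line is sufficient. The event name and
// payload fields are illustrative.
type PageEvent = { page: number; narration: string; imageBase64: string };

function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

const ev: PageEvent = {
  page: 1,
  narration: "The tide pulled back, and the rocks caught fire with light.",
  imageBase64: "<base64...>",
};
const frame = sseFrame("page", ev);
```

On the client, an `EventSource` listener for the same event name appends the new page to React state as each frame arrives.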

I built in per-page error recovery so if a generation fails mid-story, the user can retry from exactly the failed page without losing progress. Rate limiting (10 generations/hour/IP) and strict safety filters (BLOCK_LOW_AND_ABOVE across all harm categories) are applied to every Gemini call.
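The rate limiting doesn't need anything exotic at this scale. A sliding-window sketch (illustrative only; a real deployment on Cloud Run with multiple instances would need shared storage such as Redis or Firestore rather than in-process memory):

```typescript
// Sliding-window limiter: allow at most `limit` generations per
// `windowMs` per IP. In-memory sketch; not safe across multiple
// server instances.
class RateLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit = 10, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(ip: string, now = Date.now()): boolean {
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(ip) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.limit) {
      this.hits.set(ip, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(ip, recent);
    return true;
  }
}
```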

Challenges I faced

The hardest problem was visual consistency across pages. Impressionist painting is inherently loose - brushwork, color mixing, and character rendering all vary between generations - and as a chapter progresses, the style can drift noticeably.

My first approach was feeding back each previous image as a style reference. This helped with palette and brushstroke consistency, but introduced new problems: duplicated characters, fused anatomy, and strange overlay artifacts where the model tried to blend the reference into the new scene rather than just match its style.

The solution was a middle ground: I feed back only the first page's image as a style anchor for the entire chapter, combined with explicit prompt engineering that tells the model to treat it strictly as a style reference without copying or overlaying any part of it. Character consistency is reinforced through detailed appearance descriptions in the story plan, carried through every page prompt. It isn't perfect - over long multi-chapter stories you'll still see some drift - but it produces a coherent visual experience within each chapter.
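In code, that middle ground amounts to prompt assembly: one style anchor per chapter, plus character descriptions repeated on every page. A hypothetical sketch of such a prompt builder (the wording and field names are mine, not Luna's actual prompts):

```typescript
// Hypothetical page-prompt builder: repeats each character's appearance
// on every page and, when a style anchor image is attached, instructs
// the model to match style only. Wording and fields are illustrative.
type Character = { name: string; appearance: string };

function buildPagePrompt(
  sceneOutline: string,
  characters: Character[],
  hasStyleAnchor: boolean,
): string {
  const lines = [
    `Paint this scene as an impressionist oil painting: ${sceneOutline}`,
    ...characters.map(
      (c) => `${c.name} must match this description exactly: ${c.appearance}`,
    ),
  ];
  if (hasStyleAnchor) {
    lines.push(
      "Treat the attached image strictly as a style reference for palette, " +
        "brushwork, and lighting. Do not copy, overlay, or blend any " +
        "figure, object, or composition from it into the new scene.",
    );
  }
  return lines.join("\n");
}
```

The negative instruction ("do not copy, overlay, or blend") is what suppressed the fused-anatomy and overlay artifacts described above; phrasing it as a positive ("match this style") alone was not enough.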

What I learned

  • Gemini's interleaved text + image output is genuinely powerful for creative applications - the model can reason about what it's painting and what it's writing at the same time, which produces more coherent pairings than generating them separately.
  • Prompt engineering for visual consistency is more nuanced than I expected. The line between "use this as a reference" and "copy this image" is thin, and the wording matters significantly.
  • Streaming per-page via SSE makes a huge difference in perceived quality — watching each painting appear feels like turning pages in a real book, rather than waiting longer for the whole story to load.

What's next

I plan to keep refining Luna after the hackathon. It's genuinely fun to use - selecting a direction and leaving the rest to Luna, wondering where she'll take you next. Some directions I'm exploring: multi-chapter memory so style references carry across branching choices, user-adjustable art styles beyond impressionism, and a gallery mode for revisiting and sharing stories.
