Inspiration

As artists working with AI, we are fascinated by the idea of the fresco: a vast, narrative image where each region has its own story but all must belong to a coherent whole. Traditional generative models produce impressive single images, but we wanted to see whether FLUX.1 Kontext [dev] could be pushed further into a workflow for constructing large, editable, and self-consistent digital frescoes.


What it does

Sinopia is a VLM-guided multidiffusion framework for digital fresco creation, powered by FLUX.1 Kontext [dev].
It turns FLUX Kontext into a tool for structured artistic composition: large-scale images containing multiple scenes, built in three steps:

  1. Composition Generation: Kontext builds the fresco through regional prompting and multidiffusion, guided by masks and prompts.
  2. Self-Correction: A Vision–Language Model analyzes the fresco against its prompts, flags inconsistencies caused by multidiffusion, and sends those regions back into Kontext for regeneration. Kontext becomes not only the generator, but also the editor of its own work.
  3. Harmonization: ControlNet and IP-Adapter unify the whole fresco into a coherent style.
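The core of Step 1 is multidiffusion-style blending: each region is denoised under its own prompt, and overlapping predictions are merged into one canvas via mask-weighted averaging. Below is a minimal NumPy sketch of that blending idea; the function name, shapes, and toy values are illustrative, not the actual FluxPanoramaRegionalPipeline code.

```python
import numpy as np

def blend_regions(region_preds, masks, eps=1e-8):
    """Multidiffusion-style blending: sum per-region predictions
    weighted by their masks, then normalize where regions overlap."""
    canvas = np.zeros_like(region_preds[0], dtype=np.float64)
    weight = np.zeros(region_preds[0].shape[:2], dtype=np.float64)
    for pred, mask in zip(region_preds, masks):
        canvas += pred * mask[..., None]   # weight each pixel by its mask
        weight += mask                     # track total coverage per pixel
    return canvas / np.maximum(weight, eps)[..., None]

# Toy canvas: two regions that overlap in columns 2-3.
h, w = 4, 6
left  = np.full((h, w, 3), 1.0)   # "prediction" for the left region
right = np.full((h, w, 3), 3.0)   # "prediction" for the right region
m_left  = np.zeros((h, w)); m_left[:, :4] = 1.0
m_right = np.zeros((h, w)); m_right[:, 2:] = 1.0
out = blend_regions([left, right], [m_left, m_right])
# In the overlap, the two predictions are averaged.
```

In the real pipeline this averaging happens in latent space at every denoising step, which is what keeps neighboring scenes consistent across region boundaries.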

We also developed a Sinopia GUI: a browser-based interface where artists can draw masks, assign prompts, and configure generation settings. This lowers the barrier to experimenting with large-scale regional prompting.


How we built it

  • Developed a three-step Python pipeline.
  • Used FLUX.1 Kontext [dev] as the core model in Step 1. We coded the FluxPanoramaRegionalPipeline to enable multidiffusion and regional prompting across large canvases.
  • Integrated a Vision–Language Model (VLM) via OpenAI API in Step 2 to automatically detect inconsistencies between the generated fresco and its prompts, then re-fed those zones back into FLUX Kontext for correction.
  • Designed a GUI for mask drawing, prompt assignment, and generation controls, making regional prompting in Kontext accessible to artists without coding.
  • Applied ControlNet edge detection + IP-Adapter style transfer in Step 3 for harmonization, supporting both global and per-region styles.
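The Step 2 feedback loop can be sketched as follows: the VLM returns a list of flagged regions (a box plus the regional prompt it failed to match), and only those patches are re-fed to the generator. Everything here is a hedged sketch: the `Flag` dataclass, `self_correct`, and the stub `regenerate` callable are hypothetical names standing in for the real VLM output schema and the FLUX Kontext edit call.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Flag:
    """A region the VLM judged inconsistent with its prompt (illustrative)."""
    box: tuple    # (x0, y0, x1, y1) in pixels
    prompt: str   # the regional prompt the patch should satisfy

def self_correct(canvas, flags, regenerate):
    """Re-run generation only on the flagged regions.
    `regenerate(patch, prompt)` stands in for a FLUX Kontext edit call;
    in the real loop the VLM would re-check the canvas afterwards."""
    for f in flags:
        x0, y0, x1, y1 = f.box
        canvas[y0:y1, x0:x1] = regenerate(canvas[y0:y1, x0:x1], f.prompt)
    return canvas

# Demo with a stub "model" that repaints a flagged 3x3 patch.
canvas = np.zeros((8, 8))
fixed = self_correct(canvas, [Flag((2, 2, 5, 5), "a saint holding a lamp")],
                     regenerate=lambda patch, prompt: np.ones_like(patch))
```

Scoping regeneration to flagged boxes is what makes the correction pass cheap relative to regenerating the whole fresco.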

Challenges we ran into

  • Making large-scale generation efficient while keeping outputs coherent.
  • Designing a feedback loop where VLM outputs could directly drive Kontext regeneration.

Accomplishments that we're proud of

  • Building a self-correcting fresco pipeline where FLUX Kontext iteratively improves its own outputs.
  • Designing the Sinopia GUI, turning a complex pipeline into an intuitive creation tool for artists.
  • Demonstrating how Kontext and VLMs can be used together to improve prompt adherence inside a generative workflow.
  • Pushing Kontext into a new creative use case: not just image generation, but large-scale, editable compositions with multiple scenes.

What we learned

  • FLUX Kontext is flexible enough to act as both a creator and editor in a multi-stage pipeline.
  • VLMs can play a role in bridging text–image alignment at scale.
  • Making tools artist-friendly (via GUI) is as important as pushing technical boundaries.
  • Using our method, we created an artwork shown at Grand Palais Immersif for an exhibition on AI art in Paris.
  • A full technical report on Sinopia is available here: link to pdf

What's next for Sinopia

  • Large-scale murals: pushing the pipeline to generate 50k+ pixel images for physical installations (tiles, textiles, façades). We aim to create a real-life fresco in Paris next June.
  • Dynamic frescoes: extending the pipeline to video for evolving, narrative-driven compositions.