Inspiration

Looking up at the sky and watching clouds to relax, I wondered how an AI might imagine each piece of cloud.

What it does

Cloud Imaginator is a multi-modal creative suite made up of five parts:

  • Sky Lens (Live Co-pilot): Uses the Gemini Live API to converse with you in real-time as you point your camera at the sky. It identifies shapes, agrees with your sightings, and builds creative context.
  • Intelligent Discovery: Automatically detects distinct "inhabitants" (cloud segments) using Gemini 3 Pro's spatial reasoning.
  • Solid Mask Engine: A custom CV pipeline that extracts high-fidelity stencils from soft clouds.
  • Physical Transformation: Reimagines these stencils as solid entities (fur, metal, scales) using Gemini 2.5 Flash Image, perfectly aligned with the original sky's lighting.
  • Atmospheric Narrator: Generates a unique 80-word ancient myth for your creation and narrates it using a warm, storytelling TTS voice.

How we built it

We combined models from the Gemini 3 and Gemini 2.5 families:

  • gemini-2.5-flash-native-audio powers the Sky Lens for sub-second conversational latency.
  • gemini-3-pro-preview handles complex spatial detection of clouds and high-quality lore generation.
  • gemini-3-flash-preview performs rapid visual interpretation of silhouettes and geometric analysis.
  • gemini-2.5-flash-image acts as the rendering engine, using multi-part prompts (Original + Mask + Instruction) for zero-bleed subject replacement.
  • gemini-2.5-flash-preview-tts provides the final layer of immersion with high-quality PCM audio output.
  • The frontend is built with React, Tailwind CSS, and utilizes the Web Audio API for raw PCM stream management.
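As an illustration of the raw PCM handling mentioned above, here is a minimal sketch of the conversion step that usually sits between a PCM byte stream and the Web Audio API. It assumes the audio arrives as 16-bit little-endian mono samples; that format, and the function name `pcm16ToFloat32`, are assumptions made for this example rather than details taken from the project's code.

```typescript
// Sketch only: assumes 16-bit little-endian mono PCM input (an assumption,
// not confirmed by the project description).
// Converts raw PCM bytes into Float32 samples in the range [-1, 1),
// which is the sample format Web Audio buffers use.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Two bytes per sample; a trailing odd byte is ignored.
  const sampleCount = Math.floor(bytes.length / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

In a browser, the returned samples could then be copied into a buffer created with `AudioContext.createBuffer(1, samples.length, sampleRate)` and played through an `AudioBufferSourceNode`. The sample rate has to match whatever the audio source actually produces.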

Challenges we ran into

The generated image often did not match the cloud's shape, so the new subject was poorly placed over the cloud. AI Studio also tended to rewrite the entire file on each request, which changed the app's behavior from one iteration to the next.

I improved the results by splitting the app into smaller components and testing the generation prompt a little at a time.
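One way to keep prompt experiments small and repeatable is to isolate the request assembly in a pure function that can be tested without calling the model. The sketch below builds the Original + Mask + Instruction parts in that order. The `Part` type mirrors the text and inline-data part shapes of the Gemini API, while `EditRequest`, `buildEditParts`, and the instruction wording are hypothetical names and text invented for this example.

```typescript
// Minimal local model of a request part: either text or base64 inline data.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Hypothetical input for one transformation request.
interface EditRequest {
  originalPngBase64: string; // the untouched sky photo
  maskPngBase64: string;     // the stencil extracted for one cloud
  material: string;          // e.g. "fur", "metal", "scales"
}

// Assembles the parts in a fixed order: original image, mask, instruction.
// The instruction text is illustrative, not the project's actual prompt.
function buildEditParts(req: EditRequest): Part[] {
  const instruction =
    "The first image is the original sky. The second image is a mask. " +
    "Repaint only the masked region as a solid subject made of " +
    req.material +
    ", keep its outline, and leave every pixel outside the mask unchanged.";
  return [
    { inlineData: { mimeType: "image/png", data: req.originalPngBase64 } },
    { inlineData: { mimeType: "image/png", data: req.maskPngBase64 } },
    { text: instruction },
  ];
}
```

Because the function has no side effects, each change to the wording can be checked on its own before the parts are sent as the contents of an image generation request.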

Accomplishments that we're proud of

  • The "Sky Lens" Experience: Creating a truly conversational co-pilot that feels like a shared creative partner rather than a tool.
  • The Masking Engine: Developing a saliency-based CV algorithm that works on low-contrast clouds where standard Canny or Sobel methods fail.
  • Cohesive Pipeline: Successfully chaining five different Gemini models into a single, seamless user experience that flows from observation to finished art.
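To show why a region-based approach can succeed where edge detectors struggle on soft clouds, here is a simplified stand-in for that kind of masking. It is not the project's saliency algorithm. It assumes clouds are less saturated than the surrounding sky, scores each pixel by "whiteness" (min channel divided by max channel), and picks a global cutoff with Otsu's method. All function names are hypothetical.

```typescript
// Per-pixel whiteness in 0..255: 255 * min(r,g,b) / max(r,g,b).
// Gray or white pixels score high; saturated blue sky scores low.
function whiteness(rgba: Uint8ClampedArray): Uint8Array {
  const out = new Uint8Array(Math.floor(rgba.length / 4));
  for (let p = 0; p < out.length; p++) {
    const r = rgba[p * 4], g = rgba[p * 4 + 1], b = rgba[p * 4 + 2];
    const max = Math.max(r, g, b);
    const min = Math.min(r, g, b);
    out[p] = max === 0 ? 0 : Math.round((255 * min) / max);
  }
  return out;
}

// Otsu's method: choose the threshold that maximizes between-class variance.
function otsuThreshold(values: Uint8Array): number {
  const hist = new Float64Array(256);
  for (let i = 0; i < values.length; i++) hist[values[i]]++;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];
  let weightBg = 0, sumBg = 0, best = 0, bestVar = -1;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    sumBg += t * hist[t];
    if (weightBg === 0) continue;
    const weightFg = values.length - weightBg;
    if (weightFg === 0) break;
    const diff = sumBg / weightBg - (sumAll - sumBg) / weightFg;
    const between = weightBg * weightFg * diff * diff;
    if (between > bestVar) { bestVar = between; best = t; }
  }
  return best;
}

// Binary stencil: 1 where the pixel is whiter than the Otsu cutoff, else 0.
function cloudMask(rgba: Uint8ClampedArray): Uint8Array {
  const score = whiteness(rgba);
  const cutoff = otsuThreshold(score);
  const mask = new Uint8Array(score.length);
  for (let p = 0; p < score.length; p++) mask[p] = score[p] > cutoff ? 1 : 0;
  return mask;
}
```

The input is laid out like `ImageData.data` from a canvas (RGBA, one byte per channel). Because the decision is made per region rather than per edge, a cloud with a gradual boundary still yields a solid mask, although a single global threshold is far cruder than a real saliency model.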

What we learned

I've discovered that the "Thinking" capabilities of the Gemini 3 series are game-changing for creative spatial tasks.

What's next for Cloud Imaginator

  • Community Sky-Lore: A global map (using Grounding with Google Maps in the Gemini API) where users can see what others have "discovered" in the clouds at their location.
  • AR View: A persistent AR mode where your generated cloud creatures stay pinned to the sky coordinates as you move your phone.
