Inspiration
Whenever I look up and watch the clouds to relax, I wonder how an AI might imagine what each cloud could be.
What it does
Cloud Imaginator is a multimodal creative suite made up of five parts:
- Sky Lens (Live Co-pilot): Uses the Gemini Live API to converse with you in real-time as you point your camera at the sky. It identifies shapes, agrees with your sightings, and builds creative context.
- Intelligent Discovery: Automatically detects distinct "inhabitants" (cloud segments) using Gemini 3 Pro's spatial reasoning.
- Solid Mask Engine: A custom CV pipeline that extracts high-fidelity stencils from soft clouds.
- Physical Transformation: Reimagines these stencils as solid entities (fur, metal, scales) using Gemini 2.5 Flash Image, matched to the lighting of the original sky.
- Atmospheric Narrator: Generates a unique 80-word ancient myth for your creation and narrates it using a warm, storytelling TTS voice.
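The writeup doesn't show how the detected "inhabitants" are mapped back onto the photo. Here is a minimal sketch of that step. It assumes the detection prompt asks the model for JSON with a label and a box_2d per region, in [ymin, xmin, ymax, xmax] order and normalized to 0..1000 (the convention used in Gemini's object-detection examples). The names DetectedRegion, PixelBox and toPixelBoxes are hypothetical and are not the project's code.

```typescript
// Hypothetical shape of one detected "inhabitant". Assumes the model was
// prompted to return box_2d as [ymin, xmin, ymax, xmax] normalized to 0..1000.
interface DetectedRegion {
  label: string;
  box_2d: [number, number, number, number];
}

// Pixel-space rectangle, ready for cropping or drawing on a canvas.
interface PixelBox {
  label: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert normalized boxes into pixel rectangles for an image of the given
// size, clamping coordinates to the image and dropping empty boxes.
function toPixelBoxes(
  regions: DetectedRegion[],
  imgWidth: number,
  imgHeight: number,
): PixelBox[] {
  const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v));
  const out: PixelBox[] = [];
  for (const r of regions) {
    const [ymin, xmin, ymax, xmax] = r.box_2d;
    const x0 = Math.round((clamp(xmin, 0, 1000) / 1000) * imgWidth);
    const y0 = Math.round((clamp(ymin, 0, 1000) / 1000) * imgHeight);
    const x1 = Math.round((clamp(xmax, 0, 1000) / 1000) * imgWidth);
    const y1 = Math.round((clamp(ymax, 0, 1000) / 1000) * imgHeight);
    if (x1 > x0 && y1 > y0) {
      out.push({ label: r.label, x: x0, y: y0, width: x1 - x0, height: y1 - y0 });
    }
  }
  return out;
}
```

Each resulting rectangle could then seed the mask and transformation steps for one cloud segment. Clamping guards against coordinates that fall slightly outside the expected range.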
How we built it
We combined Gemini 3 and Gemini 2.5 models, each assigned to one stage of the pipeline:
- gemini-2.5-flash-native-audio powers the Sky Lens with sub-second conversational latency.
- gemini-3-pro-preview handles complex spatial detection of clouds and high-quality lore generation.
- gemini-3-flash-preview performs rapid visual interpretation of silhouettes and geometric analysis.
- gemini-2.5-flash-image acts as the rendering engine, using multi-part prompts (Original + Mask + Instruction) for zero-bleed subject replacement.
- gemini-2.5-flash-preview-tts provides the final layer of immersion with high-quality PCM audio output.
- The frontend is built with React and Tailwind CSS, and uses the Web Audio API to manage raw PCM streams.
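Managing raw PCM with the Web Audio API usually comes down to two steps. First, 16-bit integer samples are converted into the Float32 values an AudioBuffer holds. Second, chunks are scheduled back to back so playback has no gaps. Here is a minimal sketch of both, assuming mono, little-endian, signed 16-bit PCM chunks. The helpers pcm16ToFloat32 and scheduleChunk are hypothetical and deliberately avoid browser-only APIs so they can be tested anywhere.

```typescript
// Convert mono, little-endian signed 16-bit PCM bytes into Float32 samples
// in [-1, 1), the range Web Audio's AudioBuffer channels use.
// An odd trailing byte (half a sample) is ignored.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}

// Gapless scheduling: a chunk starts where the previous one ends, or
// immediately if playback has already caught up with the stream.
function scheduleChunk(
  currentTime: number,
  nextStartTime: number,
  durationSec: number,
): { start: number; next: number } {
  const start = Math.max(currentTime, nextStartTime);
  return { start, next: start + durationSec };
}
```

In the browser, each converted chunk would go through these steps:
1. Place it in ctx.createBuffer(1, samples.length, sampleRate).
2. Fill the buffer with buffer.copyToChannel(samples, 0).
3. Play it through an AudioBufferSourceNode started at the returned start time, passing ctx.currentTime as currentTime.

The sample rate is an assumption to check for each model. The Live API documents 24 kHz for its audio output.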
Challenges we ran into
The generated images often didn't match the cloud's shape, and the new subject wasn't placed well over the cloud. On top of that, AI Studio tended to rewrite the entire file on every change, which produced different behavior each time.
I improved the results by splitting the app into smaller components and testing the generation prompt step by step.
Accomplishments that we're proud of
- The "Sky Lens" Experience: Creating a truly conversational co-pilot that feels like a shared creative partner rather than a tool.
- The Masking Engine: Developing a saliency-based CV algorithm that works on low-contrast clouds where standard Canny or Sobel methods fail.
- Cohesive Pipeline: Chaining five different Gemini models, across audio, vision, image generation, and speech, into a single seamless experience that flows from observation to finished art.
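The saliency-based masking algorithm itself isn't described here. The sketch below substitutes a different and much simpler region-based technique. Each pixel gets a "whiteness" score equal to the minimum of its R, G and B channels. That score stays low for saturated blue sky and high for white or gray cloud. A global threshold is then chosen with Otsu's method. Because this thresholds regions instead of looking for gradients, it doesn't rely on the sharp edges that Canny or Sobel need, but it is only a baseline. The names cloudMask and otsuThreshold are hypothetical, and the input is assumed to be RGBA bytes, as in a canvas ImageData.

```typescript
// Otsu's method: choose the threshold t (0..255) that maximizes the
// between-class variance of pixels with score <= t versus score > t.
function otsuThreshold(hist: Uint32Array, total: number): number {
  let sumAll = 0;
  for (let i = 0; i < 256; i++) sumAll += i * hist[i];

  let sumBelow = 0;
  let weightBelow = 0;
  let best = 0;
  let bestVariance = -1;
  for (let t = 0; t < 256; t++) {
    weightBelow += hist[t];
    if (weightBelow === 0) continue; // no pixels at or below t yet
    const weightAbove = total - weightBelow;
    if (weightAbove === 0) break; // every pixel is already below t
    sumBelow += t * hist[t];
    const meanBelow = sumBelow / weightBelow;
    const meanAbove = (sumAll - sumBelow) / weightAbove;
    const variance = weightBelow * weightAbove * (meanBelow - meanAbove) ** 2;
    if (variance > bestVariance) {
      bestVariance = variance;
      best = t;
    }
  }
  return best;
}

// Build a binary cloud mask (1 = cloud, 0 = sky) from RGBA pixel data.
// Hypothetical baseline: whiteness score = min(R, G, B), thresholded with Otsu.
function cloudMask(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): { mask: Uint8Array; threshold: number } {
  const n = width * height;
  const scores = new Uint8Array(n);
  const hist = new Uint32Array(256);
  for (let i = 0; i < n; i++) {
    const s = Math.min(rgba[i * 4], rgba[i * 4 + 1], rgba[i * 4 + 2]);
    scores[i] = s;
    hist[s]++;
  }
  const threshold = otsuThreshold(hist, n);
  const mask = new Uint8Array(n);
  for (let i = 0; i < n; i++) mask[i] = scores[i] > threshold ? 1 : 0;
  return { mask, threshold };
}
```

In a browser, ctx.getImageData(0, 0, w, h).data is a Uint8ClampedArray in exactly this layout. A real stencil pipeline would probably add smoothing and hole filling on top of the raw mask.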
What we learned
I've discovered that the "Thinking" capabilities of the Gemini 3 series are game-changing for creative spatial tasks.
What's next for Cloud Imaginator
- Community Sky-Lore: A global map (using Grounding with Google Maps in the Gemini API) where users can see what others have "discovered" in the clouds at their location.
- AR View: A persistent AR mode where your generated cloud creatures stay pinned to the sky coordinates as you move your phone.
Built With
- react
- tailwind