"Read Me" — A Conscious Choice in the Age of Automation
What Inspired Me
In 2025, I see two worlds coexisting:
The Pink World — where people feed their minds with art, books, and imagination. Their morning coffee becomes a universe where the Mona Lisa smiles, Romeo and Juliet dance, and stormtroopers walk by. Everything is alive, colorful, poetic.
The Gray World — where eyes are glued to screens, "Breaking News" screams anxiety, and people are physically present but mentally absent. Everything is desaturated, dead, mechanical.
"Read Me" is an invitation to choose what we feed our minds.
The Creative Challenge
The question wasn't just what to create, but how to create it without compromising artistic integrity.
Everyone said: "Use ChatGPT, it takes 30 minutes."
I chose 60 hours instead. Here's why.
When AI Censors Art
ChatGPT rejected my initial prompt for the coffee scene because it included the word "sensual" — describing the suspended moment before the first sip. Not explicit. Not inappropriate. Just artistic.
This crystallized why I chose open source: commercial AI tools don't just limit technical capabilities — they limit creative vocabulary.
With ChatGPT, I'm subject to OpenAI's ever-changing content policies.
With open source, I control what's appropriate for my art.
The final image I created (Qwen + Flux) was tasteful and artistic. But I could only make it because I owned the tools.
Creative freedom isn't negotiable.
How I Built It
The Pipeline:
- Qwen for style training (150 hand-curated images per style, 20-minute expert descriptions each)
- Wan 2.2 for video generation (chosen over Kling AI for zero hallucinations)
- ComfyUI for workflow orchestration
- DaVinci Resolve for final editing and color grading
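The pipeline above can be sketched as plain Python. This is a hypothetical stand-in for illustration only: the function names, shot counts, and frame math are assumptions, not the actual ComfyUI graph.

```python
# Hypothetical sketch of the four-stage pipeline; the bodies are
# stand-ins mirroring the tools listed above, not real API calls.

def train_style_lora(images, style_name):
    """Stand-in for Qwen-based style training on hand-curated images."""
    assert len(images) >= 150, "each style was trained on 150 curated images"
    return {"style": style_name, "lora": f"{style_name}.safetensors"}

def generate_shot(prompt, style):
    """Stand-in for Wan 2.2 video generation using a trained style."""
    return {"prompt": prompt, "lora": style["lora"], "frames": 120}

def assemble_timeline(shots):
    """Stand-in for the DaVinci Resolve edit: order shots, report duration."""
    fps = 24
    total_frames = sum(s["frames"] for s in shots)
    return {"shots": len(shots), "seconds": total_frames / fps}

# Usage: one style, two shots
style = train_style_lora(["img"] * 150, "vibrant_pop_art")
shots = [generate_shot(p, style) for p in ["neon cafe", "mona lisa smiles"]]
timeline = assemble_timeline(shots)
```

The point of the sketch is the ordering: styles are trained once, then reused across every shot before a single edit pass.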
Three Distinct Visual Styles:
- Synesthetic Abstract (intro) — Pure colors, organic shapes
- Vibrant Pop Art (pink world) — Neon café, cultural icons
- Desaturated Documentary (gray world) — Black and white realism
The Color Language:
- Neon Pink = Rich inner life, imagination, conscious presence
- Desaturated Gray = Media anxiety, passive consumption, autopilot existence
Each chromatic choice is the message itself.
What I Learned
1. Expertise Beats Automation
AI is a tool. An extraordinary one. But what makes the difference is the human intention behind it.
If you let AI decide your aesthetic, you get AI's aesthetic. If you direct AI with expertise, you get YOUR aesthetic.
2. Open Source = True Freedom
With commercial tools, I rent a brush from OpenAI.
With open source, I forge my own brush.
I control my data. I own my style. I know exactly how everything is built. No black boxes. No hidden biases. No dependency on companies that can change rules overnight.
3. Quality Over Speed
150 intentional images beat 1000 random ones. Always.
I spent 60 hours building this workflow. ChatGPT could generate in 30 minutes.
But my workflow produces:
- Zero hallucinations
- 100% temporal coherence
- My unique visual signature
- Complete creative control
That's not time wasted. That's time invested.
The Challenges Faced
Challenge 1: Benchmarking Kling AI
Kling's demos looked incredible. I tested it against my workflow on the most complex shot (Kong + Stormtrooper scene).
Result: Kling's stormtrooper merged into Kong (hallucination). Mine stayed perfectly distinct.
Success rate: Kling 60%, my workflow 100%.
The "faster" tool wasn't actually faster in production.
Challenge 2: Style Training Precision
Each of my 150 images per style required expert captioning. Not "woman drinking coffee, colorful" — but detailed, precise descriptions like:
"Cinematic extreme close-up frontal view, lips slightly parted in suspended moment of anticipation, dark ceramic cup rim tenderly touching bottom lip creating point of contact, liquid visible inside cup catching warm golden light from behind, brilliant cyan turquoise backlight creating dramatic rim lighting..."
Every word = an artistic decision.
I initially did this manually (10 min/image × 150 = 25 hours per style).
Then I trained my own captioning agent — trained on MY vocabulary, MY cinematic references, MY way of describing light and composition.
Not replacing expertise. Amplifying it.
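The personal-vocabulary idea can be illustrated with a minimal template-based captioner. This is an assumption-laden sketch, not the author's actual agent: the vocabulary tables and `caption` helper are hypothetical.

```python
# Hypothetical sketch of a personal captioning helper: it assembles a
# caption from fixed vocabularies of the artist's own framing and
# lighting terms, instead of emitting generic tags.

FRAMING = {
    "ecu": "Cinematic extreme close-up frontal view",
    "wide": "Slow wide establishing shot at eye level",
}
LIGHTING = {
    "cyan_rim": "brilliant cyan turquoise backlight creating dramatic rim lighting",
    "golden": "warm golden light from behind catching the subject",
}

def caption(framing, subject, lighting):
    """Join curated vocabulary terms with the shot-specific subject line."""
    return ", ".join([FRAMING[framing], subject, LIGHTING[lighting]])

c = caption(
    "ecu",
    "lips slightly parted in suspended moment of anticipation",
    "cyan_rim",
)
```

A real agent would generate the subject line too, but constraining it to a curated lexicon is what keeps the captions in one voice across 150 images.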
Challenge 3: Maintaining Coherence Across 14 Shots
With 3 radically different visual styles, maintaining narrative flow was crucial.
Wan 2.2's Mixture-of-Experts architecture saved me: separate experts for layout (high-noise) and details (low-noise) meant zero object drift, zero hallucinations, 100% coherence.
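The two-expert split can be sketched as a simple routing rule. To be clear, this is an illustrative toy, not Wan 2.2 internals: the threshold value and expert bodies are assumptions.

```python
# Hypothetical sketch of noise-level expert routing: one expert handles
# high-noise timesteps (global layout), the other low-noise timesteps
# (fine detail). The 0.5 boundary is an assumed, illustrative value.

NOISE_SWITCH = 0.5

def layout_expert(latent, sigma):
    # Early, high-noise steps decide composition and object placement.
    return f"layout({latent}, sigma={sigma})"

def detail_expert(latent, sigma):
    # Late, low-noise steps refine texture without moving objects.
    return f"detail({latent}, sigma={sigma})"

def denoise_step(latent, sigma):
    """Route a denoising step to the expert matching its noise level."""
    expert = layout_expert if sigma >= NOISE_SWITCH else detail_expert
    return expert(latent, sigma)
```

Because layout is frozen by the time the detail expert takes over, objects cannot drift or merge between steps, which is exactly the failure mode seen in the Kling benchmark above.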
The Result
14 shots. 3 styles. 2:15. Zero compromise.
A film about conscious choice, built through conscious choices.
Not "technology is bad." But "what do you choose to consume?"
My character walks out of a café with a book — "READ ME" — the only splash of color in a black-and-white world. Flowers bloom under her feet.
She doesn't change the world. She doesn't save anyone.
She just makes a choice.
And maybe that's enough.
Why This Matters
We're drowning in AI slop — content generated without intention, homogenizing the internet, erasing individuality.
"Read Me" proves another path exists:
Expertise + Open Source + Intention = Magic
Not the easy path. The free path.
This project is my answer to the question: "Does human expertise still matter in the age of automation?"
Spoiler: Yes. More than ever.
Built With
- ComfyUI
- Custom style-training pipelines
- ElevenLabs
- Flux Dev
- LoRA
- MoE
- Python
- Qwen
- Suno AI
- Wan 2.2