"Read Me" — A Conscious Choice in the Age of Automation
What Inspired Me
In 2025, I see two worlds coexisting:
The Pink World — where people feed their minds with art, books, and imagination. Their morning coffee becomes a universe where the Mona Lisa smiles, Romeo and Juliet dance, and stormtroopers walk by. Everything is alive, colorful, poetic.
The Gray World — where eyes are glued to screens, "Breaking News" screams anxiety, and people are physically present but mentally absent. Everything is desaturated, dead, mechanical.
"Read Me" is an invitation to choose what we feed our minds.
The Creative Challenge
The question wasn't just what to create, but how to create it without compromising artistic integrity.
Everyone said: "Use ChatGPT, it takes 30 minutes."
I chose 60 hours instead. Here's why.
When AI Censors Art
ChatGPT rejected my initial prompt for the coffee scene because it included the word "sensual" — describing the suspended moment before the first sip. Not explicit. Not inappropriate. Just artistic.
This crystallized why I chose open source: commercial AI tools don't just limit technical capabilities — they limit creative vocabulary.
With ChatGPT, I'm subject to OpenAI's ever-changing content policies.
With open source, I control what's appropriate for my art.
The final image I created (Qwen + Flux) was tasteful and artistic. But I could only make it because I owned the tools.
Creative freedom isn't negotiable.
How I Built It
The Pipeline:
- Qwen for style training (150 hand-curated images per style, 20-minute expert descriptions each)
- Wan 2.2 for video generation (chosen over Kling AI for zero hallucinations)
- ComfyUI for workflow orchestration
- DaVinci Resolve for final editing and color grading
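The pipeline above can be sketched as plain Python. This is a hypothetical stand-in for illustration only: the function names, shot counts, and frame math are assumptions, not the actual ComfyUI graph.

```python
# Hypothetical sketch of the four-stage pipeline; the bodies are
# stand-ins mirroring the tools listed above, not real API calls.

def train_style_lora(images, style_name):
    """Stand-in for Qwen-based style training on hand-curated images."""
    assert len(images) >= 150, "each style was trained on 150 curated images"
    return {"style": style_name, "lora": f"{style_name}.safetensors"}

def generate_shot(prompt, style):
    """Stand-in for Wan 2.2 video generation using a trained style."""
    return {"prompt": prompt, "lora": style["lora"], "frames": 120}

def assemble_timeline(shots):
    """Stand-in for the DaVinci Resolve edit: order shots, report duration."""
    fps = 24
    total_frames = sum(s["frames"] for s in shots)
    return {"shots": len(shots), "seconds": total_frames / fps}

# Usage: one style, two shots
style = train_style_lora(["img"] * 150, "vibrant_pop_art")
shots = [generate_shot(p, style) for p in ["neon cafe", "mona lisa smiles"]]
timeline = assemble_timeline(shots)
```

The point of the sketch is the ordering: styles are trained once, then reused across every shot before a single edit pass.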
Three Distinct Visual Styles:
- Synesthetic Abstract (intro) — Pure colors, organic shapes
- Vibrant Pop Art (pink world) — Neon café, cultural icons
- Desaturated Documentary (gray world) — Black and white realism
The Color Language:
- Neon Pink = Rich inner life, imagination, conscious presence
- Desaturated Gray = Media anxiety, passive consumption, autopilot existence
Each chromatic choice is the message itself.
What I Learned
1. Expertise Beats Automation
AI is a tool. An extraordinary one. But what makes the difference is the human intention behind it.
If you let AI decide your aesthetic, you get AI's aesthetic. If you direct AI with expertise, you get YOUR aesthetic.
2. Open Source = True Freedom
With commercial tools, I rent a brush from OpenAI.
With open source, I forge my own brush.
I control my data. I own my style. I know exactly how everything is built. No black boxes. No hidden biases. No dependency on companies that can change rules overnight.
3. Quality Over Speed
150 intentional images beat 1000 random ones. Always.
I spent 60 hours building this workflow. ChatGPT could generate in 30 minutes.
But my workflow produces:
- Zero hallucinations
- 100% temporal coherence
- My unique visual signature
- Complete creative control
That's not time wasted. That's time invested.
The Challenges Faced
Challenge 1: Benchmarking Kling AI
Kling's demos looked incredible. I tested it against my workflow on the most complex shot (Kong + Stormtrooper scene).
Result: Kling's stormtrooper merged into Kong (hallucination). Mine stayed perfectly distinct.
Success rate: Kling 60%, my workflow 100%.
The "faster" tool wasn't actually faster in production.
Challenge 2: Style Training Precision
Each of my 150 images per style required expert captioning. Not "woman drinking coffee, colorful" — but detailed, precise descriptions like:
"Cinematic extreme close-up frontal view, lips slightly parted in suspended moment of anticipation, dark ceramic cup rim tenderly touching bottom lip creating point of contact, liquid visible inside cup catching warm golden light from behind, brilliant cyan turquoise backlight creating dramatic rim lighting..."
Every word = an artistic decision.
I initially did this manually (10 min/image × 150 = 25 hours per style).
Then I trained my own captioning agent — trained on MY vocabulary, MY cinematic references, MY way of describing light and composition.
Not replacing expertise. Amplifying it.
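The personal-vocabulary idea can be illustrated with a minimal template-based captioner. This is an assumption-laden sketch, not the author's actual agent: the vocabulary tables and `caption` helper are hypothetical.

```python
# Hypothetical sketch of a personal captioning helper: it assembles a
# caption from fixed vocabularies of the artist's own framing and
# lighting terms, instead of emitting generic tags.

FRAMING = {
    "ecu": "Cinematic extreme close-up frontal view",
    "wide": "Slow wide establishing shot at eye level",
}
LIGHTING = {
    "cyan_rim": "brilliant cyan turquoise backlight creating dramatic rim lighting",
    "golden": "warm golden light from behind catching the subject",
}

def caption(framing, subject, lighting):
    """Join curated vocabulary terms with the shot-specific subject line."""
    return ", ".join([FRAMING[framing], subject, LIGHTING[lighting]])

c = caption(
    "ecu",
    "lips slightly parted in suspended moment of anticipation",
    "cyan_rim",
)
```

A real agent would generate the subject line too, but constraining it to a curated lexicon is what keeps the captions in one voice across 150 images.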
Challenge 3: Maintaining Coherence Across 14 Shots
With 3 radically different visual styles, maintaining narrative flow was crucial.
Wan 2.2's Mixture-of-Experts architecture saved me: separate experts for layout (high-noise) and details (low-noise) meant zero object drift, zero hallucinations, 100% coherence.
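The two-expert split can be sketched as a simple routing rule. To be clear, this is an illustrative toy, not Wan 2.2 internals: the threshold value and expert bodies are assumptions.

```python
# Hypothetical sketch of noise-level expert routing: one expert handles
# high-noise timesteps (global layout), the other low-noise timesteps
# (fine detail). The 0.5 boundary is an assumed, illustrative value.

NOISE_SWITCH = 0.5

def layout_expert(latent, sigma):
    # Early, high-noise steps decide composition and object placement.
    return f"layout({latent}, sigma={sigma})"

def detail_expert(latent, sigma):
    # Late, low-noise steps refine texture without moving objects.
    return f"detail({latent}, sigma={sigma})"

def denoise_step(latent, sigma):
    """Route a denoising step to the expert matching its noise level."""
    expert = layout_expert if sigma >= NOISE_SWITCH else detail_expert
    return expert(latent, sigma)
```

Because layout is frozen by the time the detail expert takes over, objects cannot drift or merge between steps, which is exactly the failure mode seen in the Kling benchmark above.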
The Result
14 shots. 3 styles. 2:15. Zero compromise.
A film about conscious choice, built through conscious choices.
Not "technology is bad." But "what do you choose to consume?"
My character walks out of a café with a book — "READ ME" — the only splash of color in a black-and-white world. Flowers bloom under her feet.
She doesn't change the world. She doesn't save anyone.
She just makes a choice.
And maybe that's enough.
Why This Matters
We're drowning in AI slop — content generated without intention, homogenizing the internet, erasing individuality.
"Read Me" proves another path exists:
Expertise + Open Source + Intention = Magic
Not the easy path. The free path.
This project is my answer to the question: "Does human expertise still matter in the age of automation?"
Spoiler: Yes. More than ever.
Built With
- ComfyUI
- Custom style-training pipelines
- ElevenLabs
- Flux Dev
- LoRA
- MoE
- Python
- Qwen
- Suno AI
- Wan 2.2