The Scale: A Surrealist Music Video About America's Weight Loss Industrial Complex
Inspiration
I've struggled with my weight my entire life. While I don't tie my self-worth to how I look, my weight greatly affects my life because it affects my health. In this last year of my 30s, I'm digging into why this has been such a lifelong challenge and actually trying to fix my health.
Why can I eat dairy in Europe without any issues, but in America I have immediate negative reactions? Why are roughly 73% of American adults overweight or obese if this is just a matter of personal willpower? Why do the same corporations that manufacture hyperpalatable, addictive processed foods also own the diet programs and pharmaceutical solutions?
This isn't a bug. It's a feature. It's a closed-loop profit system:
- Government subsidizes addictive ingredients (corn, soy, sugar)
- Food scientists engineer "bliss point" products designed to override satiety
- Marketing normalizes overconsumption
- Medical establishment pathologizes the predictable result (obesity)
- Pharmaceutical companies sell expensive treatments
- We blame ourselves and buy more solutions
- Repeat
The weight loss industry is worth $4.5 trillion globally. These companies don't profit from our success. They profit from our failure and our shame.
I wanted to create something that captured both the deeply personal pain of this struggle and the systemic forces that create it. A music video felt like the perfect medium: emotional enough to resonate with anyone who's felt trapped in this cycle, but sharp enough to name what's really happening.
What I Learned
About the Issue:
- The same parent companies often own both junk food brands and weight loss pharmaceutical divisions
- Food scientists literally engineer addiction using precise salt-sugar-fat ratios tuned to what the industry calls the "bliss point"
- Government agricultural subsidies overwhelmingly favor ingredients that contribute to obesity
- The revolving door between FDA officials and pharmaceutical company board positions is well documented
- This isn't a conspiracy theory; it's just capitalism doing what capitalism does: optimizing for profit
About AI Filmmaking:
- Surrealism is AI's superpower: Rather than trying to create photorealistic narrative scenes (where AI often struggles with consistency), leaning into surreal, metaphorical imagery played to AI's strengths. The dreamlike quality became an asset, not a limitation.
- Batch generation strategy matters: Generating all similar shots together (all hands, all corporate imagery, all performance shots) with consistent parameters created better visual cohesion than generating shots in random order.
- Character consistency is possible but requires planning: Using Midjourney's character reference feature and maintaining the same seed values across shots helped keep the protagonist recognizable throughout.
- Motion adds emotional weight: Even simple animations (scales cracking, hands crumbling to dust, numbers floating) transformed static images into visceral moments.
- Lip sync technology has arrived: Kling Speak made the performance shots feel genuinely human. This was the breakthrough that let me create an authentic music video rather than just pretty pictures set to music.
How I Built It
1. Song Generation (Producer.ai)
I crafted a detailed prompt specifying:
- Genre: Alt-pop/dark pop with emotional depth
- Structure: Verse 1 (personal struggle) → Verse 2 (investigation/revelation) → Bridge (breaking point) → Liberation
- Tone: Intimate and vulnerable building to powerful and defiant
- Lyrical approach: Concrete imagery, specific details (weight loss drugs, scales, receipts), shift from "I" to "they" to "we"
I generated several variations with Producer.ai and selected the version that best balanced catchiness with emotional rawness.
2. Visual Concept Development
Rather than literal storytelling, I chose surrealism to represent emotional truth:
- Obsession: Scales multiplying and orbiting like planets, woman praying to scale display like religious icon
- Investigation: Pulling thread from food label revealing massive corporate web, products bleeding sugar like wounds
- System Revealed: Giant corporate hands force-feeding then selling pills, octopus creature with tentacles holding different products
- Liberation: Scale cracking like ice beneath lifting foot, walking away from monument-sized scale, golden cage with open door
Each lyric line received its own metaphorical visual interpretation, for 38 core shots total.
3. Image Generation (Midjourney v6)
Character consistency strategy:
- Created master reference prompt for protagonist (42-year-old woman, specific physical details, consistent wardrobe)
- Generated multiple character shots first using same seed
- Used --cref parameter to reference best shots across all performance images
- Kept lighting descriptors consistent ("natural window light," "dramatic uplighting")
Batch generation approach:
- Generated all surreal object shots together (scales, pills, corporate imagery)
- Generated all performance closeups together
- Generated all wide/establishing shots together
- Used consistent style parameters within each batch
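As a rough illustration, the batching idea above could be scripted so that every shot in a batch inherits the same style descriptors and parameters. This is only a sketch, not the project's actual workflow: the `Batch` class, the `build_prompt` helper, the seed value, and the reference image URL are all hypothetical. Only the Midjourney parameters themselves (`--cref`, `--seed`, `--ar`, `--v`) are real.

```python
from dataclasses import dataclass

# Illustrative placeholders, not the project's real settings.
CHARACTER_REF_URL = "https://example.com/protagonist_ref.png"  # hypothetical reference image
SEED = 1234  # hypothetical seed shared across batches


@dataclass
class Batch:
    name: str
    style: str           # style descriptors shared by every shot in the batch
    aspect_ratio: str    # value passed to Midjourney's --ar parameter
    use_character: bool  # append --cref so the protagonist stays recognizable


def build_prompt(subject: str, batch: Batch) -> str:
    """Join a shot description with its batch's shared style and parameters."""
    parts = [
        f"{subject}, {batch.style}",
        f"--ar {batch.aspect_ratio}",
        f"--seed {SEED}",
        "--v 6",
    ]
    if batch.use_character:
        parts.append(f"--cref {CHARACTER_REF_URL}")
    return " ".join(parts)


performance = Batch("performance", "natural window light, black clothing", "16:9", True)
objects = Batch("surreal objects", "dramatic uplighting, surreal, cinematic", "16:9", False)

prompts = [
    build_prompt("woman singing, medium close-up", performance),
    build_prompt("bathroom scales orbiting like planets", objects),
]
for prompt in prompts:
    print(prompt)
```

Keeping the shared descriptors in one place per batch means a lighting or style change is made once and applies to every shot in that batch.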
Prompt engineering for motion:
- Left 10% padding around frame edges for camera movement
- Specified "slow motion," "floating," "crumbling" in prompts for shots I knew would be animated
- Generated higher resolution for shots requiring zoom/pan
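To make the 10% padding rule concrete, here is a small sketch of the arithmetic. The 1920x1080 frame size and both function names are assumptions for illustration; the source only specifies the 10% figure.

```python
def safe_area(width: int, height: int, padding: float = 0.10):
    """Return (x, y, w, h) of the region left after padding every edge."""
    x = round(width * padding)
    y = round(height * padding)
    return x, y, width - 2 * x, height - 2 * y


def zoom_headroom(padding: float = 0.10) -> float:
    """Zoom factor at which the safe area exactly fills the frame."""
    return 1 / (1 - 2 * padding)


# Assumed 1920x1080 frame; the essential subject stays inside the safe area.
print(safe_area(1920, 1080))
print(zoom_headroom())
```

With 10% on each edge, the safe area is 80% of the frame in each dimension, so a push-in of up to about 1.25x keeps the composed subject fully in view.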
Total images generated: ~150 (selected 45 for final edit)
4. Lip Sync Performance (Kling Speak / Higgsfield)
- Selected 8 key performance shots (vulnerable verse, powerful chorus, angry pre-chorus, triumphant finale)
- Uploaded protagonist face images and audio stems
- Kling Speak's AI generated realistic lip movements synced to vocals
- This was the game-changer: turned static images into genuine performance
5. Motion Graphics & Animation (Runway ML, Pika)
Animated shots:
- Scales multiplying and orbiting (Runway Gen-3)
- Drawer opening with light orbs floating (Pika)
- Woman falling through impossible space (Runway)
- Corporate hands crumbling to dust (Pika)
- Scale cracking beneath lifting foot (Runway)
- Pills transforming into butterflies (Pika)
Static shots with motion:
- Ken Burns effects (slow zoom/pan) on portraits and wide shots
- Subtle breathing/wind effects added to static performance shots
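A Ken Burns move is essentially a crop rectangle that changes over time. The sketch below shows one way to compute a centered linear zoom. The function name, the default zoom range, and the frame size are assumptions for illustration, and editors such as CapCut handle this internally with keyframes.

```python
def ken_burns_crop(t, duration, frame_w, frame_h, start_zoom=1.0, end_zoom=1.25):
    """Return the centered crop (x, y, w, h) at time t for a linear zoom."""
    # Clamp progress to [0, 1] so times outside the clip hold the end frames.
    progress = min(max(t / duration, 0.0), 1.0)
    zoom = start_zoom + (end_zoom - start_zoom) * progress
    w = frame_w / zoom
    h = frame_h / zoom
    return ((frame_w - w) / 2, (frame_h - h) / 2, w, h)


# Sample the crop once per second over an assumed 5-second shot.
for second in range(6):
    print(second, ken_burns_crop(second, 5, 1920, 1080))
```

A slow pan works the same way, except the crop's x and y offsets are interpolated instead of (or alongside) its size.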
6. Editing & Post-Production (CapCut)
Timeline structure:
- Beat-synced cuts (every shot change aligned to musical beats or lyric phrases)
- Color grading evolution: sterile blues (verse 1) → sickly greens (verse 2) → warm golds (finale)
- Aspect ratio shifts: 16:9 for narrative, 9:16 for intimate moments, 2.35:1 for epic scale
- Transition shots: Pills burning, receipts floating like leaves, logos melting
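Beat-synced cutting can be approximated by snapping each rough cut point to the nearest beat. The sketch below assumes a constant tempo of 120 BPM, which is a made-up value since the song's actual tempo isn't stated. The 210-second duration matches the 3:30 runtime, and both helper names are hypothetical.

```python
def beat_times(bpm: float, duration: float) -> list[float]:
    """List beat timestamps in seconds for a constant-tempo track."""
    interval = 60.0 / bpm
    count = int(duration / interval) + 1
    return [i * interval for i in range(count)]


def snap_to_beat(cut: float, beats: list[float]) -> float:
    """Move a rough cut point to the closest beat."""
    return min(beats, key=lambda beat: abs(beat - cut))


beats = beat_times(bpm=120, duration=210)  # assumed tempo, 3:30 runtime
rough_cuts = [5.6, 11.8, 17.3]             # hypothetical rough cut points
snapped = [snap_to_beat(cut, beats) for cut in rough_cuts]
print(snapped)
```

A track with tempo changes would need beat positions detected from the audio rather than computed from a single BPM value.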
Audio mixing:
- Balanced vocals against surround sound design
- Added subtle ambient textures (corporate hum, breathing, environmental sounds)
- Created space for silence in key revelatory moments
Text overlays:
- End statistics: "73% of Americans are overweight or obese. The weight loss industry: $4.5 trillion. You do the math."
- Minimal, impactful, documentary-style typography
Total edit time: ~40 hours across 2 weeks
Challenges I Faced
Challenge 1: Character Consistency Across 45+ Shots
Problem: Early generation attempts created a protagonist who looked different in every shot, with a different face and different hair each time, which completely broke immersion.
Solution:
- Created a detailed master character prompt with specific physical traits
- Generated 20 variations, selected the most consistent face as reference
- Used Midjourney's --cref feature religiously on every subsequent shot
- Kept wardrobe ultra-simple (black clothing) to reduce variables
- Accepted that perfect consistency is impossible; aimed for "recognizably the same person"
Challenge 2: Surrealism vs. Coherence
Problem: How do you create visually surreal imagery that still tells a clear story? Too literal and it's boring; too abstract and viewers get lost.
Solution:
- Every surreal image had to answer: "What emotion or concept does this represent?"
- Created a visual logic system: corporate = cold/mechanical, liberation = organic/warm, truth = light breaking through
- Tested early cuts with friends: "What do you think is happening here?" If they couldn't articulate it, I revised
- Used familiar objects in unfamiliar ways (scales, pills, hands) rather than completely abstract shapes
Challenge 3: Lip Sync Uncanny Valley
Problem: Early attempts at lip sync looked creepy or obviously fake, which destroyed the emotional authenticity of performance shots.
Solution:
- Kling Speak was a breakthrough—far more realistic than earlier tools I tried
- Only used lip sync on medium close-ups, not extreme closeups (where flaws are more visible)
- Chose shots with good lighting and front-facing angles for best results
- Accepted that some shots would remain static—better to have powerful stills than bad animation
- Intercut lip-synced performance with non-performance shots to manage viewer fatigue
Challenge 4: Pacing & Emotional Arc
Problem: With 38 shots across 3:30, how do you maintain visual interest without causing exhaustion? How do you build emotional momentum?
Solution:
- Mapped song structure to visual intensity: start intimate, build to chaos, release to clarity
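The pacing question comes down to simple arithmetic: 38 shots over 3:30 averages about 5.5 seconds per shot, and shifting shots between sections changes how fast each part feels. In the sketch below only the 38-shot and 3:30 figures come from the write-up. The section names, durations, and shot counts are hypothetical values chosen to show the idea.

```python
SONG_SECONDS = 3 * 60 + 30  # 3:30 runtime
TOTAL_SHOTS = 38

average_shot = SONG_SECONDS / TOTAL_SHOTS
print(f"average shot length: {average_shot:.2f} s")

# Hypothetical split: (section, seconds, shots). Not the project's real edit.
sections = [
    ("intimate opening", 70, 9),
    ("chaotic build", 90, 21),
    ("release", 50, 8),
]

for name, seconds, shots in sections:
    # Fewer seconds per shot means faster cutting and higher visual intensity.
    print(f"{name}: {seconds / shots:.2f} s per shot")
```

Under this assumed split, the opening and release hold shots longer than average while the build cuts faster, which is one way to express "start intimate, build to chaos" in shot lengths.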
Built With
- capcut
- higgsfield
- kling
- midjourney
- producer.ai
