Inspiration

Before this project, I mainly created functional or “companion-style” music: tracks designed for focus, study, and relaxation. Over time, I began experimenting with pairing AI-generated visuals with music to create short narrative music videos, and found the process deeply engaging. By the time of this competition, I had already produced several AI-driven videos, and Monster became my latest challenge: a project that brought together everything I had learned so far and pushed me one step beyond my previous limits.

Monster started from a simple idea — I just thought that transforming into a monster looked cool. I wanted to experiment with dynamic action shots, armor-style transformations, and cinematic sequences that feel powerful and stylish. What began as a visual experiment evolved into a full AI-driven music video exploring energy, rhythm, and transformation aesthetics.

What it does

This music video is part of the Dissonant Lab series, a space for exploring the boundaries of AI-assisted storytelling. Monster blends key elements of K-Pop and J-Pop, combining their rhythmic intensity and melodic sensibility into one hybrid structure. To shape the narrative flow, I composed two connected pieces. The first is an atmospheric 40-second prelude that serves as a memory-like introduction before the transformation. The second is the main track, which drives the story as the protagonist awakens on a lab table, escapes, and battles robotic pursuers.

How it was built

  • Music & Vocals: Generated using Suno v5. This work consists of two interconnected tracks.

It opens with a minimalist, melancholic intro (approx. 40 s) featuring a delicate music-box melody paired with an intimate female spoken-word passage, delivered in a soft, whispering tone as if recalling a distant memory. This creates a dreamlike, nostalgic atmosphere and sets the stage for the video's opening scenes of humanity and memory.

It then transitions into the main track: a high-energy dark-pop and K-Pop-style electronic rock piece driven by punchy electronic drums and synth bass. Restrained, atmospheric verses erupt into dense choruses sung in Japanese. This dramatic shift in dynamics represents the tonal change from human fragility to the monster's powerful transformation.

  • Visuals: Designed with Midjourney v7, Google Nano Banana, and Higgsfield Suite, combining cinematic action framing with armor-style transformation aesthetics. Each sequence was generated from pre-designed character references to keep the character consistent while exploring dynamic motion and camera perspectives.

  • Additional Cinematic Shots: Supplemented with locally generated clips from Wan 2.2 (local runtime). This was a key strategy for managing production costs and conserving online generation credits, which cloud-based models consume rapidly.

  • Editing & Integration: Edited and synchronized in Wondershare Filmora 14 for beat-accurate transitions and a seamless energy flow between the two halves of the track. Additional sound and visual effects were applied selectively to transitions and key shots to improve clarity, rhythm, and visual punch.

  • Production Time: ~40 hours from concept to final render.

Tools Used

  • Suno v5 – Music generation and vocal synthesis.
  • Midjourney v7 – Visual composition and design, plus some video generation.
  • Higgsfield Suite – Video and image synthesis, including:
    • Kling / Sora2 / Hailuo for motion generation and transformation sequences.
    • Google Nano Banana, Seedream 4.0, and Popcorn for alternate shots and character performances based on pre-designed character references, adding scene diversity while keeping appearance and emotion consistent.
  • Wan 2.2 (local runtime) – Used for cost-effective local generation of supplemental shots, reducing reliance on credit-based cloud models.
  • Wondershare Filmora 14 – Final editing and post-production.

Challenges

Balancing realism and surrealism was the biggest challenge. AI models can easily break the consistency of emotion and tone from one scene to the next, so I had to carefully refine prompts, lighting references, and motion guidance to keep the character’s identity coherent throughout the transformation sequence.

Creating combat scenes was particularly difficult — most AI video models struggle with dynamic physical interaction and tend to avoid violent or graphic imagery. To work around these limitations, I designed monster-versus-robot sequences instead, allowing me to generate action moments that looked intense without triggering Sora2’s content restrictions.

Another major challenge was lip synchronization. Matching the vocals and visuals precisely was difficult, and rather than using a dedicated lip-sync tool, I edited manually in Filmora, aligning shots whose mouth shapes and rhythmic phrasing closely matched the vocals.

Accomplishments that I'm proud of

I’m very satisfied with the monster design, the transformation sequences, and the combat moments in Monster. I feel that the emotional buildup and rhythmic pacing of the video came together smoothly, creating a viewing experience that feels natural and dynamic.

Another accomplishment I’m proud of is the strong character consistency across all shots. This helped the entire MV feel close to a live-action production, which was one of my main goals — and one of the results I’m happiest with.

What I learned

I learned that directing an AI-driven music video is remarkably similar to directing a live-action film.

As the director, I issue instructions (the prompts), but the final output relies on the "actors" — the AI models themselves. Different models and prompts will cause these "digital actors" to deliver entirely different performances.

My real job became learning how to guide these actors to deliver the exact performance and emotional effect I envisioned.

What's next for Monster

After finishing Monster, I don’t necessarily want to stay in the same cinematic, high-energy, realistic style. I’m starting to feel a pull toward something completely different — a world that’s cute, handcrafted, playful, yet still carries a slightly eerie mood.

I’ve been thinking about a project that combines spooky, Halloween-like imagery with the look of handmade objects: patchwork textiles, felt, stitched dolls, maybe even clay-like props. Instead of armor and lab tables, I want to try building a story in a smaller, toy-like world — something that feels like a dark fairy tale told through crafts.

It would be a fresh challenge for me, both visually and emotionally, and a way to explore another side of the same creative universe that Monster belongs to.

Built With

  • comfyui
  • google-banana
  • hailuo
  • higgsfield
  • kling
  • midjourney
  • seedream4.0
  • sora2
  • wan2.2