Inspiration
“Midnight Vibez” was inspired by the idea of capturing a late-night emotional pulse—the feeling of walking through a sleeping city while your mind is awake, electric, and alive with rhythm.
Ellacoustic, a soft-spoken, intimate R&B artist, often writes from the perspective of quiet moments becoming loud on the inside. The music video expands that personal energy into a surreal, AI-augmented visual journey: fog that twists into soundwaves, streetlights that act like spotlights, mirror-worlds that pulse like beats, and alleyways that echo the movement of the song.
The entire project blends:
- Hispanic female representation in AI music
- City-as-emotion visual metaphors
- Cinematic Pop-R&B storytelling
- AI-driven generative artistry
The result is a video that feels like a dream you can walk through, choreographed to every beat of the track.
What It Does
The video transforms a real vocalist’s performance into a 21-scene AI-enhanced narrative, matching the song’s 2:44 runtime.
It uses:
- Consistent character identity (same face, hair, wardrobe across all shots)
- Multi-scene AI imaging + light-motion simulation
- Beat-matched camera movement cues
- Cinematic lighting pipelines (teal–amber–magenta aesthetic)
- Non-morphing stable visual continuity
- 16:9 + 9:16 compatibility
Each 5–10 second segment is designed to look and feel like a real music video, but with visual elements that would be impossible in a physical shoot—floating lights, time-lagged mirrors, levitating puddles, gold-foil snowfall, star-map pavement, and clone-trail choreography.
The goal: give Ellacoustic a real visual identity in the AI-generated music world, creating a professional-quality piece that can be promoted, streamed, and submitted to award competitions such as Chroma.
How We Built It
The video was produced through a pipeline combining AI systems, 3D motion cues, and manual scene design, including:
- Prompt Engine Design
We constructed detailed, cinematic prompts for each of the 21 scenes:
- camera vocabulary
- lighting mood
- color palette
- wardrobe references
- facial structure consistency
- environmental physics (fog, reflections, snowfall, neon, mirrors)
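As an illustration, per-scene prompts like those above can be assembled from locked identity blocks plus scene-specific variables. The sketch below is a minimal example of that idea; every field name and value is an illustrative assumption, not the project's actual prompt template:

```python
# Sketch of a per-scene prompt assembler. Identity and palette blocks are
# locked constants; camera, lighting, and environment vary per scene.
# All strings here are illustrative assumptions, not the real templates.
CHARACTER = ("Ellacoustic, long curly dark hair, soft brown eyes, "
             "bomber jacket and cargo pants, natural makeup")
PALETTE = "moonlit teal, sodium-amber pools, neon magenta accents"

def build_prompt(camera, lighting, environment):
    """Combine the locked identity/palette blocks with scene variables."""
    return ", ".join([CHARACTER, PALETTE, camera, lighting, environment])

prompt = build_prompt(
    camera="slow dolly-in, 35mm, shallow depth of field",
    lighting="wet asphalt reflections under streetlights",
    environment="fog twisting into soundwave shapes in a sleeping city",
)
```

Because the identity and palette blocks never change, every scene's prompt repeats the same character description verbatim, which is the core of the consistency strategy described in the next step.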
- Consistent Character Control
We locked in:
- one consistent face model
- long curly dark hair
- soft brown eyes
- bomber jacket + cargo pants outfit
- natural makeup
This ensured zero character drift across the entire 164-second timeline.
- Scene-by-Scene Visual Generation
For each shot block:
- Still frames were generated using stable style prompts
- Motion cues were baked in for use in image-to-video systems
- Special effects (light trails, clone echoes, soundwave vapor) were layered separately, not morphing the subject
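The "layered separately" step can be pictured as masked compositing: the effect layer is blended over the frame everywhere except the subject's pixels, so the face and body are never touched. A toy sketch with a hypothetical one-dimensional "frame" (real frames are 2D RGB arrays):

```python
# Sketch of subject-safe effect layering: blend `effect` over `base` only
# where the subject mask is 0. Toy 1D "frames" stand in for real images.
def composite(base, effect, subject_mask, alpha=0.6):
    """Blend effect pixels over base pixels, skipping masked subject pixels."""
    out = []
    for b, e, m in zip(base, effect, subject_mask):
        out.append(b if m else round(b * (1 - alpha) + e * alpha))
    return out

frame = [100, 100, 100, 100]   # toy 4-pixel frame
glow = [255, 255, 255, 255]    # light-trail layer
mask = [0, 1, 1, 0]            # 1 = subject pixels, left untouched
result = composite(frame, glow, mask)  # -> [193, 100, 100, 193]
```

The two middle "subject" pixels pass through unchanged, which is why this approach avoids the face distortion that full-frame effects cause.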
- Motion Synthesis
Smooth, cinematic camera movement was created using:
- slow dolly-ins
- lateral tracking
- orbit shots
- crane lifts
- controlled pans
All movements were beat-matched to Ellacoustic’s Pop-R&B groove.
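Beat-matching cut points reduces to simple arithmetic on the tempo: lay a beat grid over the runtime, then snap each rough cut to the nearest beat. A sketch assuming a placeholder BPM (the track's actual tempo isn't stated in this write-up):

```python
# Sketch: snap 21 scene cuts to a beat grid. BPM is an assumed placeholder.
BPM = 92          # illustrative tempo, not the track's stated value
RUNTIME = 164.0   # 2:44 in seconds
NUM_SCENES = 21

beat = 60.0 / BPM  # seconds per beat
# Evenly spaced rough cut points across the runtime, one per scene boundary.
rough_cuts = [i * RUNTIME / NUM_SCENES for i in range(NUM_SCENES + 1)]
# Snap each rough cut to the nearest beat so scene changes land on the groove.
cuts = [round(t / beat) * beat for t in rough_cuts]
```

With ~7.8 seconds per scene and a ~0.65-second beat at this tempo, each snapped cut moves by at most a third of a second, which is why beat-snapping is cheap relative to the 5–10 second shot lengths.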
- Color, Grade & Continuity
The video uses a unified palette:
- moonlit teals
- warm sodium-amber pools
- neon magenta accents
This created the consistent mood that keeps the surreal elements cohesive and believable.
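The teal/amber side of this grade can be thought of as split toning: shadows get nudged toward teal and highlights toward amber. A per-pixel sketch; the hex values are assumptions chosen to illustrate the palette, not the production LUT:

```python
# Split-tone sketch: dark pixels drift toward teal, bright ones toward amber.
# Hex values are illustrative assumptions, not the actual grade.
TEAL = (0x1F, 0x6F, 0x7A)    # moonlit teal
AMBER = (0xE8, 0xA8, 0x4B)   # sodium amber

def split_tone(pixel, strength=0.15):
    """Blend an RGB pixel toward TEAL if dark, AMBER if bright."""
    luma = sum(pixel) / (3 * 255)        # 0.0 = shadow, 1.0 = highlight
    tint = AMBER if luma > 0.5 else TEAL
    return tuple(
        round(c * (1 - strength) + t * strength) for c, t in zip(pixel, tint)
    )

shadow = split_tone((30, 30, 30))        # pushed toward teal (blue > red)
highlight = split_tone((230, 230, 230))  # pushed toward amber (red > blue)
```

A real grade would use a smooth luma ramp rather than a hard 0.5 threshold, but the direction of the color push is the same.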
Challenges We Ran Into
- Maintaining Character Consistency
AI image generators tend to change faces scene-to-scene. Keeping Ellacoustic’s identity stable required:
- reference locking
- seed reuse
- wardrobe constraints
- repeated corrective inpainting
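Seed reuse can be illustrated generically: call the generator with the same fixed seed and reference image for every scene, so the model's starting noise never changes between shots. `generate_image` below is a hypothetical stand-in, not any specific tool's API:

```python
# Sketch of seed reuse for identity stability. `generate_image` is a
# hypothetical placeholder for whichever image model is actually called.
import random

IDENTITY_SEED = 4242  # one fixed seed for all 21 scenes (value illustrative)

def generate_image(prompt, seed, reference=None):
    """Placeholder generation call: same seed -> same internal noise."""
    rng = random.Random(seed)
    return {"prompt": prompt, "noise": rng.random(), "reference": reference}

a = generate_image("alley scene", IDENTITY_SEED, reference="ella_ref.png")
b = generate_image("rooftop scene", IDENTITY_SEED, reference="ella_ref.png")
# Different prompts, identical starting noise and reference in both scenes.
```

The point is that only the scene-specific prompt varies; everything that anchors the face (seed, reference image) is held constant.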
- Avoiding Visual Morphing
Many AI video systems morph faces or distort movement. We solved this by:
- using minimal subject motion within frame
- offloading movement to camera motion
- isolating VFX from the body
- Rhythm-Accurate Timing
A Pop-R&B + trap-infused track requires tight rhythmic alignment. Matching visual pulses to snare hits and sub-drops took iterative timing passes.
- Complex Multi-Lighting Scenes
Scenes like levitating puddles or rotating sky domes required balancing:
- diffraction
- shimmer
- bloom
- stable rendering
without losing realism.
- AI Motion Limits
Some scenes—like clone trails, organ-pipe light circles, and constellation pavement—needed handcrafted effects layered onto AI-generated shots to preserve clarity.
Accomplishments That We’re Proud Of
- Created a fully AI-directed 21-scene music video with professional-level cohesion.
- Achieved zero face drift across a wide range of environments.
- Executed cinematic, beat-synced camera moves without stutter or morph artifacts.
- Produced a competitive-quality music video for an artist who exists entirely within the Entertune World creative ecosystem.
- Designed a new visual identity for Ellacoustic that can extend into:
  - album covers
  - stage visuals
  - VR events
  - merch branding
  - future music videos
- Delivered a project that looks like a hybrid of:
  - R&B cinematic storytelling
  - futuristic neon realism
  - dreamlike urban fantasy
What We Learned
- AI filmmaking is most stable when you treat the model like a camera operator, not a morphing animator.
- Strict visual rules dramatically improve quality:
  - same face
  - same outfit
  - consistent lighting
  - controlled movement
- AI scenes work best when motion is environmental or camera-based, not body-based.
- Developing a visual identity for an AI musician gives them the same promotional strength real artists have.
- When blending R&B, trap, and dreamlike city visuals, color control is everything.
What’s Next for Ellacoustic – Midnight Vibez
- Full Social Media Rollout
  - Vertical 9:16 TikTok/Reels versions
  - YouTube Shorts teaser pack
  - Motion posters for promotion
- VR Integration in Entertune Worlds
  - Premiere event at Entertune Stage
  - Rooftop “Midnight Vibez Night” at Entertune City
  - Collectible vinyl + emotes themed after the video
  - Player quests inspired by scenes (e.g., “Mirror Pulse Alley Quest”)
- Extended Universe Content
  - Acoustic remix visualizer
  - “Behind the Scenes in AI” breakdown video
  - “Ellacoustic: The Making of Midnight Vibez” promo documentary
- Next Music Video
The next Ellacoustic visual project will further expand her narrative universe:
- more surreal realism
- new emotional environments
- stronger choreography and dynamic lighting
Built With
- capcut
- minimax
- suno
- sync.so