Inspiration

The initial spark for this project came from the timing of the Chroma Awards itself—taking place at year's end. We found ourselves reflecting on the family-centered films that captivated audiences at the turn of the millennium, films like The Family Man.

As AI becomes increasingly integrated into our daily lives, we wanted to explore what remains uniquely human: the feelings bound up with regret, mistakes, family, and love. We asked ourselves: what better way to showcase the irreplaceable nature of human feeling than through AI itself?

This project became our answer: using AI as a medium to tell a story about what AI can never truly possess—the profound, messy, beautiful emotional experiences that define our humanity.

What it does

Mystery Invitation is a 5-minute silent musical drama that follows an elderly couple through an extraordinary journey across time.

The film opens with a visual metaphor of stagnation: the couple sits before their television, trapped in the monotonous loop of daily routine—time itself seems frozen. Then, a doorbell rings. An intriguing invitation arrives: "Would you like to go back to the past?"

We deliberately avoided forcing a logical explanation. The impact of "a strange invitation disrupting ordinary life" was what mattered most.

The Journey Begins: Arriving at a mysterious hotel, they receive a key marked "1925" (representing the present) and step into the elevator. Suddenly, it shakes violently. Blackout. When the doors reopen, they've been transformed—now young again.

1920s - The Jazz Age: They find themselves as an Irish immigrant man and a Black jazz singer. This is not their actual past but an idealized one. Echoing films like Midnight in Paris, we sent them further back than their own past to emphasize how people always romanticize earlier eras. They explore the newly built Empire State Building and roam the streets of 1920s New York, reveling in their youth.

1940s - Victory and Romance: Adjusting the car radio triggers another time warp to the 1940s. Swing jazz fills the air as they dance. Then comes the news: World War II has ended. In Times Square, they recreate the iconic "The Kiss" photograph. The camera shutter clicks—snap—and time shifts again.

1960s - Peace and Revolution: The psychedelic sounds of hippie rock surround them as they experience the anti-war, peace-loving vibe of the 1960s.

The Return: Before they know it, they're back at the hotel, dining together. A note arrives: "It's time to return." They stand before the elevator once more. The woman steps in without hesitation, but the man hesitates. Though our film is silent, his body language speaks volumes: "I don't want to go back. I want to stay in this bright, shining moment."

But the woman gazes at him steadily, as if to say: "No. We have something real."

Together, they enter the elevator and return to the present.

The Truth Waiting at Home: They arrive home, seemingly disappointed. But as they open the door, their daughter, son-in-law, and grandchildren greet them with a Christmas cake, celebrating the holiday together. The couple exchanges a knowing glance—any trace of regret vanishes. The whole family embraces, and the film ends.

The Core Message: The most radiant time is always "today." Nothing can replace the family and the bond built through time together.

How we built it

Our production workflow was designed to maximize efficiency and quality by strategically combining multiple off-the-shelf AI models based on their empirical strengths rather than marketing claims.

1. Original Image Acquisition (Text-to-Image & Image-to-Image):

  • Character Design (Young): We used Midjourney V7 to design the protagonists' younger appearances using only text prompts, with no reference images.
  • Character Variation (Elderly): ByteDance Seedream was used to create variations showing the characters in their older years.
  • Costume Design: Gemini 2.5 Flash Image (Nano Banana) generated costume variations for different time periods.
  • Character Sheets: We created turnaround character sheets using Gemini 2.5 Flash Image (Nano Banana) to maintain consistency.

2. Historical Visual Authenticity: To ensure period-accurate styling, we used Gemini 2.5 Pro Deep Research to investigate distinctive keywords for each era's fashion and aesthetics. These research findings were then refined into text prompts for Midjourney, ensuring historical authenticity in our visual design.
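The step from research findings to text prompts can be pictured as a small template function. This is only a minimal sketch: the keyword lists, the `ERA_KEYWORDS` name, and the `build_prompt` helper are hypothetical stand-ins, since the actual research output and prompts are not reproduced in this write-up.

```python
# Hypothetical era keywords; the real research findings are not listed here.
ERA_KEYWORDS = {
    "1920s": ["flapper dress", "cloche hat", "art deco"],
    "1940s": ["victory rolls", "military uniform", "swing dance hall"],
    "1960s": ["tie-dye", "bell-bottoms", "peace sign"],
}


def build_prompt(subject: str, era: str,
                 keywords: dict[str, list[str]] = ERA_KEYWORDS) -> str:
    """Append the era label and its style keywords to a subject description."""
    if era not in keywords:
        raise KeyError(f"no researched keywords for era {era!r}")
    return f"{subject}, {era}, " + ", ".join(keywords[era])


print(build_prompt("a young jazz singer on stage", "1920s"))
```

Keeping the keywords in one table per era means every team member's prompts draw on the same vocabulary, which may help period styling stay consistent across sequences.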

3. Storyboard Creation (Image-to-Image):

  • Background images were generated using Midjourney V7 and ByteDance Seedream for various compositions.
  • Nano Banana was used to composite characters into backgrounds with precise composition control.
  • We arranged images sequentially according to our pre-planned storyline, creating both start frames for image-to-video generation and start-end frame pairs for transitions.
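One way to keep track of the two kinds of storyboard entries described above (a single start frame, or a start-end pair for a transition) is a small record type. The sketch below is illustrative only; the `Shot` class, the shot IDs, and the file paths are assumptions, not our actual project files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shot:
    shot_id: str
    start_frame: str                 # storyboard image that opens the clip
    end_frame: Optional[str] = None  # set only for transition shots

    @property
    def is_transition(self) -> bool:
        return self.end_frame is not None


# Hypothetical storyboard entries in story order.
STORYBOARD = [
    Shot("living_room", "frames/living_room.png"),
    Shot("elevator_shift", "frames/elevator_old.png", "frames/elevator_young.png"),
    Shot("street_1920s", "frames/street_1920s.png"),
]


def split_by_mode(shots):
    """Separate start-frame-only shots from start-end frame pairs."""
    start_only = [s for s in shots if not s.is_transition]
    start_end = [s for s in shots if s.is_transition]
    return start_only, start_end


start_only, start_end = split_by_mode(STORYBOARD)
print(len(start_only), "start-frame shots,", len(start_end), "transition shots")
```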

4. Individual Clip Generation (Image-to-Video): Based on our storyboard, we generated all clips using generative models. We employed multiple I2V models strategically, selecting each based on empirical testing rather than demo videos:

  • Natural Movement (Dance, etc.): Kling 2.5 Turbo for fluid, realistic human motion
  • Static, Fixed Composition: ByteDance Seedance 1.0 Pro for stable, controlled framing
  • Wide Shots with Physical Movement (Driving, etc.): Google Veo 3.1 for distant perspective and physics
  • Detailed Direction (Transitions, POV Changes): MiniMax Hailuo Pro for precise, nuanced cinematography
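The selection rule above amounts to a lookup from shot type to model. A minimal sketch follows; the `ShotType` enum and `pick_model` helper are hypothetical names used for illustration, and the mapping simply restates the list above rather than calling any model API.

```python
from enum import Enum


class ShotType(Enum):
    NATURAL_MOVEMENT = "natural movement (dance, etc.)"
    STATIC_COMPOSITION = "static, fixed composition"
    WIDE_PHYSICAL = "wide shot with physical movement (driving, etc.)"
    DETAILED_DIRECTION = "detailed direction (transitions, POV changes)"


# Mapping taken from the list above; model names are plain labels here.
MODEL_FOR_SHOT = {
    ShotType.NATURAL_MOVEMENT: "Kling 2.5 Turbo",
    ShotType.STATIC_COMPOSITION: "ByteDance Seedance 1.0 Pro",
    ShotType.WIDE_PHYSICAL: "Google Veo 3.1",
    ShotType.DETAILED_DIRECTION: "MiniMax Hailuo Pro",
}


def pick_model(shot_type: ShotType) -> str:
    """Return the I2V model label chosen for the given shot type."""
    return MODEL_FOR_SHOT[shot_type]


print(pick_model(ShotType.WIDE_PHYSICAL))
```

Tagging each storyboard shot with a type before generation makes the model choice explicit and easy to revise when further testing changes the picture.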

5. Cut Editing: We assembled the timeline and performed cut editing using Adobe Premiere Pro.

6. BGM, SFX, and Voice Production: After finalizing the timeline through cut editing, we produced all audio elements:

  • Sound Effects: All SFX were generated using ElevenLabs Sound Effects (Text-to-Sound) from text prompts only.
  • Voice: The brief radio announcement of WWII's end was created using ElevenLabs Voice Design, crafting a noise-textured radio host voice for period authenticity.
  • Music: All BGM was generated from text prompts without reference inputs, using Suno V5 and ElevenLabs Music in combination:
    • Extended sequences (Opening): Suno V5 for natural mashups and long-form composition using the extend feature
    • Precise timing sections (1960s): ElevenLabs Music with section-based regeneration for exact duration control

The natural flow of sound design—from Christmas ambience to 1920s jazz, 1940s swing jazz, 1960s hippie rock, and back to Christmas carols—creates an immersive temporal journey through audio alone.
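Because music was produced after the cut was locked, each section had to fit a fixed duration. The sketch below shows how a simple cue sheet could be checked against the 5-minute runtime. The section lengths are hypothetical placeholders, not our actual timings; only the order of sections comes from the description above.

```python
from itertools import accumulate

# (section name, duration in seconds); durations are illustrative assumptions.
CUES = [
    ("Christmas ambience", 45),
    ("1920s jazz", 75),
    ("1940s swing jazz", 75),
    ("1960s hippie rock", 60),
    ("Christmas carols", 45),
]

TARGET_SECONDS = 5 * 60  # the film runs about five minutes


def cue_starts(cues):
    """Return (name, start offset in seconds) for each cue, in order."""
    durations = [d for _, d in cues]
    starts = [0, *accumulate(durations)][:-1]
    return [(name, start) for (name, _), start in zip(cues, starts)]


total = sum(d for _, d in CUES)
assert total == TARGET_SECONDS, f"cue sheet is {total}s, expected {TARGET_SECONDS}s"
for name, start in cue_starts(CUES):
    print(f"{start:>3}s  {name}")
```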

7. Final Assembly: All visual and audio elements were combined and finalized in Adobe Premiere Pro.

Why Silent Film? Ironically, given that ElevenLabs—a leader in voice synthesis—is a main sponsor of this competition, we chose to create a silent film. We wanted to demonstrate that visuals alone could create an experience that feels "heard," while the absence of dialogue creates a hunger for voice that paradoxically emphasizes its power.

Team & Timeline: This entire project was completed in 5 working days with the following team:

  • Feel Hwang: Storyboard creation, final comprehensive editing, intro/outro clip production
  • Bonnie Lee: 1960s sequence production
  • Sunny Lee: 1920s sequence production
  • Suna Kim: 1940s sequence production
  • Jay Ko: Original character design, BGM, SFX production

Challenges we ran into

Our greatest challenge was overcoming the limitations that training-data distributions impose on most I2I and I2V models.

Composition Constraints: Most models are trained on relatively static compositions with subjects centered in frame. However, our long-form narrative required characters or elements positioned at frame edges, or isolated subjects in specific locations within wide shots. Achieving this level of compositional precision through text prompts alone proved extremely difficult.

Directional Control: Text-based models struggle with directional instructions relative to subjects. To work around this, we added compass directions (north, south, east, west) to our prompts, which significantly improved spatial accuracy.
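As a rough illustration of the compass idea, a prompt helper might look like the following. The exact wording we used is not reproduced here; the `with_compass` function, the sentence template, and the convention that north means the top of the frame are all assumptions made for this sketch.

```python
# Assumed convention: the frame is read like a map, with north at the top.
COMPASS_TO_FRAME = {
    "north": "top",
    "south": "bottom",
    "east": "right",
    "west": "left",
}


def with_compass(prompt: str, subject: str, direction: str) -> str:
    """Append a compass-based placement hint for one subject to a prompt."""
    key = direction.lower()
    if key not in COMPASS_TO_FRAME:
        raise ValueError(f"unknown compass direction: {direction!r}")
    return (
        f"{prompt} Treat the frame as a map with north at the top: "
        f"{subject} is at the {key} edge ({COMPASS_TO_FRAME[key]} of frame)."
    )


print(with_compass("Wide shot of a 1940s street.", "the couple", "west"))
```

Absolute directions sidestep the ambiguity of "left" and "right", which can be read from either the camera's or the subject's point of view.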

Compounded Consistency Challenges: We also ran into a difficulty we had not anticipated. Ideally, one person would have created all storyboards using a single model to maintain consistency. However, due to our tight 5-day timeline, each team member worked on their assigned sequence from the storyboard stage onward. With this distributed workflow, inconsistencies introduced at the I2I stage were multiplied by those introduced at the I2V stage (I2I × I2V), making visual coherence across the entire film much harder to maintain.

Accomplishments that we're proud of

Empirically-Optimized Pipeline: We're most proud of our empirically-tested, strategically optimized multi-model pipeline. As AI hype proliferates, people often declare what a model "should" do based on demos alone—but reality frequently differs. Through hands-on trial and error, we identified each model's true characteristics and biases, then orchestrated them according to their actual strengths rather than marketing promises.

Human Stories Through AI: Thematically, we take pride in our counter-approach. AI video discourse typically prioritizes "cutting-edge" spectacle and eye-catching sequences. Instead, we used natural cinematic language to explore the most human subject possible: family. This juxtaposition itself feels like a meaningful provocation.

Sound as Narrative: Finally, given that this is a competition hosted by ElevenLabs, which values the emotional power of sound, we're deeply satisfied with our sonic storytelling. Our audio design—flowing naturally from Christmas ambience through 1920s jazz, 1940s swing jazz, and 1960s hippie rock before returning to Christmas carols—uses sound to convey time periods as effectively as visuals. This project also gave us an invaluable opportunity to thoroughly test ElevenLabs Music's capabilities.

What we learned

Closing the Demo-Reality Gap: Despite countless feeds designed to trigger FOMO, AI is not a silver bullet. It is a tool that lets human creativity be realized more economically. The most crucial skill is the ability to test tools hands-on and select the right one for your specific situation. With tools as powerful as AI, systematic trial and error is essential for establishing a reliable baseline of what each tool can and cannot do.

AI as True Creative Partner: Having previously focused on AI-generated viral shorts, this was our first experience treating AI as a genuine creative partner. We discovered that when clear human intention exists about what to express, combined with iterative learning about how to express it, AI becomes fertile soil from which countless creative flowers can bloom. This project gave us profound confidence in AI's potential as a collaborative medium rather than just a production shortcut.

What's next for Mystery Invitation

Expanding Cultural Representation: We plan to refine our pipeline and, if the opportunity arises, expand this project to feature diverse racial and ethnic representations. Through this production, we discovered that different AI models have varying coverage and accuracy across different ethnicities. We're committed to portraying Asian, Hispanic, and other underrepresented communities, expressing the borderless beauty of humanity in all its forms.

From Silent to Spoken: While silence was our creative choice for this iteration, if this project receives positive reception, we're eager to explore the emotional power of dialogue. We want to discover how spoken words—enabled by voice synthesis technology—can deepen the emotional resonance of these universal human stories.

Systematizing Our Workflow: We're excited to formalize our empirically-tested multi-model pipeline into a more systematic framework that other independent creators can adopt and adapt for their own projects, democratizing access to high-quality AI filmmaking.

Built With

  • adobe-premiere-pro
  • bytedance-seedance-1.0-pro
  • bytedance-seedream
  • elevenlabs-music
  • elevenlabs-sound-effects
  • elevenlabs-voice-design
  • gemini-2.5-pro-deep-research
  • gemini-imagen-3-(nano-banana)
  • google-veo-3.1
  • kling-2.5-turbo
  • midjourney-v7
  • minimax-hailuo-pro
  • sunov5