Inspiration
This film began with a simple observation about relationships: how often what we say is not what we truly mean.
What it does
I wanted to capture the humor and chaos that can arise from those everyday miscommunications. By weaving in surreal elements such as rewinding time and alternate possibilities (parallel universes of what-ifs), the film reflects how we all sometimes wish we could replay our choices, and shows how even the smallest decisions can ripple in unexpected ways. Like any true anime, it has a post-credits scene waiting for you, so watch until the very end for that final wink to the audience!
How it is built
After developing the concept through brainstorming, I created a storyboard to map out the emotional flow of the film. Using this as a guide, I generated the visual assets with AI image models. These images were then fed into AI video tools to animate the scenes. The resulting clips were refined and stitched together in CapCut, where I also incorporated music and foley effects from its library. All dialogue and subtitles were produced using ElevenLabs, completing the film’s audio narrative layer.
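The workflow above can be pictured as each storyboard shot moving through a fixed sequence of stages. The snippet below is only a minimal sketch of that idea for anyone who wants to track shots in a script. The film itself was assembled in the tools named above, and the `Shot` class, the stage labels, their exact order, and `remaining_work` are all hypothetical names introduced here for illustration. It calls no AI service.

```python
from dataclasses import dataclass, field

# Stage labels paraphrase the workflow described above (assumed order):
# storyboard -> AI image -> AI video -> edit (CapCut) -> voice (ElevenLabs).
STAGES = ["storyboard", "image", "video", "edit", "voice"]


@dataclass
class Shot:
    """One storyboard panel and the pipeline stages it has completed."""
    shot_id: str
    beat: str  # the emotional beat this shot is meant to carry
    done: list = field(default_factory=list)

    def next_stage(self):
        """Return the first stage not yet completed, or None when finished."""
        for stage in STAGES:
            if stage not in self.done:
                return stage
        return None

    def complete(self, stage):
        """Mark a stage done, refusing to skip ahead of the pipeline order."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"{self.shot_id}: expected {expected}, got {stage}")
        self.done.append(stage)


def remaining_work(shots):
    """Group shot ids by the stage each one is still waiting on."""
    todo = {}
    for shot in shots:
        stage = shot.next_stage()
        if stage is not None:
            todo.setdefault(stage, []).append(shot.shot_id)
    return todo
```

Calling `remaining_work` on the full shot list gives a quick view of which shots still need images, animation, editing, or voice work, which is useful when clips have to be regenerated several times.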
Challenges
Creating anime with AI is still unpredictable, so several challenges emerged:
- Maintaining character consistency across scenes required repeated regeneration and fine-tuning.
- Ensuring smooth motion demanded testing multiple models, as some produced distorted frames.
- Balancing creative control vs. AI autonomy was a constant dance: some outputs were beautiful accidents, others needed heavy intervention.
- Time-consuming refinement was needed to keep emotional beats clear while staying true to the story’s tone.
Accomplishments
I crafted a fully original anime short with a distinctive emotional atmosphere and achieved cinematic anime-style visuals without a full animation studio pipeline. The project shows that AI can empower independent creators to tell ambitious stories in formats that were previously inaccessible to them.
What is learned
- AI is a powerful creative partner, but it requires clear direction, patience, and a willingness to iterate.
- Emotional storytelling still relies heavily on human intuition: AI can create frames, but people create meaning.
- A hybrid workflow (AI + human craftsmanship) delivers far better results than relying on automation alone.
- Pre-planning (storyboards, tone references, visual bibles) is essential to keep AI outputs aligned with the vision.
What's next for Hear Beyond Words
I would like to expand the project into a feature film or episodic micro-series.
Built With
- capcut
- elevenlabs
- google-cloud
- hailuoai
- hedra
- klingai
- ltxstudio
- lumaai
- nanobanana
- veo