I created this video around a core concept: Japan Aesthetics. Once I locked that theme in, I began collecting visual ideas like sumo, katana, ramen, traditional temples, neon streets, and the overall contrast between old and modern Japan. I thought the hardest part would be creating beautiful shots, but the real challenge was the AI generation process itself.
I ran into multiple problems while picking the right model for image and video generation. I tested Wan 2.5, Veo 3, and a few others, but none of them gave me smooth, consistent slow motion. Characters often warped and backgrounds fell apart when I tried adding handheld camera movement, which I really wanted for a more organic, cinematic feel.
After a lot of trial and error, I decided to build the entire video using Kling 2.5 Turbo Pro since it handled motion and visual coherence better. Even then, I made one very specific technical choice: I generated the clips in 720p, even though the final timeline was 1080p. The reason was simple. I wanted extra space around the footage so I could add motion, scaling, and speed ramps later inside After Effects without losing detail.
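The resolution arithmetic behind that choice can be sketched roughly as follows. This is only an illustration: it assumes the standard 1280x720 and 1920x1080 frame sizes (the exact pixel dimensions are not stated above), and `fit_stats` is a hypothetical helper, not part of After Effects or any of the generation tools.

```python
def fit_stats(clip, timeline):
    """Return the side margin, top margin, and fill scale for a clip
    centered at 100% scale inside a larger timeline frame.

    clip and timeline are (width, height) tuples in pixels.
    """
    clip_w, clip_h = clip
    tl_w, tl_h = timeline

    # Empty space on each side when the clip sits centered at 100% scale.
    margin_x = (tl_w - clip_w) // 2
    margin_y = (tl_h - clip_h) // 2

    # Scale factor needed for the clip to cover the whole timeline frame.
    fill_scale = max(tl_w / clip_w, tl_h / clip_h)

    return margin_x, margin_y, fill_scale


# Assumed sizes: 720p clip (1280x720) on a 1080p timeline (1920x1080).
margin_x, margin_y, fill_scale = fit_stats((1280, 720), (1920, 1080))
print(margin_x, margin_y, fill_scale)
```

Under those assumed sizes, a 720p clip at 100% scale leaves 320 px free on each side and 180 px at the top and bottom, and it needs a 150% scale to cover the full 1080p frame.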
The project became bigger than I expected. What was supposed to be a short visual experiment turned into five full days of work, two of which went entirely to sound design. I added layers like Tokyo street ambience, temple bells, ramen stalls, sumo impact sounds, and metallic textures during katana sequences. The audio needed to feel detailed and alive for the visuals to land emotionally.
In the end, this project taught me a lot about the limits and strengths of current AI video tools. It was not just about generating Japanese-themed visuals. It was about working through model limitations, motion issues, resolution planning, and finally shaping everything into a cinematic experience through editing and sound.
Built With
- kling2.5turbopro
- nanobanana
- seedream4
- veo3.1