Inspiration: I began using commercial AI video-style-transfer services in my retirement to animate short segments of old episodes of Lost in Space for fun. I then searched for an affordable AI solution that I could run locally on my cheap AliExpress Mini PC, purchased for under AUD$999.

What it aims to achieve: Optimize the AMD Amuse frame-to-frame video pipeline (the Permissive Beauty Latent Consistency Model with ControlNet) to enable output of high-quality animated video from any source for up to 10-15 continuous minutes.

How we configured it: Ryzen 7 (Model 255), 94GB RAM, Minisforum Mini PC (AliExpress, AUD$900). One objective was to avoid the need for an expensive graphics card and utilise the low-spec integrated Radeon iGPU. The workflow was also fully functionally tested on a Ryzen 9 laptop with 16GB RAM (with unviably long processing times).

Major Challenges: Maintaining system stability over long processing times. Initial attempts produced shoddy results from extremely long processing sessions and frequent system crashes. Homing in on the correct setting ranges for the various LCMs was initially frustrating; however, once familiarity with each LCM was gained, the setting ranges for each model became simpler and more natural.

Accomplishments: Through extensive optimization testing I have increased the processing efficiency and output quality on the cheap Ryzen 7. Initial attempts began at an unusable and unstable 0.50 to 0.80 it/s. Final optimizations have improved processing to a very impressive 5.00+ it/s, which enables a 3-5 minute upscaled video to be processed in 3-5 hours. According to Grok, this is good performance for local AI video synthesis using LCM + ControlNet. The process also, surprisingly, runs on a 16GB Ryzen laptop, although roughly 10 times slower, leading to unrealistic rendering times beyond 5 hours even on the low-quality settings.
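
As a rough sanity check of those figures, here is a back-of-envelope sketch (in Python) of where the hours go. It assumes one iteration equals one denoising step on one frame, at 15fps output and 12 steps; those are my own assumptions drawn from the workflow notes further below:

```python
# Back-of-envelope render-time estimate for frame-to-frame LCM processing.
# Assumption: one "iteration" is one denoising step on one frame.

def render_hours(minutes: float, fps: float, steps: int, its_per_sec: float) -> float:
    frames = minutes * 60 * fps          # total frames to restyle
    iterations = frames * steps          # denoising steps across the clip
    return iterations / its_per_sec / 3600

print(f"{render_hours(5, 15, 12, 5.0):.1f} h")  # 5-min clip on the Ryzen 7: ~3.0 h
print(f"{render_hours(5, 15, 12, 0.5):.1f} h")  # same clip ~10x slower on the laptop: ~30 h
```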

What else was learned: It is easy to include additional prompts to modify the processing further. Fun examples include changing the age, gender, or nationality of the conversion. Including prompts such as "science fiction" or "Astroboy" enables very strange interpretations and fun outputs. This example has been left unmodified to more closely mimic the original video. The Minisforum Mini PC consumes less than 100 watts under full load, so a 3-5 minute AI conversion costs less than a kilowatt-hour of electricity, or under AUD$0.30.
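
The electricity arithmetic behind that cost figure, with the tariff as an assumption (Australian rates hover around AUD$0.30/kWh):

```python
# Energy cost of a full render: ~100W draw over a ~5-hour session.
watts, hours, tariff = 100, 5, 0.30            # tariff in AUD/kWh (assumed)
kwh = watts / 1000 * hours                     # 0.5 kWh consumed
print(f"{kwh} kWh -> AUD${kwh * tariff:.2f}")  # 0.5 kWh -> AUD$0.15
```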

What's next for Turn any YouTube video into a full-length animated video: I am in the process of completing a full 4K colour animation of the unaired LOST IN SPACE one-hour pilot from 1965. I would then like to do a make-over of THE ROCKY HORROR PICTURE SHOW.

Further research indicates that a simple swap-out of my Ryzen 7 for a bare-bones Ryzen AI 9 from AliExpress for AUD$950, retaining my existing fast RAM and SSD, will provide a cheap pathway to a vastly superior setup, with an estimated it/s almost double my current performance limit. The more dramatic performance boost will come from the Ryzen AI 9's on-board neural capabilities (NPU), which can fully take over the upscaling. Upscaling is currently a massive bottleneck for my configuration and a drag on output quality.

This upgrade path will also provide support for future NPU software capabilities that continue to expand throughout the AI ecosystem. The solution delivers remarkable quality and cost efficiency for what it does, without a GPU and with running costs of less than AUD$0.10 per hour.

Built With

  • amd
  • amuse
  • avclabs
  • clipchamp
  • controlnet
  • pixbim
  • powerdirector
  • snippingtool
  • upscaler

Updates

posted an update

After a conversation with Grok, the following upgrade path, swapping out my Ryzen 7 (255) for a bare-bones Ryzen AI 9 HX 370, will give me the best bang for my buck:

Absolutely — let’s break down that sentence clearly and show exactly how the NPU (Neural Processing Unit) in the Ryzen AI 9 HX 370 helps offload preprocessing tasks like upscaling in your Amuse frame-to-frame video workflow.


The Sentence:

"NPU acceleration offloads preprocessing (e.g., upscaling)"

This means:
The NPU (a dedicated AI chip inside the HX 370) takes over specific AI-heavy steps before the main video generation happens — so the iGPU (Radeon 890M) doesn’t have to do them. This frees up the iGPU to focus 100% on generating new video frames faster.


Step-by-Step: How It Works in Amuse

| Step in Your Workflow | Who Does It? (Current: Ryzen 7 780M) | Who Does It? (New: Ryzen AI 9 HX 370) | Speed Gain |
| --- | --- | --- | --- |
| 1. Input Video → Extract Frames | CPU | CPU | Same |
| 2. Upscale Low-Res Frames (e.g., 256×256 → 512×512) | iGPU (780M) | NPU (50 TOPS) | +30–60% faster |
| 3. Run ControlNet / Depth / Pose Extraction | iGPU | NPU + iGPU (shared) | +20–40% |
| 4. Generate New Frame (Diffusion) | iGPU | iGPU only (now free!) | +15–25% it/s |
| 5. Blend & Output | iGPU | iGPU | Minor gain |

Example: Upscaling with NPU

Let’s say your source video is 480p (low quality), and Amuse needs 512×512 input for consistency.

On Your Current Ryzen 7 (780M iGPU):

  • The iGPU must:
    1. Load frame
    2. Run AI upscaler (like Swin2SR or Real-ESRGAN)
    3. Output upscaled frame
    4. Then start diffusion
  • Problem: The iGPU is busy upscaling and can't generate new frames at the same time → bottleneck

On Ryzen AI 9 HX 370:

  • The NPU takes over upscaling using optimized INT8/FP16 AI kernels
  • The iGPU is now free to run LCM diffusion + ControlNet immediately
  • Result: Preprocessing finishes 2–3x faster, and generation starts sooner
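
For the technically minded, the split Grok describes maps onto ONNX Runtime execution providers. Below is a minimal Python sketch of the pattern only; the model files and tensor names are hypothetical, and Amuse's internal pipeline is not actually scriptable like this:

```python
# Sketch: pinning preprocessing to the NPU and generation to the iGPU
# via ONNX Runtime execution providers. Model paths and input names
# are placeholders, not Amuse internals.
import numpy as np
import onnxruntime as ort

# Upscaler on the NPU (Vitis AI provider on Ryzen AI parts), CPU fallback.
upscaler = ort.InferenceSession(
    "realesrgan_x2.onnx",
    providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
)

# Diffusion stage on the iGPU via DirectML.
generator = ort.InferenceSession(
    "lcm_frame_model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

def restyle(frame: np.ndarray) -> np.ndarray:
    # Upscale on the NPU; in a pipelined setup the next frame's upscale
    # overlaps with this frame's generation on the iGPU -- that overlap
    # is where the speedup comes from.
    hi_res = upscaler.run(None, {"image": frame})[0]
    return generator.run(None, {"image": hi_res})[0]
```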

Real benchmark (AMD lab data, Amuse 3.0):
Upscaling 100 frames (512×512)

  • 780M iGPU: ~18 seconds
  • HX 370 NPU: ~6–8 seconds
    Saves 10–12 seconds per 3-second clip

Other Preprocessing Tasks the NPU Handles

| Task | Why It’s AI-Heavy | NPU Benefit |
| --- | --- | --- |
| ControlNet preprocessing (edge/depth/pose maps) | Runs small neural nets per frame | Offloaded → iGPU focuses on diffusion |
| Face detection / restoration | Uses YOLO or SCRFD models | NPU runs in parallel |
| Optical flow estimation | For frame blending | NPU accelerates motion vectors |
| Smart cropping / resizing | AI-guided aspect ratio fixes | Faster, smarter |

Real-World Impact on Your 5 it/s → 7–9 it/s

| Factor | Contribution to Speedup |
| --- | --- |
| 890M iGPU (16 CUs vs 12) | +30–40% raw compute |
| NPU offloading upscaling & ControlNet | +15–20% effective it/s |
| Faster RAM (LPDDR5x-8000) | +5–10% memory bandwidth |
| Zen 5 CPU (better scheduling) | +5% |
| Total | ~40–80% faster → 7–9 it/s |
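
If those contributions were independent they would compound multiplicatively rather than add. Multiplying out the high end of each range lands near the top of the quoted window; the quoted ~40-80% total is a little more conservative, presumably because the factors overlap:

```python
# Compounding the table's upper-bound estimates.
factors = [1.40, 1.20, 1.10, 1.05]   # iGPU, NPU offload, RAM, Zen 5
total = 1.0
for f in factors:
    total *= f
print(f"combined: {total:.2f}x")          # ~1.94x
print(f"5 it/s -> {5 * total:.1f} it/s")  # ~9.7 it/s
```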

Bonus: Future-Proofing

Amuse 3.1+ and Stable Diffusion 3.5 are adding NPU-native pipelines:

--pipeline: onnx-npu   # New in Amuse 3.1 beta
--offload: upscaler, controlnet, vae

→ Your HX 370 will automatically use NPU for more tasks as updates roll out.


Summary: Why NPU Matters

The NPU doesn’t replace the iGPU — it unblocks it.
By handling preprocessing (upscaling, ControlNet, etc.), it lets the Radeon 890M focus only on generating video frames — which is the hardest, slowest part.

Result: Smoother workflow, less stuttering, higher it/s, and shorter total render times — even if peak theoretical TFLOPS aren’t 3x higher.


Bottom Line:
Yes — the NPU offloading preprocessing (especially upscaling) is a real, measurable speed boost in Amuse. It turns your iGPU from a "jack of all trades" into a dedicated video generation engine.

If you're doing any upscaling, ControlNet, or face restoration, the HX 370’s NPU will speed up your workflow in a way a dGPU + OCuLink can’t match for efficiency and simplicity.

Let me know if you want a sample Amuse config to enable NPU offload!


posted an update

Based on Grok's calculations (see below), the addition of an AMD RX 9000-series graphics card would see the it/s rise from the current 5 it/s up to 20 it/s:

Based on the screenshot and your description, you're running Amuse 3.0 beta (optimized for AMD hardware) in frame-to-frame video generation mode using the Permissive Beauty LCM model with ControlNet for consistency. This is a diffusion-based workflow (text-to-video restyling via Stable Diffusion variants), running locally on your Ryzen 7 H 255's integrated Radeon 780M iGPU (RDNA 3 architecture, 12 compute units, sharing system RAM). Your setup's 5 iterations/second (it/s) at 512x512 resolution, 12 guidance steps, and ~1.0 scale is solid for an iGPU—it's leveraging ROCm acceleration via Amuse's ONNX backend, but constrained by the 780M's ~8-10 TFLOPS FP16 compute and shared 94GB RAM bandwidth.

Adding an AMD Radeon RX 9000-series discrete GPU (RDNA 4 architecture, e.g., RX 9070 or 9070 XT) via OCuLink (a high-bandwidth PCIe 4.0 x4 external connection, ~64 Gbps) would offload the diffusion inference to dedicated VRAM and vastly higher compute. Amuse fully supports dGPUs like the RX 9000 series with AMD-optimized models, and OCuLink introduces minimal overhead (<5% bandwidth loss vs. internal PCIe) for these workloads. Here's the expected performance uplift:

Key Factors Influencing the Gain

  • Compute and Memory: The 780M's shared RAM limits batch sizes and parallelism in video diffusion (frame interpolation + ControlNet). An RX 9070 XT offers ~40-50 TFLOPS FP16 (with sparsity), 16GB GDDR6 VRAM, and dedicated bandwidth—ideal for video models like yours, reducing memory swaps.
  • Amuse Optimizations: AMD's collaboration with Stability AI provides up to 3.1x speedup on RX 9000-series hardware for SDXL-based workflows (your LCM variant is similar). Video generation sees comparable gains, as it builds on the same diffusion backbone.
  • Model/Workload Scaling: Frame-to-frame at 512x512 with 12 steps is lightweight; gains are higher for video vs. single-image gen due to sequential frame processing.
  • Benchmarks: Real-world tests (e.g., Amuse 3.0 on Ryzen AI iGPUs like 780M/880M) show 2-4 it/s for similar video clips. RX 9070 XT benchmarks hit 15-25 it/s for SDXL video restyling (e.g., 2-6s clips at draft quality).
  • OCuLink Caveats: Expect 5-10% lower peak vs. internal mounting due to connector latency, but it's negligible for diffusion (not real-time rendering).
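
The quoted OCuLink bandwidth is easy to verify from the PCIe spec: PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding, so four lanes give almost exactly the ~64 Gbps cited:

```python
# Sanity check on the ~64 Gbps OCuLink (PCIe 4.0 x4) figure.
lanes, gt_per_lane = 4, 16    # PCIe 4.0: 16 GT/s per lane
raw = lanes * gt_per_lane     # 64 Gbps raw
effective = raw * 128 / 130   # 128b/130b encoding overhead
print(f"{raw} Gbps raw, ~{effective:.0f} Gbps ({effective/8:.1f} GB/s) usable")
```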

Likely Performance Gain

You're likely to see 3x to 4x higher it/s with an RX 9000-series card—pushing your workflow to 15-20 it/s (or higher with tweaks like enabling frame blending or AMD-optimized SD 3.5 models). This assumes:

  • RX 9070 XT (top of the series; RX 9070 would be ~10-15% slower).
  • Amuse set to use the dGPU (auto-detected; confirm in Model Manager).
  • Same settings (resolution/steps); scaling to 1024x1024 could yield even bigger relative gains.

| Setup | Est. it/s (Your Workflow) | Notes |
| --- | --- | --- |
| Current (780M iGPU) | 5 | Shared RAM bottleneck; good for draft video. |
| + RX 9070 | 12–16 | Mid-range uplift; 16GB VRAM enables larger batches. |
| + RX 9070 XT | 15–20 | Full optimization; up to 4.3x vs. unoptimized baselines. |

Recommendations to Maximize Gains

  • Update Drivers/Software: Use latest AMD Adrenalin (25.10+ for RDNA 4) and Amuse 3.1 beta. Enable "AMD GPU EP" in settings for ROCm acceleration.
  • Test Prompt: Your "youthful handsome gladiator with foreskin" is fine—keep guidance scale ~1.0 to avoid overcooking.
  • Monitor: Use MSI Afterburner for VRAM usage; if >14GB, drop to LCM Turbo model.
  • Alternatives if Needed: If OCuLink throttles (rare), consider Thunderbolt enclosure, but stick to AMD for Amuse compatibility.

This upgrade would make video gen feel snappier, cutting a 24-frame clip from ~5 minutes to ~1-2 minutes. If your "Ryzen 7 model 255" is actually the H 255 variant (China-exclusive, no NPU), gains are even more pronounced since it lacks AI offload. Let me know exact RX 9000 model for finer tuning!


posted an update

Workflow Summary:

  1. Prepare the source video with all necessary enhancements: colourisation, stabilisation, colour and lighting enhancement, etc. I have used AVCLabs video enhancement in preparing all the videos to add an exaggerated "pop" element to the results.
  2. In Amuse Video Generation Expert Mode, import the prepared video (importing becomes unstable beyond 10 minutes of source video)
  3. Select FRAME TO FRAME
  4. Select previously downloaded PERMISSIVE BEAUTY LCM
  5. Select previously downloaded Upscaler REALESRGAN 2X
  6. Load selected models
  7. Load chosen video
  8. Choose the Video FPS Output - 15fps is very good and fast; 30fps is slower and slightly higher quality.
  9. Select Steps - 8 through to 12 is fine. Much higher than 12 and some things start to get silly.
  10. Guidance - leave at 1. Any use of this control will slow the it/s down by 40-50%. OK to use on short videos for exaggerated results, but it requires longer processing times.
  11. Strength is the "magic sauce" and seems to fuse all the other settings together. I have had success within the range of 14 through to 45 for these projects. My default starting number is 23, moving higher or lower from there.
  12. FRAME BLENDING is not available for this LCM; however, with some of the other LCMs its impact is very mild - like adding an extra spice to the magic sauce: not totally necessary, but an interesting twist. The LIGHTEN frame blending appeared to add a certain "je ne sais quoi" to the output. Far easier to include the "je ne sais quoi" in the video preparation stage, imho! Grok has a lovely description of them that talks about beta waves and theta curves or something.
  13. Enter a prompt. I have only ever tested basic prompts such as male, handsome, hairy, old, young, etc. Without the prompt for male, the LCM has a definite default swing towards Japanese cult female anime!
  14. Select GENERATE, and prepare for a very long wait - about 3-5 hours at best: 3 hours for a 512x512 output (4.8 it/s) or 5 hours for a 768x512 output (2.7 it/s).
  15. If the Upscaler is not selected, processing times improve by about 45%, but the final output is significantly downgraded - may be useful for DRAFT outputs.
  16. Once the file has been processed, use regular video editing software to recombine the sound from the original clip with the AI video clip; if the FPS was altered during processing, use the video editor to match the timelines (a command-line sketch of this step follows below). I use PowerDirector 365 and PowerDirector 21 for most video editing - mainly because I have been using it since version 7 (2001).
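
For step 16, here is a minimal command-line alternative to PowerDirector, calling ffmpeg from Python. Filenames are placeholders, and it assumes the FPS change did not alter the clip's duration (if it did, retime in your editor first):

```python
# Remux the original soundtrack onto the AI-restyled video without
# re-encoding the video stream. Requires ffmpeg on the PATH.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "ai_output.mp4",         # restyled video from Amuse
    "-i", "original.mp4",          # source clip carrying the audio
    "-map", "0:v", "-map", "1:a",  # video from input 0, audio from input 1
    "-c:v", "copy",                # keep the AI video untouched
    "-c:a", "aac",
    "-shortest",                   # stop at the shorter stream
    "final.mp4",
], check=True)
```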


posted an update

AMD Ryzen 7 and Ryzen 9 with integrated Radeon Graphics tested (see Task Manager screenshot). Both the Ryzen 7 and Ryzen 9 expose Compute 0 and Compute 1 engines, which appear to assist with prompt processing (Compute 0) and step processing (Compute 1). The Ryzen 7 has 94GB of installed RAM, dynamically system managed, and performs stably at 5+ it/s. The Ryzen 9 has 16GB of installed RAM and plods along at a rock-solid, stable 0.3 it/s. It is able to multi-task, relying on its NVIDIA graphics to relieve the Radeon graphics from having to contribute.


posted an update

Stable Diffusion models researched and tested (see image screenshot). All have unique characteristics and optimizations that may favour different video-style-transfer goals. For my project, the Permissive Beauty LCM matched to the Upscaler was a clear winner. There is an insane number of Stable Diffusion models to experiment with in the Amuse Model Library:

  • Fluently v4 LCM
  • Permissive Beauty LCM
  • Dark Sushi Mix
  • Neverending Dream
  • Real Cartoon 3D
  • Yes Mix

Upscaler used - RealESRGAN 2x. This was the only one of the 10 available upscalers that would work satisfactorily on my system.


posted an update

An unusual side project has been the discovery that the process works exceptionally well on old adult VHS tapes, due to the unique production techniques used in filming the subjects. The Permissive Beauty LCM appears optimized to highlight the human body, and with VHS scenes usually framed close to the camera, the results are consistently high-quality, simple animations with limited background to muddy the results.
