feat: LTX-2 support by pwilkin · Pull Request #1458 · leejet/stable-diffusion.cpp

pwilkin · 2026-04-23T23:48:24Z

Please have mercy, had to murder my Claude Code to get this working.

SD_CUDA_DEVICE=1 SD_CUDA_DEVICE_CLIP=-1 SD_CUDA_DEVICE_VAE=0 timeout 1800 ./bin/sd-cli -M vid_gen \
    --diffusion-model /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev-Q5_K_S.gguf \
    --llm /media/ilintar/D_SSD/models/ltx-2/gemma-3-12b-it-qat-IQ4_XS.gguf \
    --vae /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev_video_vae.safetensors \
    -m /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev_embeddings_connectors.safetensors \
    --gemma-tokenizer /home/ilintar/.cache/huggingface/hub/models--google--gemma-3-12b-it/snapshots/96b6f1eccf38110c56df3a15bffe176da04bfd80/tokenizer.json \
    -W 640 -H 480 --video-frames 25 --steps 60 --fps 24 --cfg-scale 6.0 --seed 42 \
    -p "a cat walking on a sandy beach at sunset, cinematic, 4k" \
    -o /tmp/ltx2_smoke.webm

ltx2_smoke_v2.webm

Green-Sky · 2026-04-24T09:39:55Z

I think there is some good stuff we can pull out of here (:

btw, gemma-3-12b-it-qat-IQ4_XS.gguf why iq4 of qat?

pwilkin · 2026-04-24T10:41:45Z

@Green-Sky that's a very good question, probably "because I wasn't thinking about it" is the proper answer ;)

JohnLoveJoy · 2026-04-24T13:40:58Z

Great work. How does this perform compared to ComfyUI?

pwilkin · 2026-04-24T13:51:46Z

Haven't compared yet but gonna optimize further.

mudler · 2026-04-24T14:51:47Z

wow! was actually playing with it myself as well with Claude letting it go by itself. Will open up a PR just for reference, got this working with claude as well yesterday

this is the result I got with it

pwilkin · 2026-04-24T14:52:39Z

Slightly funky still, so guess there's a subtle error somewhere, but I added fitting, so I managed to get 80 frames at 720p ("a black cat jumping at a brown mouse on green grass"):

ltx2_cat_mouse_720p.webm

pwilkin · 2026-04-24T14:57:15Z

@mudler yours looks much better, wonder if that's quants or if my implementation has a bug somewhere.

Edit: might be distilled vs full too though.

pwilkin · 2026-04-24T15:07:00Z

Probably FA is the culprit here - I'm running this on 26 GB VRAM total (3080 10 GB + 5060 16 GB), so really struggling to get anything reasonable :)

wbruna · 2026-04-24T15:24:55Z

+    //   SD_CUDA_DEVICE_VAE      VAE                          (falls back to SD_CUDA_DEVICE)
+    //   SD_CUDA_DEVICE_CONTROL  ControlNet                    (falls back to SD_CUDA_DEVICE)
+    //   SD_VK_DEVICE            same pattern for the Vulkan build
+    // Setting any of these to -1 forces CPU for that component.


Just as a reminder: this should be coordinated with #1184.

Yeah, this is just a rough PoC for now.
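
A minimal sketch of the fallback rule described in the diff comment above, assuming the values are read with `getenv`. The helper names `env_int` and `resolve_device` and the `default_device` parameter are hypothetical and not taken from this PR; only the variable names and the "-1 means CPU" convention come from the quoted diff.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical helper: read an integer from an environment variable.
// Returns std::nullopt when the variable is unset, empty, or not a plain integer.
static std::optional<int> env_int(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return std::nullopt;
    }
    char* end = nullptr;
    long parsed = std::strtol(value, &end, 10);
    if (*end != '\0') {
        return std::nullopt;  // trailing garbage: treat as unset
    }
    return static_cast<int>(parsed);
}

// Hypothetical resolver: the per-component variable wins, then the global one,
// then default_device (an assumption; the PR may pick its default differently).
// A result of -1 means "run this component on the CPU".
int resolve_device(const char* component_var, const char* global_var, int default_device = 0) {
    if (auto d = env_int(component_var)) {
        return *d;
    }
    if (auto d = env_int(global_var)) {
        return *d;
    }
    return default_device;
}
```

With the environment from the first command in this thread (`SD_CUDA_DEVICE=1 SD_CUDA_DEVICE_CLIP=-1 SD_CUDA_DEVICE_VAE=0`), `resolve_device("SD_CUDA_DEVICE_VAE", "SD_CUDA_DEVICE")` would return 0 and `resolve_device("SD_CUDA_DEVICE_CLIP", "SD_CUDA_DEVICE")` would return -1 under this sketch.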

mudler · 2026-04-24T15:30:39Z

> @mudler yours looks much better, wonder if that's quants or if my implementation has a bug somewhere.
>
> Edit: might be distilled vs full too though.

I'm using the distilled model:

~/ltxv-sd-cpp/build-cuda/bin/sd-cli -M vid_gen \
    -m ltxv-models/ltx-2.3-22b-distilled.safetensors \
    --text-encoder gemma-3-12b-it \
    -p 'a cat walking across a grassy field' \
    -W 768 -H 512 --video-frames 121 \
    --steps 8 --cfg-scale 1 \
    -o /tmp/ltx23_clean.webp --seed 42

pwilkin · 2026-04-24T15:42:42Z

@mudler yeah I'm doing full for some reason (probably the same one that caused me to pick IQ4_XS :D)

pwilkin · 2026-04-24T20:21:58Z

So apparently there are some major divergences between CPU and CUDA Gemma3, which is a bit surprising (and it happens on both Q4_0 and the IQ4_XS quants).

leejet · 2026-04-27T14:39:53Z

I’m also working on adding support for ltx2.3. So far, I’ve confirmed that both Gemma and VAE are working fine. However, there are still some issues with the diffusion model, and I’m currently trying to locate the problem.
#1463

pwilkin · 2026-04-27T16:08:23Z

@leejet funny because I seem to have had the opposite problem :) DiT seems to be working fine here, but no matter what I do I can't achieve parity on Gemma with CUDA vs CPU.

Gonna try on higher quants if I can somehow fit them on my potato 😄
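
For reference, one way to quantify "parity" between two backends is to dump the same output tensor from each and compare the flattened values. The sketch below is an illustration only: `ParityReport` and `compare_outputs` are hypothetical names, and how the tensors get dumped from the CPU and CUDA runs is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of how far two backend outputs diverge.
struct ParityReport {
    float max_abs_diff;  // largest element-wise absolute difference
    float cosine;        // cosine similarity of the flattened tensors
};

// Compare two flattened tensors of equal length (e.g. CPU vs CUDA embeddings).
ParityReport compare_outputs(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    float max_abs = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        max_abs = std::max(max_abs, std::fabs(a[i] - b[i]));
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    const double denom = std::sqrt(norm_a) * std::sqrt(norm_b);
    // Report 0 for the degenerate all-zero case instead of dividing by zero.
    const float cosine = denom > 0.0 ? static_cast<float>(dot / denom) : 0.0f;
    return {max_abs, cosine};
}
```

Small differences from quantized kernels are expected, so any threshold on these numbers is a judgment call; a cosine similarity well below 1 on the text-encoder output would point at a logic difference rather than rounding.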

leejet · 2026-04-28T17:53:53Z

Currently, I’ve fixed the DiT issues in my branch, and Gemma3 works well on both CPU and CUDA. Feel free to try my implementation.

pwilkin · 2026-04-28T18:03:05Z

Jeez, the issue was I was using the +1 norm when the GGUF conversion already bakes the +1 norm shift. Stupid error.
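
To illustrate the bug: Gemma-style RMSNorm scales by `(1 + weight)`, so if the converted file already stores `1 + weight`, adding 1 again at inference time scales by `(2 + weight)`. The sketch below is a standalone illustration, not code from either branch; `rms_norm` and its `add_one` flag are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative RMSNorm over one vector.
// add_one = true  : scale by (1 + w), as needed for raw checkpoint weights.
// add_one = false : scale by w, correct when the +1 is already baked into w.
std::vector<float> rms_norm(const std::vector<float>& x,
                            const std::vector<float>& w,
                            bool add_one,
                            float eps = 1e-6f) {
    assert(!x.empty() && x.size() == w.size());
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
    }
    const float mean_sq = static_cast<float>(sum_sq / static_cast<double>(x.size()));
    const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float scale = add_one ? 1.0f + w[i] : w[i];
        y[i] = x[i] * inv_rms * scale;
    }
    return y;
}
```

Under this sketch, with raw weights `w` and baked weights `w_baked[i] = 1 + w[i]`, `rms_norm(x, w, true)` and `rms_norm(x, w_baked, false)` agree, while `rms_norm(x, w_baked, true)` applies the shift twice and produces inflated activations.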

pwilkin · 2026-04-28T18:03:41Z

@leejet my branch has all the offloading fixes that allow me to run the Q6_K model on my 16 GB + 10 GB VRAM setup :)

leejet · 2026-04-28T18:06:21Z

> Jeez, the issue was I was using the +1 norm when the GGUF conversion already bakes the +1 norm shift. Stupid error.

This issue also took me quite some time to fix.

leejet · 2026-04-28T18:10:32Z

> @leejet my branch has all the offloading fixes that allow me to run the Q6_K model on my 16 GB + 10 GB VRAM setup :)

Could you submit the vram fit changes as a separate PR?

pwilkin · 2026-04-28T18:11:12Z

@leejet yeah of course, as soon as I get everything confirmed working I'll just isolate the changes and integrate with the memory management PR mentioned above.

pwilkin · 2026-04-29T15:34:01Z

FINALLY got the thingy working. Generated on my potato, 3080 10 GB + 5060 16 GB, full VRAM processing:

[INFO ] stable-diffusion.cpp:4687 - generate_video completed in 1303.16s

1280x738, 95 frames @ 25 fps, 30 steps, cfg 3.0, prompt "a cat eating a tensor in cyberspace":

https://youtu.be/SF5mi6jdlL4

pwilkin · 2026-05-01T16:12:34Z

Superseded by #1463 + #1470 (branch on my fork merging both: https://github.com/pwilkin/stable-diffusion.cpp/tree/backend-fit-ltx2 )

mudler mentioned this pull request Apr 24, 2026

feat: add LTX-2 video generation support #1459

Closed

wbruna reviewed Apr 24, 2026

pwilkin marked this pull request as draft April 29, 2026 16:31

Squash: ltx-2

6732907

pwilkin force-pushed the ltx-2 branch from d2623c5 to 6732907 Compare April 30, 2026 10:08

pwilkin closed this May 1, 2026
