Skip to content

feat: LTX-2 support #1458

Closed
pwilkin wants to merge 1 commit into leejet:master from pwilkin:ltx-2

@pwilkin commented Apr 23, 2026 (Contributor)

Please have mercy, had to murder my Claude Code to get this working.

SD_CUDA_DEVICE=1 SD_CUDA_DEVICE_CLIP=-1 SD_CUDA_DEVICE_VAE=0 timeout 1800 ./bin/sd-cli -M vid_gen \
    --diffusion-model /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev-Q5_K_S.gguf \
    --llm /media/ilintar/D_SSD/models/ltx-2/gemma-3-12b-it-qat-IQ4_XS.gguf \
    --vae /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev_video_vae.safetensors \
    -m /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev_embeddings_connectors.safetensors \
    --gemma-tokenizer /home/ilintar/.cache/huggingface/hub/models--google--gemma-3-12b-it/snapshots/96b6f1eccf38110c56df3a15bffe176da04bfd80/tokenizer.json \
    -W 640 -H 480 --video-frames 25 --steps 60 --fps 24 --cfg-scale 6.0 --seed 42 \
    -p "a cat walking on a sandy beach at sunset, cinematic, 4k" \
    -o /tmp/ltx2_smoke.webm
ltx2_smoke_v2.webm

@Green-Sky (Contributor)

I think there is some good stuff we can pull out of here (:

btw, why an IQ4_XS quant of the QAT model (gemma-3-12b-it-qat-IQ4_XS.gguf)?

@pwilkin commented Apr 24, 2026 (Contributor, author)

@Green-Sky that's a very good question, probably "because I wasn't thinking about it" is the proper answer ;)

@JohnLoveJoy

Great work. How does this perform compared to ComfyUI?

@pwilkin commented Apr 24, 2026 (Contributor, author)

Haven't compared yet but gonna optimize further.

@mudler commented Apr 24, 2026 (Contributor)

Wow! I was actually playing with this myself too, letting Claude go at it on its own. I'll open a PR just for reference; I got this working with Claude yesterday as well.

this is the result I got with it

ltx23_fix

@pwilkin commented Apr 24, 2026 (Contributor, author)

Still slightly funky, so I guess there's a subtle error somewhere, but I added fitting, which let me get 80 frames at 720p (prompt: "a black cat jumping at a brown mouse on green grass"):

ltx2_cat_mouse_720p.webm

@pwilkin commented Apr 24, 2026 (Contributor, author)

@mudler yours looks much better, wonder if that's quants or if my implementation has a bug somewhere.

Edit: might be distilled vs full too though.

@pwilkin commented Apr 24, 2026 (Contributor, author)

Probably FA (flash attention) is the culprit here. I'm running this on 26 GB of VRAM total (3080 10 GB + 5060 16 GB), so I'm really struggling to get anything reasonable :)

Review thread on src/stable-diffusion.cpp:
// SD_CUDA_DEVICE_VAE VAE (falls back to SD_CUDA_DEVICE)
// SD_CUDA_DEVICE_CONTROL ControlNet (falls back to SD_CUDA_DEVICE)
// SD_VK_DEVICE same pattern for the Vulkan build
// Setting any of these to -1 forces CPU for that component.

Review comment (Contributor):

Just as a reminder: this should be coordinated with #1184.

Reply (Contributor, author):

Yeah, this is just a rough PoC for now.
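For reference, the fallback rule in the comment above can be sketched as follows. This is a hypothetical C++ helper (`resolve_device`, `read_int_env`, `DeviceChoice`, and `Placement` are illustrative names, not code from the PR). It covers only the CUDA variables, and it assumes that a per-component variable overrides `SD_CUDA_DEVICE` and that any negative value (not just -1) means CPU.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

enum class Placement { Cpu, Gpu };

struct DeviceChoice {
    Placement placement;
    int gpu_index;  // -1 when placement == Placement::Cpu
};

// Parses an integer environment variable; unset, empty, or malformed values yield nullopt.
std::optional<int> read_int_env(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') return std::nullopt;
    char* end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (*end != '\0') return std::nullopt;
    return static_cast<int>(parsed);
}

// Hypothetical resolution order: component variable first, then SD_CUDA_DEVICE.
// nullopt means neither is set, so the caller keeps its default backend.
std::optional<DeviceChoice> resolve_device(const char* component_var) {
    assert(component_var != nullptr);
    std::optional<int> index = read_int_env(component_var);
    if (!index) index = read_int_env("SD_CUDA_DEVICE");
    if (!index) return std::nullopt;
    if (*index < 0) return DeviceChoice{Placement::Cpu, -1};
    return DeviceChoice{Placement::Gpu, *index};
}
```

With the environment from the first command in this thread (`SD_CUDA_DEVICE=1 SD_CUDA_DEVICE_CLIP=-1 SD_CUDA_DEVICE_VAE=0`), this sketch would put the component reading `SD_CUDA_DEVICE_CLIP` on CPU, the VAE on GPU 0, and anything without its own variable on GPU 1.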

@mudler commented Apr 24, 2026 (Contributor)

> @mudler yours looks much better, wonder if that's quants or if my implementation has a bug somewhere.
>
> Edit: might be distilled vs full too though.

I'm using the distilled model:

~/ltxv-sd-cpp/build-cuda/bin/sd-cli -M vid_gen \
    -m ltxv-models/ltx-2.3-22b-distilled.safetensors \
    --text-encoder gemma-3-12b-it \
    -p 'a cat walking across a grassy field' \
    -W 768 -H 512 --video-frames 121 \
    --steps 8 --cfg-scale 1 \
    -o /tmp/ltx23_clean.webp --seed 42
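The setups in this thread also differ in guidance: the full-model runs use `--cfg-scale 6.0` and later `3.0`, while the distilled run above uses `--cfg-scale 1`. Assuming the sampler applies the standard classifier-free guidance combine (an assumption; the thread doesn't show the sampler code), a scale of 1 makes the guided output equal to the conditional prediction. The hypothetical helper `cfg_combine` below illustrates this; it is not the sd.cpp sampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard classifier-free guidance combine: out = uncond + scale * (cond - uncond).
// At scale == 1 this reduces to cond, so the unconditional prediction has no effect.
std::vector<float> cfg_combine(const std::vector<float>& cond,
                               const std::vector<float>& uncond,
                               float scale) {
    assert(cond.size() == uncond.size());
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        out[i] = uncond[i] + scale * (cond[i] - uncond[i]);
    }
    return out;
}
```

Whether `sd-cli` skips the unconditional pass entirely at scale 1 isn't stated in this thread.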

@pwilkin commented Apr 24, 2026 (Contributor, author)

@mudler yeah I'm doing full for some reason (probably the same one that caused me to pick IQ4_XS :D)

@pwilkin commented Apr 24, 2026 (Contributor, author)

So apparently there are some major divergences between CPU and CUDA Gemma3, which is a bit surprising (and it happens on both Q4_0 and the IQ4_XS quants).
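When chasing a CPU vs CUDA divergence like this, a numeric comparison of the two backends' outputs for the same prompt helps separate ordinary backend rounding from a real bug. A minimal sketch, assuming both outputs (for example Gemma's final hidden states) have already been dumped to flat float arrays; the dumping step and the `compare_outputs` helper are hypothetical, not sd.cpp APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ParityReport {
    double max_abs_diff;       // largest element-wise |a - b|
    double cosine_similarity;  // 1.0 means the two outputs point the same way
};

// Compares two same-shaped outputs (e.g. CPU vs CUDA hidden states for one prompt).
ParityReport compare_outputs(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    double max_diff = 0.0, dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i];
        const double y = b[i];
        max_diff = std::max(max_diff, std::fabs(x - y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    double cosine;
    if (norm_a == 0.0 && norm_b == 0.0) {
        cosine = 1.0;  // both all-zero: treat as identical
    } else if (norm_a == 0.0 || norm_b == 0.0) {
        cosine = 0.0;  // one all-zero: no meaningful direction to compare
    } else {
        cosine = dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
    }
    return {max_diff, cosine};
}
```

What counts as parity is a judgment call: quantized kernels can legitimately differ between backends, so a nonzero max difference alone isn't alarming, while a cosine similarity well below 1 is more likely to indicate an actual bug.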

@leejet commented Apr 27, 2026 (Owner)

I’m also working on adding support for LTX-2.3. So far, I’ve confirmed that both Gemma and the VAE are working fine. However, there are still some issues with the diffusion model, and I’m currently trying to locate the problem.
#1463

@pwilkin commented Apr 27, 2026 (Contributor, author)

@leejet funny because I seem to have had the opposite problem :) DiT seems to be working fine here, but no matter what I do I can't achieve parity on Gemma with CUDA vs CPU.

Gonna try on higher quants if I can somehow fit them on my potato 😄

@leejet commented Apr 28, 2026 (Owner)

Currently, I’ve fixed the DiT issues in my branch, and Gemma3 works well on both CPU and CUDA. Feel free to try my implementation.

@pwilkin commented Apr 28, 2026 (Contributor, author)

Jeez, the issue was I was using the +1 norm when the GGUF conversion already bakes the +1 norm shift. Stupid error.
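To make the bug concrete: Gemma's RMSNorm scales the normalized activations by (1 + w), where w is the stored weight. Per the comment above, the GGUF conversion already stores the shifted weight (w + 1), so a runtime that adds 1 again effectively scales by (2 + w). A minimal sketch; `rms_norm` and its `add_one` flag are illustrative names, not the sd.cpp or ggml API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * s_i.
// add_one == true:  s_i = 1 + w_i (for unshifted, HF-style Gemma weights).
// add_one == false: s_i = w_i     (for weights that already have the +1 baked in).
std::vector<float> rms_norm(const std::vector<float>& x, const std::vector<float>& w,
                            float eps, bool add_one) {
    assert(x.size() == w.size() && !x.empty());
    double sum_sq = 0.0;
    for (float v : x) sum_sq += static_cast<double>(v) * v;
    const double inv_rms = 1.0 / std::sqrt(sum_sq / x.size() + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double scale = add_one ? 1.0 + w[i] : w[i];
        y[i] = static_cast<float>(x[i] * inv_rms * scale);
    }
    return y;
}
```

With baked weights, `add_one` has to be false; passing true reproduces the double shift.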

@pwilkin commented Apr 28, 2026 (Contributor, author)

@leejet my branch has all the offloading fixes that allow me to run the Q6_K model on my 16 GB + 10 GB VRAM setup :)
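The thread doesn't describe how the fitting/offloading logic works. As a rough illustration only, here is a hypothetical greedy placement (`place_layers` is an invented name, not the PR's algorithm): layers are assigned in order to the current device until the next one doesn't fit, then to the next device, and whatever remains stays on CPU (-1). Sizes are in arbitrary units such as GB; a real implementation would also need to reserve room for compute buffers, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns, for each layer, the index of the device it goes on, or -1 for CPU.
// Devices are filled in order, so each GPU ends up with a contiguous run of layers.
std::vector<int> place_layers(const std::vector<std::uint64_t>& layer_bytes,
                              const std::vector<std::uint64_t>& device_free_bytes) {
    std::vector<int> placement(layer_bytes.size(), -1);
    std::vector<std::uint64_t> remaining = device_free_bytes;
    std::size_t dev = 0;
    for (std::size_t i = 0; i < layer_bytes.size(); ++i) {
        // Advance to the next device once the current one cannot hold this layer.
        while (dev < remaining.size() && layer_bytes[i] > remaining[dev]) ++dev;
        if (dev == remaining.size()) break;  // all devices full: rest stays on CPU
        remaining[dev] -= layer_bytes[i];
        placement[i] = static_cast<int>(dev);
    }
    return placement;
}
```

Keeping each device's layers contiguous means activations only cross between devices at the boundaries, at the cost of possibly leaving a little unused space on each GPU.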

@leejet commented Apr 28, 2026 (Owner)

> Jeez, the issue was I was using the +1 norm when the GGUF conversion already bakes the +1 norm shift. Stupid error.

This issue also took me quite some time to fix.

@leejet commented Apr 28, 2026 (Owner)

> @leejet my branch has all the offloading fixes that allow me to run the Q6_K model on my 16 GB + 10 GB VRAM setup :)

Could you submit the vram fit changes as a separate PR?

@pwilkin commented Apr 28, 2026 (Contributor, author)

@leejet yeah of course, as soon as I get everything confirmed working I'll just isolate the changes and integrate with the memory management PR mentioned above.

@pwilkin commented Apr 29, 2026 (Contributor, author)

FINALLY got the thingy working. Generated on my potato, 3080 10 GB + 5060 16 GB, full VRAM processing:

[INFO ] stable-diffusion.cpp:4687 - generate_video completed in 1303.16s

1280x738, 95 frames @ 25 fps, 30 steps, cfg 3.0, prompt "a cat eating a tensor in cyberspace":

https://youtu.be/SF5mi6jdlL4

pwilkin marked this pull request as draft April 29, 2026 16:31

@pwilkin commented May 1, 2026 (Contributor, author)

Superseded by #1463 + #1470 (branch on my fork merging both: https://github.com/pwilkin/stable-diffusion.cpp/tree/backend-fit-ltx2 )

pwilkin closed this May 1, 2026

6 participants