Add Ideogram support and improve BF16 dequantization handling #459
molbal wants to merge 6 commits into
Great! I successfully ran it, but my device doesn't support bf16; it gets converted to fp32 computation, which makes it very slow. Can you make it run on my device in fp16?
[INFO] got prompt
Hi @yu234567 - try now. It should work better now; can you verify, please?
Thank you so much, it worked! |
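For readers following the bf16/fp16 exchange above, here is a minimal sketch of why bf16 data can be handed to an fp16-only device. It is not the code from this PR: it uses NumPy rather than the project's own tensor code, and the function names `bf16_bits_to_fp32` and `bf16_bits_to_fp16` are hypothetical. It relies only on the fact that a bf16 value is the upper 16 bits of an IEEE-754 float32. Because bf16 has a wider exponent range than fp16, the sketch clamps to the largest finite fp16 value, which is one possible policy among several.

```python
import numpy as np

FP16_MAX = 65504.0  # largest finite float16 value


def bf16_bits_to_fp32(bits: np.ndarray) -> np.ndarray:
    """Expand raw bf16 bit patterns (uint16) to float32.

    bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
    of a float32, so shifting left by 16 restores a valid float32.
    """
    widened = bits.astype(np.uint32) << 16
    return widened.view(np.float32)


def bf16_bits_to_fp16(bits: np.ndarray) -> np.ndarray:
    """Convert raw bf16 bit patterns to float16 (hypothetical helper).

    Values outside the float16 range are clamped instead of becoming inf.
    """
    as_fp32 = bf16_bits_to_fp32(bits)
    clamped = np.clip(as_fp32, -FP16_MAX, FP16_MAX)
    return clamped.astype(np.float16)


if __name__ == "__main__":
    raw = np.array([0x3F80, 0xC000, 0x3F00], dtype=np.uint16)
    print(bf16_bits_to_fp32(raw))
    print(bf16_bits_to_fp16(raw))
```

Clamping trades accuracy on very large values for finite outputs; whether that is acceptable depends on the model's weight distribution.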
classifies Qwen3-VL-8B-Instruct q4_0 quant as _k quant and errors out? is this expected or what quantization are you running for the te
- Updated IMG_ARCH_LIST to include 'krea2'.
- Introduced ModelKrea2 class with architecture details and tensor handling.
- Enhanced convert_file function to support quantization types.
- Added tools/convert_krea2_gguf.py for batch conversion of Krea-2 models to multiple GGUF quant levels.
…ation support limitations
…tensors and skip 0-dim scalars during GGUF conversion
Summary
This adds support for Ideogram GGUF models.
What Changed
- Added ideogram to the supported image GGUF architectures.
Notes
Tested on Windows 11:
Python version: 3.12.11 (main, Jul 23 2025, 00:32:20) [MSC v.1944 64 bit (AMD64)]
[INFO] Total VRAM 8192 MB, total RAM 48394 MB
[INFO] pytorch version: 2.12.0+cu130
[INFO] Set vram state to: LOW_VRAM
[INFO] Device: cuda:0 NVIDIA GeForce RTX 3080 Laptop GPU
Tested with Q4_0 gguf from https://huggingface.co/leejet/ideogram-4-GGUF
Other GGUF quant types still use the existing dequant paths.
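
For context on the Q4_0 format used in testing, the following is a minimal NumPy sketch of Q4_0 dequantization, not this PR's implementation. It assumes the standard GGML block layout: 32 weights per block, stored as a little-endian fp16 scale followed by 16 bytes of packed 4-bit values, with the low nibbles holding the first 16 weights and the high nibbles the last 16. The name `dequantize_q4_0` is hypothetical.

```python
import numpy as np

QK4_0 = 32                    # weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2  # 2-byte fp16 scale + 16 bytes of nibbles


def dequantize_q4_0(raw: bytes) -> np.ndarray:
    """Dequantize a buffer of Q4_0 blocks to a flat float32 array."""
    data = np.frombuffer(raw, dtype=np.uint8)
    if data.size % BLOCK_BYTES != 0:
        raise ValueError("buffer length is not a multiple of the block size")

    blocks = data.reshape(-1, BLOCK_BYTES)

    # Per-block scale: first two bytes, little-endian float16 -> shape (n, 1).
    scale = blocks[:, :2].copy().view("<f2").astype(np.float32)

    # Packed 4-bit values: low nibbles first, then high nibbles.
    packed = blocks[:, 2:]
    low = (packed & 0x0F).astype(np.int8)
    high = (packed >> 4).astype(np.int8)

    # Stored values are unsigned 0..15; subtracting 8 recentres them to -8..7.
    quants = np.concatenate([low, high], axis=1) - 8

    return (quants.astype(np.float32) * scale).reshape(-1)


if __name__ == "__main__":
    scale_bytes = np.array(0.5, dtype="<f2").tobytes()
    block = scale_bytes + bytes([0x19] * 16)  # low nibble 9, high nibble 1
    print(dequantize_q4_0(block))
```

The output dtype here is float32 for clarity; a caller targeting an fp16-only device could cast the result afterwards.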