
Fix ValueError in _encode_prompt truncation check for prompts at the CLIP token limit #404

Open
Osamaali313 wants to merge 1 commit into apple:main from Osamaali313:fix/encode-prompt-array-equal


Summary

StableDiffusionPipeline._encode_prompt crashes on any prompt that reaches the CLIP token limit.

The truncation-warning guard (python_coreml_stable_diffusion/pipeline.py) is:

if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.equal(
        text_input_ids, untruncated_ids
):

np.equal returns an element-wise boolean array, so not <array> raises:

  • ValueError: The truth value of an array with more than one element is ambiguous (when the shapes match), and
  • ValueError: operands could not be broadcast together with shapes (1,77) (1,90) (when they differ, i.e. the real truncation case).
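Both failure modes can be reproduced with NumPy alone. The snippet below is a minimal sketch, assuming zero-filled placeholder arrays with the shapes quoted above; the helper name `describe_failure` is hypothetical and not part of the pipeline.

```python
import numpy as np

# Shapes mirror the PR description; the token values are arbitrary placeholders.
text_input_ids = np.zeros((1, 77), dtype=np.int64)
same_shape_ids = np.zeros((1, 77), dtype=np.int64)
longer_ids = np.zeros((1, 90), dtype=np.int64)


def describe_failure(untruncated_ids):
    """Return the ValueError message raised by `not np.equal(...)`, or None."""
    try:
        # This is the expression used by the original guard.
        not np.equal(text_input_ids, untruncated_ids)
    except ValueError as exc:
        return str(exc)
    return None


ambiguous_msg = describe_failure(same_shape_ids)  # shapes match
broadcast_msg = describe_failure(longer_ids)      # shapes differ
print(ambiguous_msg)
print(broadcast_msg)
```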

text_input_ids is always padded to tokenizer.model_max_length (77) and untruncated_ids uses padding="longest", so this branch is entered whenever a prompt tokenizes to ≥ 77 tokens. Such prompts crash at prompt-encoding time, before generation — including precisely the truncation scenario the warning is meant to surface. (Short prompts short-circuit on the length check, which is why this isn't always hit.)

The guard is a port of the upstream diffusers check, which uses torch.equal(...), a function that returns a scalar Python bool, so not torch.equal(...) is valid. The NumPy port should use np.array_equal, which likewise returns a scalar.

Fix

-if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.equal(
+if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.array_equal(
         text_input_ids, untruncated_ids
 ):

np.array_equal returns a scalar bool and returns False for differing shapes (matching torch.equal semantics), so the warning fires correctly instead of raising.
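As a quick illustration of those semantics, the sketch below calls `np.array_equal` on three pairs of arrays. The array contents are made-up placeholders; only the shapes follow the description.

```python
import numpy as np

padded = np.arange(77).reshape(1, 77)

identical = padded.copy()

same_shape_different = padded.copy()
same_shape_different[0, -1] = -1  # change one element, keep the shape

longer = np.arange(90).reshape(1, 90)  # different shape

results = {
    "identical": np.array_equal(padded, identical),
    "same_shape_different": np.array_equal(padded, same_shape_different),
    "different_shape": np.array_equal(padded, longer),
}

# Each value is a single boolean, so `not results[...]` is well defined.
for name, value in results.items():
    print(name, value)
```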

Validation

Reproduced with faithful tokenizer shapes (text_input_ids=[1,77]):

| case (untruncated_ids shape) | np.equal (current) | np.array_equal (fixed) |
| --- | --- | --- |
| no truncation ([1,77]) | ValueError: ambiguous truth value | no warning |
| truncation ([1,90]) | ValueError: broadcast | warns (correct) |
| short prompt ([1,5]) | no warning | no warning |

The existing tests/ are end-to-end integration tests (full Core ML conversion + model downloads + Swift CLI), so this guard cannot be exercised as an isolated unit test; the table above is the standalone repro.
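If the guard condition were ever factored out of `_encode_prompt`, it could be covered without the Core ML toolchain. The following is only a sketch under that assumption: `prompt_was_truncated` and `make_ids` are hypothetical names that do not exist in the repository, and the arrays are placeholders with the shapes from the table.

```python
import numpy as np


def prompt_was_truncated(text_input_ids, untruncated_ids):
    """Hypothetical helper: the fixed guard condition as a standalone function."""
    return bool(
        untruncated_ids.shape[-1] >= text_input_ids.shape[-1]
        and not np.array_equal(text_input_ids, untruncated_ids)
    )


def make_ids(length):
    """Build a placeholder [1, length] array of token ids."""
    return np.arange(length, dtype=np.int64).reshape(1, length)


def test_guard_matches_validation_table():
    padded = make_ids(77)  # stands in for text_input_ids
    assert prompt_was_truncated(padded, make_ids(77)) is False  # no truncation
    assert prompt_was_truncated(padded, make_ids(90)) is True   # truncation
    assert prompt_was_truncated(padded, make_ids(5)) is False   # short prompt


test_guard_matches_validation_table()
```

The function follows the `test_` naming convention so a runner such as pytest would collect it; it is also called directly so the snippet can be run as a plain script.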

The CLIP truncation warning used `not np.equal(text_input_ids,
untruncated_ids)`. `np.equal` returns an element-wise boolean array, so
applying `not` raises `ValueError: The truth value of an array with more
than one element is ambiguous` when the shapes match, and a broadcasting
`ValueError` when they differ (the actual truncation case). Since
text_input_ids is always padded to model_max_length (77), this branch is
reached for any prompt that tokenizes to >= 77 tokens, crashing prompt
encoding before generation -- including exactly the truncation case the
warning is meant to report.

Use `np.array_equal`, which returns a scalar bool and safely returns False
for differing shapes (matching the `torch.equal` semantics this was ported
from), so the warning fires correctly instead of raising.