Fix ValueError in _encode_prompt truncation check for prompts at the CLIP token limit #404
Open
Osamaali313 wants to merge 1 commit into
The CLIP truncation warning used `not np.equal(text_input_ids, untruncated_ids)`. `np.equal` returns an element-wise boolean array, so applying `not` to it raises `ValueError: The truth value of an array with more than one element is ambiguous` when the shapes match. When the shapes differ (the actual truncation case), `np.equal` itself raises a broadcasting `ValueError`.

Since `text_input_ids` is always padded to `model_max_length` (77), this branch is reached for any prompt that tokenizes to >= 77 tokens, so prompt encoding crashes before generation, including in exactly the truncation case the warning is meant to report.

Use `np.array_equal`, which returns a scalar bool and safely returns False for differing shapes (matching the `torch.equal` semantics this was ported from), so the warning fires correctly instead of raising.
Summary
`StableDiffusionPipeline._encode_prompt` crashes on any prompt that reaches the CLIP token limit.

The truncation-warning guard (`python_coreml_stable_diffusion/pipeline.py`) applies `not np.equal(text_input_ids, untruncated_ids)`. `np.equal` returns an element-wise boolean array, so `not <array>` raises `ValueError: The truth value of an array with more than one element is ambiguous` when the shapes match. When they differ, i.e. the real truncation case, `np.equal` itself raises `ValueError: operands could not be broadcast together with shapes (1,77) (1,90)`.

`text_input_ids` is always padded to `tokenizer.model_max_length` (77) and `untruncated_ids` uses `padding="longest"`, so this branch is entered whenever a prompt tokenizes to ≥ 77 tokens. Such prompts crash at prompt-encoding time, before generation, including precisely the truncation scenario the warning is meant to surface. (Short prompts short-circuit on the length check, which is why this isn't always hit.)

This is a port of the upstream `diffusers` check, which uses `torch.equal(...)`, a function returning a scalar Python bool, so `not torch.equal(...)` is valid. The numpy port should use the scalar-returning `np.array_equal`.

Fix
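A minimal sketch of the guard condition after the change, assuming it has the form "length check, then array comparison" described in the summary. The helper name `prompt_was_truncated` and the zero-filled arrays are hypothetical stand-ins for illustration; they are not code from `pipeline.py`.

```python
import numpy as np


def prompt_was_truncated(text_input_ids: np.ndarray,
                         untruncated_ids: np.ndarray) -> bool:
    """Hypothetical helper mirroring the guard condition with np.array_equal."""
    # Length check first: short prompts short-circuit here.
    long_enough = untruncated_ids.shape[-1] >= text_input_ids.shape[-1]
    # np.array_equal returns a plain bool and is False when shapes differ,
    # so `not` is always applied to a scalar.
    return bool(long_enough
                and not np.array_equal(text_input_ids, untruncated_ids))


# Stand-in token ids; real ids would come from the tokenizer.
padded = np.zeros((1, 77), dtype=np.int64)

print(prompt_was_truncated(padded, np.zeros((1, 90), dtype=np.int64)))  # True
print(prompt_was_truncated(padded, padded.copy()))                      # False
print(prompt_was_truncated(padded, np.zeros((1, 5), dtype=np.int64)))   # False
```

Only the over-length input reports truncation; the equal-length and short inputs evaluate to `False` without raising.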
`np.array_equal` returns a scalar bool and returns `False` for differing shapes (matching `torch.equal` semantics), so the warning fires correctly instead of raising.

Validation

Reproduced with faithful tokenizer shapes (`text_input_ids` = `[1,77]`), comparing `np.equal` (current) with `np.array_equal` (fixed) for `untruncated_ids` of shape `[1,77]`, `[1,90]`, and `[1,5]`.

The existing `tests/` are end-to-end integration tests (full Core ML conversion + model downloads + Swift CLI), so this guard cannot be exercised as an isolated unit test; the comparison above is the standalone repro.
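A standalone repro along these lines can be sketched without the tokenizer or Core ML. The helpers `guard_outcome` and `fake_ids` are hypothetical, and the ids are synthetic (`np.arange`), so only the shapes match the real tokenizer output.

```python
import numpy as np


def guard_outcome(equal_fn, text_input_ids, untruncated_ids):
    """Evaluate the guard condition with the given comparison function.

    Returns the boolean value of the condition, or the string "ValueError"
    if evaluating it raises.
    """
    try:
        return bool(
            untruncated_ids.shape[-1] >= text_input_ids.shape[-1]
            and not equal_fn(text_input_ids, untruncated_ids)
        )
    except ValueError:
        return "ValueError"


def fake_ids(length):
    """Synthetic token ids of shape (1, length); an assumption for the demo."""
    return np.arange(length, dtype=np.int64).reshape(1, length)


text_input_ids = fake_ids(77)
results = {}
for length in (77, 90, 5):
    untruncated_ids = fake_ids(length)
    results[length] = (
        guard_outcome(np.equal, text_input_ids, untruncated_ids),
        guard_outcome(np.array_equal, text_input_ids, untruncated_ids),
    )
    print(f"[1,{length}]: np.equal -> {results[length][0]}, "
          f"np.array_equal -> {results[length][1]}")
```

With these synthetic inputs, `np.equal` raises for the `[1,77]` and `[1,90]` cases and short-circuits for `[1,5]`, while `np.array_equal` never raises and evaluates to `True` only for `[1,90]`.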