Fix ValueError in _encode_prompt truncation check for prompts at the CLIP token limit by Osamaali313 · Pull Request #404 · apple/ml-stable-diffusion

Osamaali313 · 2026-06-21T13:25:28Z

Summary

StableDiffusionPipeline._encode_prompt crashes on any prompt that reaches the CLIP token limit.

The truncation-warning guard (python_coreml_stable_diffusion/pipeline.py) is:

if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.equal(
        text_input_ids, untruncated_ids
):

np.equal returns an element-wise boolean array rather than a scalar, so the guard raises in both cases that reach it:

ValueError: The truth value of an array with more than one element is ambiguous, raised by not <array> when the shapes match, and
ValueError: operands could not be broadcast together with shapes (1,77) (1,90), raised by np.equal itself when the shapes differ, i.e. the real truncation case.
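Both failure modes can be seen without the pipeline. The snippet below is a minimal sketch that uses zero-filled stand-in arrays instead of real tokenizer output; `error_from` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

# Stand-in token ids; real values would come from the CLIP tokenizer.
text_input_ids = np.zeros((1, 77), dtype=np.int64)
same_shape_ids = np.zeros((1, 77), dtype=np.int64)
longer_ids = np.zeros((1, 90), dtype=np.int64)


def error_from(fn):
    """Hypothetical helper: return the ValueError message raised by fn(), or None."""
    try:
        fn()
    except ValueError as exc:
        return str(exc)
    return None


# Shapes match: np.equal succeeds, then `not` on the (1, 77) result raises.
ambiguous_msg = error_from(lambda: not np.equal(text_input_ids, same_shape_ids))

# Shapes differ: np.equal itself raises while trying to broadcast.
broadcast_msg = error_from(lambda: np.equal(text_input_ids, longer_ids))

print(ambiguous_msg)
print(broadcast_msg)
```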

text_input_ids is always padded to tokenizer.model_max_length (77) and untruncated_ids uses padding="longest", so this branch is entered whenever a prompt tokenizes to ≥ 77 tokens. Such prompts crash at prompt-encoding time, before generation — including precisely the truncation scenario the warning is meant to surface. (Short prompts short-circuit on the length check, which is why this isn't always hit.)

This is a port of the upstream diffusers check, which uses torch.equal(...) — a function returning a scalar Python bool, so not torch.equal(...) is valid. The numpy port should use the scalar-returning np.array_equal.

Fix

-if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.equal(
+if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not np.array_equal(
         text_input_ids, untruncated_ids
 ):

np.array_equal returns a scalar bool and returns False for differing shapes (matching torch.equal semantics), so the warning fires correctly instead of raising.
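As a quick illustration of those semantics, the sketch below calls np.array_equal on stand-in arrays with the shapes discussed above. The array contents are made up for the example and are not real token ids.

```python
import numpy as np

# Stand-in arrays with the shapes from the description; values are arbitrary.
text_input_ids = np.arange(77).reshape(1, 77)
identical_ids = text_input_ids.copy()
longer_ids = np.arange(90).reshape(1, 90)

# Same shape and same values: a single boolean, safe to negate with `not`.
same = np.array_equal(text_input_ids, identical_ids)

# Different shapes: returns False instead of raising a broadcast error.
different = np.array_equal(text_input_ids, longer_ids)

print(same, different)
```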

Validation

Reproduced with faithful tokenizer shapes (text_input_ids=[1,77]):

| case (`untruncated_ids` shape) | `np.equal` (current) | `np.array_equal` (fixed) |
| --- | --- | --- |
| no truncation (`[1,77]`) | ValueError: ambiguous truth value | no warning |
| truncation (`[1,90]`) | ValueError: broadcast | warns (correct) |
| short prompt (`[1,5]`) | no warning | no warning |

The existing tests/ are end-to-end integration tests (full Core ML conversion + model downloads + Swift CLI), so this guard cannot be exercised as an isolated unit test; the table above is the standalone repro.
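One way such a standalone repro might look is sketched below. It is not the script used for this PR: `guard_outcome` and `MAX_LEN` are hypothetical names, the arrays are filled with ones rather than real token ids, and only the guard expression itself is taken from the description above.

```python
import numpy as np

MAX_LEN = 77  # tokenizer.model_max_length for CLIP, as stated in the summary


def guard_outcome(compare, untruncated_len):
    """Hypothetical helper: evaluate the guard with the given comparison function."""
    text_input_ids = np.ones((1, MAX_LEN), dtype=np.int64)
    untruncated_ids = np.ones((1, untruncated_len), dtype=np.int64)
    try:
        fires = (
            untruncated_ids.shape[-1] >= text_input_ids.shape[-1]
            and not compare(text_input_ids, untruncated_ids)
        )
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "warns" if fires else "no warning"


# Evaluate every (comparison function, untruncated length) pair from the table.
results = {
    (name, length): guard_outcome(fn, length)
    for name, fn in (("np.equal", np.equal), ("np.array_equal", np.array_equal))
    for length in (77, 90, 5)
}

for key, outcome in results.items():
    print(key, "->", outcome)
```

Under these assumptions the short-prompt case never calls the comparison function, because the length check short-circuits first.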

The CLIP truncation warning used `not np.equal(text_input_ids, untruncated_ids)`. `np.equal` returns an element-wise boolean array, so applying `not` raises `ValueError: The truth value of an array with more than one element is ambiguous` when the shapes match, and a broadcasting `ValueError` when they differ (the actual truncation case). Since text_input_ids is always padded to model_max_length (77), this branch is reached for any prompt that tokenizes to >= 77 tokens, crashing prompt encoding before generation -- including exactly the truncation case the warning is meant to report. Use `np.array_equal`, which returns a scalar bool and safely returns False for differing shapes (matching the `torch.equal` semantics this was ported from), so the warning fires correctly instead of raising.

Osamaali313 wants to merge 1 commit into apple:main from Osamaali313:fix/encode-prompt-array-equal
