TruncateTransform.project appears to incorrectly project NOT STARTS WITH predicates for truncated string/binary partition fields.
For truncate[2], PyIceberg currently projects:
NOT STARTS WITH "aaa" -> NOT STARTS WITH "aa"
That is unsafe: the truncated partition value does not contain enough information to prove all rows fail the original predicate, so files with matching rows can be pruned.
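A minimal counterexample, as a sketch in plain Python rather than PyIceberg code. The `truncate` helper, the row values, and the variable names are hypothetical and only model truncate[2] on strings:

```python
# Hypothetical stand-in for truncate[W] on strings: keep the first W characters.
def truncate(value: str, width: int) -> str:
    return value[:width]

WIDTH = 2
PREFIX = "aaa"

# Hypothetical data file whose rows all share the partition value "aa".
rows = ["aab", "aac"]
partition_value = truncate(rows[0], WIDTH)

# Original predicate evaluated on the rows: both rows satisfy NOT STARTS WITH "aaa".
row_matches = [not r.startswith(PREFIX) for r in rows]

# Currently projected predicate evaluated on the partition value:
# NOT STARTS WITH "aa" is false for "aa", so the file would be pruned.
file_kept = not partition_value.startswith(truncate(PREFIX, WIDTH))

print(row_matches, file_kept)
```

The file holds rows that match the original predicate, yet the projected predicate rejects its partition value.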
Expected behavior should match apache/iceberg-go#1193 / Java truncate projection behavior:
- prefix length < truncate width: keep NOT STARTS WITH with the original literal
- prefix length == truncate width: project to !=
- prefix length > truncate width: no inclusive projection
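The three cases above can be sketched as a small standalone function. This is an illustration only, not the PyIceberg implementation: `project_not_starts_with` and the tuple return shape are hypothetical, and length is taken as `len()` of a Python string.

```python
from typing import Optional, Tuple

def project_not_starts_with(prefix: str, width: int) -> Optional[Tuple[str, str]]:
    """Return (operation, literal) for the projected predicate, or None.

    Hypothetical helper: the operation names are placeholders, not PyIceberg classes.
    """
    if len(prefix) < width:
        # The truncated value still contains the whole prefix,
        # so the original predicate can be kept as is.
        return ("not_starts_with", prefix)
    if len(prefix) == width:
        # A value does not start with the prefix exactly when
        # its truncation differs from the prefix.
        return ("not_eq", prefix)
    # The prefix is longer than the truncated value: nothing can be
    # concluded from the partition value, so no projection is produced.
    return None
```

With truncate[2], `"a"` keeps NOT STARTS WITH `"a"`, `"aa"` becomes `!= "aa"`, and `"aaa"` yields no projection.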
Relevant code: pyiceberg/transforms.py _truncate_array, plus the existing test_projection_truncate_string_not_starts_with expectation.