
Incorrect NOT STARTS WITH projection for truncated partitions #3493

@kevinjqliu

Description


TruncateTransform.project appears to incorrectly project NOT STARTS WITH predicates for truncated string/binary partition fields.

For truncate[2], PyIceberg currently projects:

NOT STARTS WITH "aaa" -> NOT STARTS WITH "aa"

That is unsafe: a truncated partition value of "aa" does not prove that every row in the file starts with "aaa", so files containing rows that satisfy the original predicate can be pruned.
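
A minimal illustration of the problem, using plain Python strings rather than PyIceberg's API. The `truncate` helper is hypothetical and only models truncate[2] on strings:

```python
def truncate(value: str, width: int) -> str:
    """Hypothetical stand-in for the truncate[width] transform on strings."""
    return value[:width]


row = "aab"                    # a row stored in some data file
partition = truncate(row, 2)   # partition value recorded for that file

# The row satisfies the original predicate NOT STARTS WITH "aaa".
row_matches = not row.startswith("aaa")

# The currently projected predicate NOT STARTS WITH "aa" rejects the partition.
partition_kept = not partition.startswith("aa")

print(row_matches, partition_kept)
```

Here `row_matches` is true while `partition_kept` is false, so a file holding a matching row would be skipped.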

The expected behavior should match apache/iceberg-go#1193 and the Java truncate projection:

  • prefix length < truncate width: keep NOT STARTS WITH with the original literal
  • prefix length == truncate width: project to !=
  • prefix length > truncate width: no inclusive projection
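
The three cases above could be sketched as follows. This is not the PyIceberg implementation: `project_not_starts_with` is a hypothetical name, the result is a plain `(operator, literal)` tuple instead of a PyIceberg expression, and prefix length is assumed to be measured with `len()`:

```python
from typing import Optional, Tuple


def project_not_starts_with(prefix: str, width: int) -> Optional[Tuple[str, str]]:
    """Hypothetical sketch of the inclusive projection rules for truncate[width].

    Returns (operator, literal) for the partition predicate, or None when
    there is no inclusive projection.
    """
    if len(prefix) < width:
        # The truncated value still contains the whole prefix.
        return ("NOT STARTS WITH", prefix)
    if len(prefix) == width:
        # The truncated value is exactly as long as the prefix.
        return ("!=", prefix)
    # The prefix is longer than the truncated value: nothing can be projected.
    return None


print(project_not_starts_with("a", 2))
print(project_not_starts_with("aa", 2))
print(project_not_starts_with("aaa", 2))
```

With truncate[2], the example from this issue (`"aaa"`) falls into the last case and yields no projection.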

Relevant code: pyiceberg/transforms.py _truncate_array, plus the existing test_projection_truncate_string_not_starts_with expectation.
