Skip to content

Binary StartsWith visitors compare bytes as strings #3497

@kevinjqliu

Description

@kevinjqliu

Several StartsWith / NotStartsWith visitors appear to compare binary values by converting them with str(...), which gives incorrect byte-prefix behavior.

Examples:

StartsWith("x", b"a") on b"aa" -> False  # expected True
NotStartsWith("x", b"a") on b"aa" -> True # expected False

This also affects planning paths:

inclusive metrics for a file containing only b"aa":
StartsWith("x", b"a") -> False  # unsafe skip

Manifest evaluation for binary StartsWith can also raise TypeError when comparing decoded bytes bounds to a str prefix.

Relevant code: pyiceberg/expressions/visitors.py starts-with handling in expression, manifest, metrics, and residual visitors.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions