polygonize: jit the float-chunk region ranges scan (#3303) by brendancol · Pull Request #3314 · xarray-contrib/xarray-spatial

brendancol · 2026-06-12T19:04:00Z

Closes #3303.

_compute_region_value_ranges looped over every pixel of a float chunk in pure Python and re-ran _calculate_regions on a block _polygonize_numpy had already labelled. On a 1024x1024 float64 chunk with ~180 regions that loop was 0.283 s of the 0.299 s _polygonize_chunk total, making float dask chunks about 30x slower than integer ones.

Move the per-pixel min/max aggregation and internal-edge cell collection into the jitted _region_ranges_scan. Only the per-region tuple/dict assembly stays in Python. The same chunk now takes 0.015 s.
Add _polygonize_numpy_regions so the chunk path reuses the region labels instead of recomputing them. _polygonize_numpy is a thin wrapper with unchanged output. _compute_region_value_ranges keeps a regions=None fallback that recomputes, and the tests check both paths agree.
Side effect worth knowing about: the old loop called _is_close (a numba overload generator) from pure Python, which returns the implementation function instead of a bool, so the recorded w_match/s_match flags were truthy whenever the position/mask conditions held. That was output-neutral (when the W/S neighbour is absent from the global boundary-cell field, the gated diagonal is always a same-chunk pair the in-chunk CCL already resolved), but the flags were wrong. Inside the jitted kernel _is_close resolves correctly, and a new test pins the real values.

Backends: behaviour unchanged on numpy and cupy (they never hit this code path); dask+numpy and dask+cupy float rasters get the speedup. Verified on GPU: cupy int/float and dask+cupy runs produce the same polygons as numpy.

Test plan:

New test_polygonize_issue_3303.py: precomputed-regions vs recompute parity, ranges/flags vs a naive _values_close reference, flag polarity (fails on the old code), dask-vs-numpy float output across chunkings and connectivities, _polygonize_numpy wrapper parity.
Full polygonize battery: 527 passed, 16 skipped.
flake8 clean on changed files.
cupy / dask+cupy / dask+numpy smoke runs match numpy output.

No wall-clock assertions added (see #2889).

_compute_region_value_ranges looped over every pixel of a float chunk in pure Python, which took ~95% of _polygonize_chunk time on the dask path, and re-ran _calculate_regions on a block _polygonize_numpy had already labelled. Move the per-pixel work into the jitted _region_ranges_scan, reuse the region labels via the new _polygonize_numpy_regions, and keep only the per-region tuple/dict assembly in Python. A 1024x1024 float64 chunk with ~180 regions drops from 0.299s to 0.015s. This also fixes the recorded w_match/s_match boundary-cell flags: the old loop called _is_close (a numba overload generator) from pure Python, which returns the implementation function instead of a bool, so the flags were truthy whenever the position/mask conditions held. Inside the jitted kernel _is_close resolves properly. Output is unchanged (the buggy flags were provably output-neutral); existing 527 polygonize tests pass and the new test file pins ranges/flags parity and dask-vs-numpy float output.

brendancol

PR Review: polygonize: jit the float-chunk region ranges scan (#3303)

Blockers (must fix before merge)

None found.

Suggestions (should fix, not blocking)

benchmarks/benchmarks/polygonize.py has no benchmark for the float dask chunk path this PR optimizes. The existing Polygonize/PolygonizeCCL classes use int32 numpy/cupy input, so a regression in _region_ranges_scan or _compute_region_value_ranges would not show up in asv. Add a float64 dask case.

Nits (optional improvements)

_compute_region_value_ranges (xrspatial/polygonize.py:1485) trusts a caller-supplied regions to match the block's padded flat shape. A len(regions) == nx_eff * ny check would catch a mismatched caller early. There is only one internal caller today and the parity tests cover it, so this is cheap insurance rather than a real risk.

What looks good

The jitted kernel mirrors the original loop's skip conditions exactly: region 0, masked pixels, idx >= n_regions, and the nx==1 padded column. I checked the max_cells = 2*(nx+ny) bound; each strip pixel records at most once, so the arrays cannot overflow (2nx+2ny-4 distinct strip pixels for nx,ny >= 2, fewer with masking).
Region-label reuse goes through a new _polygonize_numpy_regions and keeps _polygonize_numpy as a wrapper with a parity test, plus a regions=None fallback that the tests compare against the precomputed path.
The w_match/s_match flag fix is real: _is_close is an overload generator and returns its impl function when called from pure Python, so the old flags were truthy whenever the position/mask conditions held. In jit it resolves correctly. test_w_s_flags_record_real_closeness pins both polarities and fails on the old code.
The dask-vs-numpy float test compares geometric equality and documents why raw vertex arrays differ (merged boundary rings can start at a different vertex; that divergence predates this PR).

Checklist

Algorithm matches reference (kernel is a line-for-line port of the Python loop, verified by a _values_close reference implementation in the tests)
All implemented backends produce consistent results (cupy, dask+numpy, dask+cupy smoke-checked against numpy)
NaN handling is correct (NaN-mask prep unchanged; float-nan test case included)
Edge cases are covered by tests (nx==1, masked, float32, conn 4/8, raster-edge vs internal-edge offsets)
Dask chunk boundaries handled correctly (boundary-cell keys/values byte-identical to old code; flags now correct)
No premature materialization or unnecessary copies
Benchmark exists for the optimized path (see suggestion)
README feature matrix: not applicable, no public API change
Docstrings present and accurate (internal functions documented, regions param described)

)

brendancol

Follow-up review after `32cc304`

Both findings from the first pass are addressed:

PolygonizeDaskFloat benchmark added (benchmarks/benchmarks/polygonize.py). It builds a float64 dask raster in 4x4 chunks and times the public polygonize, which exercises _region_ranges_scan and the cross-chunk merge. Smoke-ran locally.
_compute_region_value_ranges now rejects a caller-supplied regions whose length does not match the block's padded flat size (xrspatial/polygonize.py:1551).

Note: flake8 reports F401 for _simplify_polygons in benchmarks/benchmarks/polygonize.py; that import was already unused on main before this PR and is left alone here.

Full polygonize battery still green (527 passed, 16 skipped). No further findings.

brendancol added 2 commits June 12, 2026 12:02

Update sweep-performance state for polygonize (2026-06-12 audit)

68195d3

github-actions Bot added the performance PR touches performance-sensitive code label Jun 12, 2026

brendancol commented Jun 12, 2026

View reviewed changes

Address review: add float dask benchmark and regions length guard (#3303

32cc304

)

brendancol commented Jun 12, 2026

View reviewed changes

brendancol merged commit f14947a into main Jun 12, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

polygonize: jit the float-chunk region ranges scan (#3303)#3314

polygonize: jit the float-chunk region ranges scan (#3303)#3314
brendancol merged 3 commits into
mainfrom
deep-sweep-performance-polygonize-2026-06-12

brendancol commented Jun 12, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 12, 2026

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: polygonize: jit the float-chunk region ranges scan (#3303)

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review after 32cc304

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Follow-up review after `32cc304`