polygonize: jit the float-chunk region ranges scan (#3303)#3314
Merged
Conversation
_compute_region_value_ranges looped over every pixel of a float chunk in pure Python, which took ~95% of _polygonize_chunk time on the dask path, and re-ran _calculate_regions on a block _polygonize_numpy had already labelled. Move the per-pixel work into the jitted _region_ranges_scan, reuse the region labels via the new _polygonize_numpy_regions, and keep only the per-region tuple/dict assembly in Python. A 1024x1024 float64 chunk with ~180 regions drops from 0.299s to 0.015s. This also fixes the recorded w_match/s_match boundary-cell flags: the old loop called _is_close (a numba overload generator) from pure Python, which returns the implementation function instead of a bool, so the flags were truthy whenever the position/mask conditions held. Inside the jitted kernel _is_close resolves properly. Output is unchanged (the buggy flags were provably output-neutral); existing 527 polygonize tests pass and the new test file pins ranges/flags parity and dask-vs-numpy float output.
brendancol
commented
Jun 12, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: polygonize: jit the float-chunk region ranges scan (#3303)
Blockers (must fix before merge)
None found.
Suggestions (should fix, not blocking)
-
benchmarks/benchmarks/polygonize.pyhas no benchmark for the float dask chunk path this PR optimizes. The existingPolygonize/PolygonizeCCLclasses use int32 numpy/cupy input, so a regression in_region_ranges_scanor_compute_region_value_rangeswould not show up in asv. Add a float64 dask case.
Nits (optional improvements)
-
_compute_region_value_ranges(xrspatial/polygonize.py:1485) trusts a caller-suppliedregionsto match the block's padded flat shape. Alen(regions) == nx_eff * nycheck would catch a mismatched caller early. There is only one internal caller today and the parity tests cover it, so this is cheap insurance rather than a real risk.
What looks good
- The jitted kernel mirrors the original loop's skip conditions exactly: region 0, masked pixels,
idx >= n_regions, and the nx==1 padded column. I checked themax_cells = 2*(nx+ny)bound; each strip pixel records at most once, so the arrays cannot overflow (2nx+2ny-4 distinct strip pixels for nx,ny >= 2, fewer with masking). - Region-label reuse goes through a new
_polygonize_numpy_regionsand keeps_polygonize_numpyas a wrapper with a parity test, plus aregions=Nonefallback that the tests compare against the precomputed path. - The
w_match/s_matchflag fix is real:_is_closeis an overload generator and returns its impl function when called from pure Python, so the old flags were truthy whenever the position/mask conditions held. In jit it resolves correctly.test_w_s_flags_record_real_closenesspins both polarities and fails on the old code. - The dask-vs-numpy float test compares geometric equality and documents why raw vertex arrays differ (merged boundary rings can start at a different vertex; that divergence predates this PR).
Checklist
- Algorithm matches reference (kernel is a line-for-line port of the Python loop, verified by a
_values_closereference implementation in the tests) - All implemented backends produce consistent results (cupy, dask+numpy, dask+cupy smoke-checked against numpy)
- NaN handling is correct (NaN-mask prep unchanged; float-nan test case included)
- Edge cases are covered by tests (nx==1, masked, float32, conn 4/8, raster-edge vs internal-edge offsets)
- Dask chunk boundaries handled correctly (boundary-cell keys/values byte-identical to old code; flags now correct)
- No premature materialization or unnecessary copies
- Benchmark exists for the optimized path (see suggestion)
- README feature matrix: not applicable, no public API change
- Docstrings present and accurate (internal functions documented, regions param described)
brendancol
commented
Jun 12, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
Follow-up review after 32cc304
Both findings from the first pass are addressed:
PolygonizeDaskFloatbenchmark added (benchmarks/benchmarks/polygonize.py). It builds a float64 dask raster in 4x4 chunks and times the publicpolygonize, which exercises_region_ranges_scanand the cross-chunk merge. Smoke-ran locally._compute_region_value_rangesnow rejects a caller-suppliedregionswhose length does not match the block's padded flat size (xrspatial/polygonize.py:1551).
Note: flake8 reports F401 for _simplify_polygons in benchmarks/benchmarks/polygonize.py; that import was already unused on main before this PR and is left alone here.
Full polygonize battery still green (527 passed, 16 skipped). No further findings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3303.
_compute_region_value_rangeslooped over every pixel of a float chunk in pure Python and re-ran_calculate_regionson a block_polygonize_numpyhad already labelled. On a 1024x1024 float64 chunk with ~180 regions that loop was 0.283 s of the 0.299 s_polygonize_chunktotal, making float dask chunks about 30x slower than integer ones._region_ranges_scan. Only the per-region tuple/dict assembly stays in Python. The same chunk now takes 0.015 s._polygonize_numpy_regionsso the chunk path reuses the region labels instead of recomputing them._polygonize_numpyis a thin wrapper with unchanged output._compute_region_value_rangeskeeps aregions=Nonefallback that recomputes, and the tests check both paths agree._is_close(a numbaoverloadgenerator) from pure Python, which returns the implementation function instead of a bool, so the recordedw_match/s_matchflags were truthy whenever the position/mask conditions held. That was output-neutral (when the W/S neighbour is absent from the global boundary-cell field, the gated diagonal is always a same-chunk pair the in-chunk CCL already resolved), but the flags were wrong. Inside the jitted kernel_is_closeresolves correctly, and a new test pins the real values.Backends: behaviour unchanged on numpy and cupy (they never hit this code path); dask+numpy and dask+cupy float rasters get the speedup. Verified on GPU: cupy int/float and dask+cupy runs produce the same polygons as numpy.
Test plan:
test_polygonize_issue_3303.py: precomputed-regions vs recompute parity, ranges/flags vs a naive_values_closereference, flag polarity (fails on the old code), dask-vs-numpy float output across chunkings and connectivities,_polygonize_numpywrapper parity.No wall-clock assertions added (see #2889).