Sort_by providing zero-copy views (and more)#666
Merged
sort_by and the zero-permutation sorted_slice window now work for dictionary[str] and fixed blosc2.string columns:

- dictionary[str]: index by alphabetical rank (int32), reusing the numeric window path. _DictRankWrapper exposes only live rows so the index matches n_rows (otherwise padding caused the window read to be rejected).
- fixed string: build the FULL index by computing segment min/max with a manual loop (numpy lacks the <U/<S ufunc loop), accept S/U in _supported_index_dtype, and add a numpy OOC merge fallback.
- staleness: the rank index goes stale when the dictionary changes; detect this via a stable SHA-1 hash of the entries (hash() is seed-salted and would spuriously mismatch across processes), and fall back to lexsort until rebuild_index.

Add tests/ctable/test_sort_by_strings.py (dict/string sort + window, staleness, cross-process hash stability).
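The rank and staleness ideas above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this PR: `alphabetical_ranks` and `dictionary_fingerprint` are hypothetical names, the dictionary is modelled as a plain list of strings, and rows are modelled as integer codes into that list.

```python
import hashlib

import numpy as np


def dictionary_fingerprint(entries):
    """Stable SHA-1 digest of the dictionary entries (hypothetical helper).

    Unlike the built-in hash(), which is salted per process for str,
    this digest is identical across interpreter runs.
    """
    h = hashlib.sha1()
    for entry in entries:
        data = entry.encode("utf-8")
        # Length-prefix each entry so ["ab", "c"] and ["a", "bc"] differ.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


def alphabetical_ranks(entries, codes):
    """Map each row's dictionary code to the alphabetical rank of its entry."""
    order = np.argsort(np.asarray(entries, dtype=str), kind="stable")
    rank_of_code = np.empty(len(entries), dtype=np.int32)
    rank_of_code[order] = np.arange(len(entries), dtype=np.int32)
    return rank_of_code[np.asarray(codes)]


entries = ["pear", "apple", "fig"]
codes = [0, 1, 2, 1]  # rows: pear, apple, fig, apple

ranks = alphabetical_ranks(entries, codes)   # int32 keys, sortable like numbers
fingerprint = dictionary_fingerprint(entries)  # stored next to the index

# Later: a changed dictionary yields a different digest, so the rank
# index would be treated as stale until it is rebuilt.
is_stale = dictionary_fingerprint(entries + ["kiwi"]) != fingerprint
```

Because the ranks are ordinary int32 values, a numeric index path can sort them without touching the strings; the digest comparison is what would decide whether those ranks can still be trusted.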
Press 'S' on a CTable data grid to sort by a FULL-indexed column via a dropdown (R toggles reverse). The result is a zero-copy sort_by(view=True) that streams from the index, so the table is never materialised; navigate it normally, Esc restores original order. A SORTED chip shows in the status bar (R reverses an active sort in place).

Model: set_sort/clear_sort/get_sort + full_index_columns, and a single _ordered_object() read-precedence helper (window > filter > sort > base) replacing five duplicated inline blocks. Sort and filter are mutually exclusive; a row window composes over a sort.
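The read-precedence rule (window > filter > sort > base) could look roughly like the sketch below. It is an assumption-laden toy, not the PR's model class: `GridModel`, its attribute names, `set_filter` and `set_window` are hypothetical, and plain Python lists stand in for the zero-copy views.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GridModel:
    """Toy model (hypothetical) holding a base table and optional views."""

    base: Any
    sort_view: Optional[Any] = None
    filter_view: Optional[Any] = None
    window_view: Optional[Any] = None

    def set_sort(self, view):
        # Sort and filter are mutually exclusive: setting one clears the other.
        self.filter_view = None
        self.window_view = None
        self.sort_view = view

    def set_filter(self, view):
        self.sort_view = None
        self.window_view = None
        self.filter_view = view

    def clear_sort(self):
        self.sort_view = None
        self.window_view = None

    def set_window(self, start, stop):
        # A window composes over whatever is active below it (sort, filter, base).
        self.window_view = None
        self.window_view = self.ordered_object()[start:stop]

    def ordered_object(self):
        # Single place that encodes the precedence: window > filter > sort > base.
        for candidate in (self.window_view, self.filter_view, self.sort_view):
            if candidate is not None:
                return candidate
        return self.base


model = GridModel(base=[3, 1, 2])
model.set_sort(sorted(model.base))
sorted_rows = model.ordered_object()
model.set_window(0, 2)
windowed_rows = model.ordered_object()
model.clear_sort()
restored_rows = model.ordered_object()
```

Keeping the precedence in one method means every reader of the grid resolves the active object the same way, which is the point of replacing the duplicated inline blocks.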
Index reads (where() pruning, summary/min-max lookups) cache file-backed sidecar handles in process-global dicts for query reuse. These were only dropped once the underlying files were deleted, so closing a table that stays on disk kept its descriptors open: one file descriptor leaked per table, exhausting the FD limit over long sessions (and large test runs).

Add evict_cached_index_handles(root), which pops (and thereby releases) every _SIDECAR_HANDLE_CACHE / _DATA_CACHE / _HOT_CACHE / query / gather entry whose scope path is at or under a table's resolved root. Call it from FileTableStorage and TreeStoreTableStorage close()/discard(); the caches simply repopulate on the next query.

Fixes FD exhaustion when opening/closing many indexed tables; the full test suite now passes at the default macOS ulimit -n 256.
…out of every modal
In this PR: