merge_batcher: weigh the geometric ladder by updates; split Merger::account#767
Merged
frankmcsherry merged 1 commit intoJun 21, 2026
Conversation
…ccount Two related changes to the `MergeBatcher`, lifted out of the chunk_basis work so they can land independently of the `Chunk` module that motivates them. Weigh-by-updates. The geometric chain ladder previously compared chains by their *chunk count* (`chain.len()`). That is only a proxy for update count while every chunk is the same size; once a backend regrades — re-melding chunks so size and count decouple — a trickle of single-update chunks re-merges the head chain on every insert. Weigh chains by summed updates instead. A chain is immutable until merged, so its weight is computed once at push and cached alongside it (`chains: Vec<(usize, Vec<Chunk>)>`). Neutral for the existing `vec` backend (uniform chunk sizes make count and update-weight proportional); the behaviour change only bites a regrading backend. Refocus the Merger trait. The bundled `account() -> (records, size, capacity, allocations)` splits into `len() -> usize` (update count — drives the ladder and the logger's `records` field) and a defaulted `allocation() -> (size, capacity, allocations)` for memory telemetry. The logger tuple is reassembled verbatim via a private `record` helper, so `BatcherEvent`'s shape and the emitted figures are unchanged. NOTE: breaking change for out-of-tree `Merger` implementors (e.g. Materialize) — rename `account` -> `len`, optionally override `allocation`. The only in-tree impl (`vec::VecMerger`) is migrated here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two related changes to the
MergeBatcher, lifted out of the chunk_basis work so they can land independently of theChunkmodule that motivates them.Weigh-by-updates. The geometric chain ladder previously compared chains by their chunk count (
chain.len()). That is only a proxy for update count while every chunk is the same size; once a backend regrades — re-melding chunks so size and count decouple — a trickle of single-update chunks re-merges the head chain on every insert. Weigh chains by summed updates instead. A chain is immutable until merged, so its weight is computed once at push and cached alongside it (chains: Vec<(usize, Vec<Chunk>)>). Neutral for the existingvecbackend (uniform chunk sizes make count and update-weight proportional); the behaviour change only bites a regrading backend.Refocus the Merger trait. The bundled
account() -> (records, size, capacity, allocations)splits intolen() -> usize(update count — drives the ladder and the logger'srecordsfield) and a defaultedallocation() -> (size, capacity, allocations)for memory telemetry. The logger tuple is reassembled verbatim via a privaterecordhelper, soBatcherEvent's shape and the emitted figures are unchanged.NOTE: breaking change for out-of-tree
Mergerimplementors (e.g. Materialize) — renameaccount->len, optionally overrideallocation. The only in-tree impl (vec::VecMerger) is migrated here.