merge_batcher: weigh the geometric ladder by updates; split Merger::account #767

Merged
frankmcsherry merged 1 commit into TimelyDataflow:master-next from frankmcsherry:batcher_weigh_by_updates
Jun 21, 2026
@frankmcsherry

Two related changes to the `MergeBatcher`, lifted out of the chunk_basis work so they can land independently of the `Chunk` module that motivates them.

Weigh-by-updates. The geometric chain ladder previously compared chains by their *chunk count* (`chain.len()`). That is only a proxy for update count while every chunk is the same size; once a backend regrades (re-melding chunks so that size and count decouple), a trickle of single-update chunks re-merges the head chain on every insert. Weigh chains by summed updates instead. A chain is immutable until merged, so its weight is computed once at push and cached alongside it (`chains: Vec<(usize, Vec<Chunk>)>`). This is neutral for the existing `vec` backend (uniform chunk sizes make count and update-weight proportional); the behaviour change only bites a regrading backend.

Refocus the `Merger` trait. The bundled `account() -> (records, size, capacity, allocations)` splits into `len() -> usize` (the update count, which drives the ladder and the logger's `records` field) and a defaulted `allocation() -> (size, capacity, allocations)` for memory telemetry. The logger tuple is reassembled verbatim via a private `record` helper, so `BatcherEvent`'s shape and the emitted figures are unchanged.
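The shape of the split can be sketched roughly as follows. This is an illustration under assumptions, not the trait as it appears in the crate: `MergerSketch`, the `&Self::Chunk` parameters, the all-zero default, and the two example implementors are hypothetical; only the method names `len`, `allocation`, and `record` and the tuple layouts come from the description above.

```rust
/// Hypothetical stand-in for the refocused trait; real signatures may differ.
trait MergerSketch {
    type Chunk;

    /// Number of updates in the chunk.
    fn len(chunk: &Self::Chunk) -> usize;

    /// (size, capacity, allocations) for memory telemetry.
    /// Assumed default: report nothing.
    fn allocation(_chunk: &Self::Chunk) -> (usize, usize, usize) {
        (0, 0, 0)
    }
}

/// Reassembles the four-field tuple (records, size, capacity, allocations)
/// from the two narrower methods.
fn record<M: MergerSketch>(chunk: &M::Chunk) -> (usize, usize, usize, usize) {
    let (size, capacity, allocations) = M::allocation(chunk);
    (M::len(chunk), size, capacity, allocations)
}

/// An implementor that only supplies `len` and inherits the default.
struct CountOnly;

impl MergerSketch for CountOnly {
    type Chunk = Vec<(u64, i64)>;

    fn len(chunk: &Self::Chunk) -> usize {
        chunk.len()
    }
}

/// An implementor that also overrides `allocation`.
struct WithTelemetry;

impl MergerSketch for WithTelemetry {
    type Chunk = Vec<(u64, i64)>;

    fn len(chunk: &Self::Chunk) -> usize {
        chunk.len()
    }

    fn allocation(chunk: &Self::Chunk) -> (usize, usize, usize) {
        let bytes = std::mem::size_of::<(u64, i64)>();
        // One allocation whenever the vector owns a buffer.
        let allocations = usize::from(chunk.capacity() > 0);
        (chunk.len() * bytes, chunk.capacity() * bytes, allocations)
    }
}
```

Under these assumptions, an implementor that only cares about the ladder writes `len` and nothing else, while the reassembled tuple keeps the same four fields in the same order as the old `account()`.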

NOTE: breaking change for out-of-tree `Merger` implementors (e.g. Materialize): rename `account` -> `len` (now returning only the update count), and optionally override `allocation`. The only in-tree impl (`vec::VecMerger`) is migrated here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@frankmcsherry frankmcsherry merged commit 7b1453b into TimelyDataflow:master-next Jun 21, 2026
6 checks passed