Skip to content

persist: skip GC rollup invariant assert under active GC#37237

Open
DAlperin wants to merge 1 commit into
MaterializeInc:mainfrom
DAlperin:dov/per-10-gc-active-assert
Open

persist: skip GC rollup invariant assert under active GC#37237
DAlperin wants to merge 1 commit into
MaterializeInc:mainfrom
DAlperin:dov/per-10-gc-active-assert

Conversation

@DAlperin

@DAlperin DAlperin commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

The soft-assert at the tail of gc_and_truncate ("earliest state fetched during GC did not have corresponding rollup") checks that states.state() references a rollup for the earliest fetched seqno. That premise holds only when states was built from fetch_all_live_states, where iterating to the end leaves states.state() at the genuine current state.

Under active GC (persist_gc_use_active_gc, on by default in CI for versions after v0.143.0-dev) GC instead fetches diffs only through seqno_since, so states.state() is the reconstructed state at seqno_since, not the current state. A rollup's entry is inserted into the rollups map by the add_rollup transition at a seqno strictly greater than the seqno it rolls up. When seqno_since precedes that insertion, the reconstructed state legitimately lacks its own rollup entry and the assert false-fires.

The check is a soft_assert_or_log!, so production only logs to Sentry — the impact is a recurring test blocker (Antithesis, nightly, Long Workload Replay), not a customer-facing crash. The PER-16 add_rollup fix (#36740) addressed an adjacent but distinct invariant and did not stop this.

Fix

Gate the check to the classic path, where it is sound. The active-GC path already validates the same invariant against fresh data when it resolves the rollup for the earliest seqno (the early-returns at the top of that branch), so the tripwire is preserved on the classic path for any genuine regression.

Tests

Adds the datadriven regression test tests/machine/regression_gc_active_rollup_assert, which reproduces the panic under active GC on the unfixed code and passes with the fix.

Fixes database-issues#10092, PER-10

🤖 Generated with Claude Code

The soft-assert at the tail of `gc_and_truncate` ("earliest state fetched
during GC did not have corresponding rollup") checks that
`states.state()` references a rollup for the earliest fetched seqno. That
premise holds only when `states` was built from `fetch_all_live_states`,
where iterating to the end leaves `states.state()` at the genuine current
state.

Under active GC (`persist_gc_use_active_gc`, on by default in CI for
versions after v0.143.0-dev) GC instead fetches diffs only through
`seqno_since`, so `states.state()` is the reconstructed state at
`seqno_since`, not the current state. A rollup's entry is inserted into
the rollups map by the `add_rollup` transition at a seqno strictly
greater than the seqno it rolls up. When `seqno_since` precedes that
insertion, the reconstructed state legitimately lacks its own rollup
entry and the assert false-fires, aborting the persist worker in
instrumented builds (Antithesis, nightly). The check is a
`soft_assert_or_log!`, so production only logs to Sentry; the impact is a
recurring test blocker, not a customer-facing crash.

Gate the check to the classic path, where it is sound. The active-GC path
already validates the same invariant against fresh data when it resolves
the rollup for the earliest seqno, so the tripwire is preserved on the
classic path for any genuine regression.

Adds the datadriven regression test
`tests/machine/regression_gc_active_rollup_assert`, which reproduces the
panic under active GC on the unfixed code.

Fixes database-issues#10092
@DAlperin DAlperin requested a review from a team as a code owner June 22, 2026 22:06
@DAlperin DAlperin requested a review from def- June 22, 2026 22:20

@def- def- left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nightly triggered: https://buildkite.com/materialize/nightly/builds/16856
No complaints if green Edit: LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants