perf(e2e): stopDrainWarpRestart primitive + pipeline_prune pilot #24450
Closed
spalladino wants to merge 1 commit into
…rune

Add a reusable multi-node test primitive that warps the L1 clock while no sequencer is building: it waits for every sequencer to reach IDLE, stop()s each (a full drain of the poll loop, tracked checkpoint job, and fire-and-forget fallback sends), runs a caller-supplied warp, and by default restarts each. This is only sound now that the sequencer lifecycle is idempotent + restartable and the publisher restart path clears the interrupted flag; the IDLE pre-wait plus the drain keep the warp from interrupting a live build and emitting a spurious "Sequencer was interrupted" fail-event.

Apply it to pipeline_prune to collapse the ~126s dead gap where the chain just waited wall-clock for the L1 clock to roll past the orphan slot's checkpoint-proposal-received deadline so that pruneOrphanProposedBlocks fires. After the orphan blocks are known-built on node[0], warp the shared TestDateProvider (which the archiver's prune reads) into the slot after the orphan one, well past the deadline. Sequencers are kept stopped until the prune is confirmed, so no restarted proposer builds against the still-unpruned tip; they are then restarted for recovery.

TX_COUNT is unchanged, so the MBPS assertion still holds, and the pipelining assertion is unaffected since recovery building is normal pipelined building.
Closing in favor of #24475.
Stacked on #24449, which adds the idempotent/restartable sequencer lifecycle this PR depends on. Rebase onto merge-train/spartan-v5 once #24449 merges.

Context

Several multi-node e2e tests are dominated by wall-clock waits for the L1 clock to roll while live sequencers sit idle. Warping the shared clock under a running sequencer previously interrupted in-flight builds (the reorg / "Sequencer was interrupted" failures behind earlier revert attempts). Now that the lifecycle is idempotent and restartable (#24449), the sequencers can be cleanly paused and drained around a warp.

Approach

- stopDrainWarpRestart(nodes, warpFn, opts?) on SingleNodeTestContext (inherited by MultiNodeTestContext): wait for every sequencer to reach IDLE, stop() each (full drain), run warpFn() with nobody building, then start() each again (the default; restart: false leaves them stopped). The warp target is the caller's responsibility, so the primitive is reusable across the remaining Phase-2 sites. Archivers, provers and the chain monitor keep running, so clock-driven effects (e.g. an orphan-block prune) still fire.
- Apply it to pipeline_prune to collapse its ~126s dead gap: once the orphan blocks are built and publishing is disabled, warp L1 two blocks into the slot after the orphaned one (past the no-proposal prune deadline) instead of waiting wall-clock, then restart the sequencers only after the prune is confirmed. TX_COUNT is unchanged, so assertMultipleBlocksPerSlot and assertProposerPipelining still hold.

The pipeline_prune speedup is validated by this PR's CI run, since the multi-node suite can't be run locally.
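For reviewers who want the control flow at a glance, the four steps of the primitive (wait for IDLE, stop, warp, optionally restart) can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the `SequencerLike` interface, the `getState()` accessor, the timeout and poll options, and `FakeSequencer` are all hypothetical stand-ins, and the sketch takes sequencers directly where the real helper takes nodes.

```typescript
// Hypothetical minimal shapes; the real sequencer and node types are not shown here.
type SequencerState = 'IDLE' | 'BUILDING' | 'STOPPED';

interface SequencerLike {
  getState(): SequencerState;
  start(): Promise<void>;
  // Assumed to resolve only once in-flight work has fully drained.
  stop(): Promise<void>;
}

interface StopDrainWarpRestartOpts {
  restart?: boolean; // default true; false leaves the sequencers stopped
  idleTimeoutMs?: number; // hypothetical knob for the IDLE pre-wait
  pollIntervalMs?: number; // hypothetical knob for the IDLE pre-wait
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Poll until the sequencer reports IDLE, or fail after timeoutMs.
async function waitForIdle(seq: SequencerLike, timeoutMs: number, pollMs: number): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (seq.getState() !== 'IDLE') {
    if (Date.now() > deadline) {
      throw new Error('Timed out waiting for sequencer to reach IDLE');
    }
    await sleep(pollMs);
  }
}

async function stopDrainWarpRestart(
  sequencers: SequencerLike[],
  warpFn: () => Promise<void>,
  opts: StopDrainWarpRestartOpts = {},
): Promise<void> {
  const { restart = true, idleTimeoutMs = 5_000, pollIntervalMs = 10 } = opts;
  // 1. IDLE pre-wait, so stop() does not cut into a live build.
  await Promise.all(sequencers.map(s => waitForIdle(s, idleTimeoutMs, pollIntervalMs)));
  // 2. Stop every sequencer; stop() is assumed to drain before resolving.
  await Promise.all(sequencers.map(s => s.stop()));
  // 3. Run the caller-supplied warp while nobody is building.
  await warpFn();
  // 4. Restart unless the caller wants to keep them stopped.
  if (restart) {
    await Promise.all(sequencers.map(s => s.start()));
  }
}

// Hypothetical in-memory fake, used only to exercise the ordering above.
class FakeSequencer implements SequencerLike {
  private state: SequencerState;
  private readonly log: string[];
  private readonly name: string;

  constructor(state: SequencerState, log: string[], name: string) {
    this.state = state;
    this.log = log;
    this.name = name;
  }

  getState(): SequencerState {
    return this.state;
  }

  finishBuild(): void {
    this.state = 'IDLE';
  }

  async start(): Promise<void> {
    this.state = 'IDLE';
    this.log.push(`${this.name}:start`);
  }

  async stop(): Promise<void> {
    this.state = 'STOPPED';
    this.log.push(`${this.name}:stop`);
  }
}
```

The pipeline_prune flow corresponds to calling this with `{ restart: false }`, waiting for the prune to be confirmed, and only then starting the sequencers again, so that no restarted proposer builds against the still-unpruned tip.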