perf(e2e): stopDrainWarpRestart primitive + pipeline_prune pilot#24450

Closed
spalladino wants to merge 1 commit into spl/sequencer-restart-idempotency from spl/e2e-stop-drain-warp-restart

Conversation

@spalladino

Contributor

Stacked on #24449 — depends on the idempotent/restartable sequencer lifecycle it adds. Rebase onto merge-train/spartan-v5 once #24449 merges.

Context

Several multi-node e2e tests are dominated by wall-clock waits for the L1 clock to roll while live sequencers sit idle. Warping the shared clock under a running sequencer previously interrupted in-flight builds (the reorg / "Sequencer was interrupted" failures behind earlier revert attempts). Now that the lifecycle is idempotent and restartable (#24449), the sequencers can be cleanly paused and drained around a warp.

Approach

  • Add stopDrainWarpRestart(nodes, warpFn, opts?) on SingleNodeTestContext (inherited by MultiNodeTestContext): wait for every sequencer to reach IDLE, stop() each one (a full drain), run warpFn() while nobody is building, then start() each one again (the default; restart: false leaves them stopped). The warp target is the caller's responsibility, so the primitive is reusable across the remaining Phase-2 sites. Archivers, provers and the chain monitor keep running, so clock-driven effects (e.g. an orphan-block prune) still fire.
  • Apply it to pipeline_prune to collapse its ~126s dead gap: once the orphan blocks are built and publishing is disabled, warp L1 two blocks into the slot after the orphaned one (past the no-proposal prune deadline) instead of waiting wall-clock, then restart the sequencers only after the prune is confirmed. TX_COUNT is unchanged, so assertMultipleBlocksPerSlot and assertProposerPipelining still hold.
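
The sequence described in the first bullet can be sketched roughly as follows. This is an illustrative outline, not the code in this PR: `SequencerLike`, `NodeLike`, `getState()`, the state names other than IDLE, and the `idleTimeoutMs` / `pollIntervalMs` options are all assumptions made for the example. Only the function name, the `(nodes, warpFn, opts?)` shape, and the `restart` option come from the description above.

```typescript
// Hypothetical minimal shapes; the real sequencer and node types are not shown in this PR text.
type SequencerState = 'IDLE' | 'BUILDING' | 'STOPPED';

interface SequencerLike {
  getState(): SequencerState;
  stop(): Promise<void>; // assumed to resolve only once the sequencer has fully drained
  start(): Promise<void>;
}

interface NodeLike {
  sequencer: SequencerLike;
}

interface StopDrainWarpRestartOpts {
  restart?: boolean; // default true; false leaves the sequencers stopped
  idleTimeoutMs?: number; // assumption: not named in the PR description
  pollIntervalMs?: number; // assumption: not named in the PR description
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Poll until the sequencer reports IDLE, or fail after timeoutMs.
async function waitForIdle(seq: SequencerLike, timeoutMs: number, pollMs: number): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (seq.getState() !== 'IDLE') {
    if (Date.now() > deadline) {
      throw new Error('sequencer did not reach IDLE in time');
    }
    await sleep(pollMs);
  }
}

async function stopDrainWarpRestart(
  nodes: NodeLike[],
  warpFn: () => Promise<void> | void,
  opts: StopDrainWarpRestartOpts = {},
): Promise<void> {
  const { restart = true, idleTimeoutMs = 5000, pollIntervalMs = 10 } = opts;
  const sequencers = nodes.map(n => n.sequencer);

  // 1. For each sequencer: wait for IDLE, then stop it. All sequencers are
  //    handled concurrently, and we continue only when every stop() has resolved.
  await Promise.all(
    sequencers.map(async seq => {
      await waitForIdle(seq, idleTimeoutMs, pollIntervalMs);
      await seq.stop();
    }),
  );

  // 2. Run the caller-supplied warp while no sequencer is building.
  await warpFn();

  // 3. Restart by default; with restart: false the caller restarts later.
  if (restart) {
    await Promise.all(sequencers.map(seq => seq.start()));
  }
}
```

With `restart: false`, a caller such as the pipeline_prune test could confirm the prune first and only then call `start()` on each sequencer itself.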

The pipeline_prune speedup is validated by this PR's CI run — the multi-node suite can't be run locally.

…rune

Add a reusable multi-node test primitive that warps the L1 clock while no
sequencer is building: it waits for every sequencer to reach IDLE, stop()s each
(a full drain of the poll loop, tracked checkpoint job, and fire-and-forget
fallback sends), runs a caller-supplied warp, and by default restarts each. This
is only sound now that the sequencer lifecycle is idempotent + restartable and
the publisher restart path clears the interrupted flag; the IDLE pre-wait plus
the drain keep the warp from interrupting a live build and emitting a spurious
"Sequencer was interrupted" fail-event.

Apply it to pipeline_prune to collapse the ~126s dead gap where the chain just
waited wall-clock for the L1 clock to roll past the orphan slot's
checkpoint-proposal-received deadline so pruneOrphanProposedBlocks fires. After
the orphan blocks are known-built on node[0], warp the shared TestDateProvider
(which the archiver's prune reads) into the slot after the orphan one, well past
the deadline. Sequencers are kept stopped until the prune is confirmed so no
restarted proposer builds against the still-unpruned tip, then restarted for
recovery. TX_COUNT is unchanged so the MBPS assertion still holds, and the
pipelining assertion is unaffected since recovery building is normal pipelined
building.
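
As a rough illustration of the warp target ("two blocks into the slot after the orphaned one"), the arithmetic might look like the sketch below. The `SlotTiming` fields, both helper functions, and the assumption that slot N starts at genesis time plus N slot durations are hypothetical and chosen for the example; the test's real values come from its own L1/L2 configuration.

```typescript
// Hypothetical timing parameters, all in seconds.
interface SlotTiming {
  l1GenesisTime: number; // timestamp at which slot 0 starts (assumption)
  slotDuration: number; // length of one L2 slot
  ethereumSlotDuration: number; // length of one L1 block
}

// Start timestamp of a slot, assuming slots are laid out back to back from genesis.
function slotStart(slot: number, t: SlotTiming): number {
  return t.l1GenesisTime + slot * t.slotDuration;
}

// Timestamp that lands a given number of L1 blocks into the slot after the orphaned one.
function warpTargetAfterOrphan(orphanSlot: number, t: SlotTiming, l1BlocksIntoSlot = 2): number {
  return slotStart(orphanSlot + 1, t) + l1BlocksIntoSlot * t.ethereumSlotDuration;
}
```

For example, with a genesis time of 1000, a 72-second slot, 12-second L1 blocks and an orphan slot of 5, the target is 1000 + 6 × 72 + 2 × 12 = 1456.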
@spalladino

Contributor Author

Closing in favor of #24475

@spalladino spalladino closed this Jul 2, 2026

Labels

S-do-not-merge Status: Do not merge this PR
