fix(sequencer): make lifecycle idempotent and restartable by spalladino · Pull Request #24449 · AztecProtocol/aztec-packages

spalladino · 2026-07-01T21:08:42Z

Context

Sequencer.start() and PublisherManager.start() were not idempotent: a second start() orphaned the previous RunningPromise poll loop (leaving two loops racing), and a stopped-then-restarted sequencer never cleared the publishers' interrupted flag — so it would build and propose blocks but silently never publish to L1. This blocked merging sequential scenarios in tests (invalidate_block.parallel needed a forced stop between scenarios) and is a prerequisite for stop→drain→warp→restart test tooling.

Approach

Make start()/stop() idempotent across SequencerClient, Sequencer, and PublisherManager: a redundant start() no-ops instead of orphaning a loop, and start() while STOPPING is refused to close the start-races-stop race.
Add a working restart path: PublisherManager.start() now clears the interrupted flag (via the previously-unused restart() methods on the publishers and funder), and SequencerClient.start() starts the publisher manager before the sequencer loop so publishers are un-interrupted before the first publish.
Prevent a stale-slot tx across restart: the two fire-and-forget fallback sends (vote-and-prune / escape-hatch) are tracked, and stop() interrupts each wrapper's independent sleep and awaits it after the poll loop stops, so no dangling sleeper survives to be re-armed by a later restart.
stop() remains a complete terminal drain for production teardown; it is merely also re-startable once fully resolved.

Lifecycle calls are assumed to be serialized by the caller (all current callers are); this does not add a mutex for truly concurrent start/stop.

The sequencer's start/stop chain was not idempotent and had no working restart: a second start() news up a fresh RunningPromise and orphans the prior poll loop, and PublisherManager.start() after a stop() never cleared the publishers' interrupted flag, so a restarted sequencer would build and propose but every L1 publish would throw InterruptError. Changes: - Sequencer.start() is now idempotent (no-op if the poll loop is running), so a double start never orphans a second loop. - Sequencer.stop() is idempotent and now drains the fire-and-forget fallback sends (tryVoteAndPruneWhenCannotBuild / tryVoteWhenEscapeHatchOpen): each send's wrapper publisher is tracked, interrupted on stop (waking its own interruptible sleep, independent of the pooled L1TxUtils), and awaited. This removes the dangling-sleeper that a later restart would otherwise re-arm into a stale-slot publish once interrupted is cleared. - PublisherManager.start() calls restart() on every publisher and the funder to clear the interrupted flag (re-enabling L1 publishing after a stop), and only starts the funding loop when it is not already running. stop() is idempotent and drops the funding-loop reference. - SequencerClient.start() starts the publisher manager before the sequencer's poll loop so publishers are un-interrupted before the first publish; stop() remains a complete terminal teardown. stop() is preserved as a full drain and terminal shutdown; the added restart path is layered on top without weakening it. Tests: idempotent double-start/stop, restart clears the interrupted flag and re-enables publishing, full terminal stop tears everything down, and no stale fallback tx survives a stop across PublisherManager/Sequencer/RunningPromise.

Follow-up hardening from review of the lifecycle idempotency change: - Sequencer.start() now refuses while state is STOPPING. Without this, a start() landing after stop() set STOPPING but before it finished could allocate a fresh poll loop that the in-flight stop() would then mark STOPPED, orphaning a running loop under a stopped state. - Move the fallback-wrapper interrupt out of the pre-stopAll loop and into drainPendingFallbackSends(), which runs after runningPromise.stop(). The last in-flight work() iteration registers its fire-and-forget fallback send synchronously before returning, so only after the poll loop has stopped is pendingFallbackSends final; interrupting there catches that last send too. Interrupting before runningPromise.stop() would miss it, leaving a sleeper that a later restart could re-arm into a stale-slot publish. Adds tests for both: start-while-stopping is refused, and a fallback send registered after stop() began is still interrupted and drained.

…started A bare double start() re-ran loadStateAndResumeMonitoring(), which spawns a background monitor per pending nonce — and those monitors can send speed-up and cancellation txs, so a duplicate monitor for the same nonce could race the original. Guard start() with a `started` flag so a redundant start is a no-op; stop() clears the flag so a genuine restart-after-stop still reloads state and resumes monitoring of in-flight txs.

spalladino added 3 commits July 1, 2026 14:49

spalladino mentioned this pull request Jul 1, 2026

perf(e2e): stopDrainWarpRestart primitive + pipeline_prune pilot #24450

Open

spalladino added the S-do-not-merge Status: Do not merge this PR label Jul 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(sequencer): make lifecycle idempotent and restartable#24449

fix(sequencer): make lifecycle idempotent and restartable#24449
spalladino wants to merge 3 commits into
merge-train/spartan-v5from
spl/sequencer-restart-idempotency

spalladino commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

spalladino commented Jul 1, 2026

Context

Approach

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant