fix(sequencer): make lifecycle idempotent and restartable#24449
Open
spalladino wants to merge 3 commits into
Open
fix(sequencer): make lifecycle idempotent and restartable#24449spalladino wants to merge 3 commits into
spalladino wants to merge 3 commits into
Conversation
The sequencer's start/stop chain was not idempotent and had no working restart: a second start() news up a fresh RunningPromise and orphans the prior poll loop, and PublisherManager.start() after a stop() never cleared the publishers' interrupted flag, so a restarted sequencer would build and propose but every L1 publish would throw InterruptError. Changes: - Sequencer.start() is now idempotent (no-op if the poll loop is running), so a double start never orphans a second loop. - Sequencer.stop() is idempotent and now drains the fire-and-forget fallback sends (tryVoteAndPruneWhenCannotBuild / tryVoteWhenEscapeHatchOpen): each send's wrapper publisher is tracked, interrupted on stop (waking its own interruptible sleep, independent of the pooled L1TxUtils), and awaited. This removes the dangling-sleeper that a later restart would otherwise re-arm into a stale-slot publish once interrupted is cleared. - PublisherManager.start() calls restart() on every publisher and the funder to clear the interrupted flag (re-enabling L1 publishing after a stop), and only starts the funding loop when it is not already running. stop() is idempotent and drops the funding-loop reference. - SequencerClient.start() starts the publisher manager before the sequencer's poll loop so publishers are un-interrupted before the first publish; stop() remains a complete terminal teardown. stop() is preserved as a full drain and terminal shutdown; the added restart path is layered on top without weakening it. Tests: idempotent double-start/stop, restart clears the interrupted flag and re-enables publishing, full terminal stop tears everything down, and no stale fallback tx survives a stop across PublisherManager/Sequencer/RunningPromise.
Follow-up hardening from review of the lifecycle idempotency change: - Sequencer.start() now refuses while state is STOPPING. Without this, a start() landing after stop() set STOPPING but before it finished could allocate a fresh poll loop that the in-flight stop() would then mark STOPPED, orphaning a running loop under a stopped state. - Move the fallback-wrapper interrupt out of the pre-stopAll loop and into drainPendingFallbackSends(), which runs after runningPromise.stop(). The last in-flight work() iteration registers its fire-and-forget fallback send synchronously before returning, so only after the poll loop has stopped is pendingFallbackSends final; interrupting there catches that last send too. Interrupting before runningPromise.stop() would miss it, leaving a sleeper that a later restart could re-arm into a stale-slot publish. Adds tests for both: start-while-stopping is refused, and a fallback send registered after stop() began is still interrupted and drained.
…started A bare double start() re-ran loadStateAndResumeMonitoring(), which spawns a background monitor per pending nonce — and those monitors can send speed-up and cancellation txs, so a duplicate monitor for the same nonce could race the original. Guard start() with a `started` flag so a redundant start is a no-op; stop() clears the flag so a genuine restart-after-stop still reloads state and resumes monitoring of in-flight txs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
Sequencer.start()andPublisherManager.start()were not idempotent: a secondstart()orphaned the previousRunningPromisepoll loop (leaving two loops racing), and a stopped-then-restarted sequencer never cleared the publishers'interruptedflag — so it would build and propose blocks but silently never publish to L1. This blocked merging sequential scenarios in tests (invalidate_block.parallelneeded a forced stop between scenarios) and is a prerequisite for stop→drain→warp→restart test tooling.Approach
start()/stop()idempotent acrossSequencerClient,Sequencer, andPublisherManager: a redundantstart()no-ops instead of orphaning a loop, andstart()whileSTOPPINGis refused to close the start-races-stop race.PublisherManager.start()now clears theinterruptedflag (via the previously-unusedrestart()methods on the publishers and funder), andSequencerClient.start()starts the publisher manager before the sequencer loop so publishers are un-interrupted before the first publish.stop()interrupts each wrapper's independent sleep and awaits it after the poll loop stops, so no dangling sleeper survives to be re-armed by a later restart.stop()remains a complete terminal drain for production teardown; it is merely also re-startable once fully resolved.Lifecycle calls are assumed to be serialized by the caller (all current callers are); this does not add a mutex for truly concurrent start/stop.