feat(sequencer): record L1 inclusion window and gas-price ladder in failed-tx store #24427
aminsammara wants to merge 6 commits into
Adds gas pricing, L1 fee environment, and slot timing data to the FailedL1Tx records so operators can diagnose underpriced L1 transactions. Captures: sent gas prices, pending pool p75 priority fees, next mined block inclusion thresholds, and time remaining until slot deadline. Also adds backup calls for send failures, on-chain reverts, and timeouts which were previously not recorded.
…ailed-tx store
Enriches failed-L1-tx records so operators can diagnose underpricing:
- Capture the fee data of the L1 inclusion window (the blocks the tx could have landed in for its L2 slot) via historical reads, once per slot, instead of polling for a single "next" block.
- Retain the escalating gas-price ladder (initial send + speed-ups) on tx state and surface it on timeout via L1TxTimeoutError, so timeout records show what was actually paid across retries. Gated by the existing L1_TX_FAILED_STORE flag, so the tx-monitor hot path is unchanged when the store is disabled.
- Give send-error/timeout records a per-attempt id so retries no longer overwrite each other; gate all capture on the resolved store so no fee-read RPC runs when the store is disabled.
Adds unit and anvil integration tests for the window capture, the ladder capture/surfacing, and the sequencer writing the ladder into a real store.
…s for fee data
The mined inclusion-window blocks already record the p75 and min-included priority fees of the txs that actually got in, which is the authoritative underpricing signal. The separate pending-pool snapshot only added a stale, post-deadline view (captured after the timeout) plus the single heaviest RPC call in the flow (pending block with full transactions). Remove captureFeeSnapshot and its eight gasInfo fields; windowBlocks is now the sole fee-environment source. Early send-errors whose window is not yet mined simply carry no window data.
…t compiles
Multicall3.forward's return type gained a `state` field, but the existing forwardSpy mocks in the sequencer-publisher and checkpoint_voter tests weren't updated. The errors stayed latent because the yarn-project build was previously blocked by ungenerated l1-artifacts; regenerating the artifacts surfaced them.
| txConfigOverrides: gasConfigOverrides ?? {}, | ||
| sentAtL1Ts: now, | ||
| lastSentAtL1Ts: now, | ||
| gasPriceHistory: this.config.captureGasPriceHistory ? [baseState.gasPrice] : undefined, |
For simplicity's sake, I'd remove the captureGasPriceHistory and save it always. It's pretty cheap to just keep track of this extra gasPrice field per attempt.
| return { | ||
| windowBlocks: windowBlocks.map(b => ({ | ||
| blockNumber: b.blockNumber.toString(), | ||
| timestamp: b.timestamp.toString(), | ||
| baseFeePerGas: b.baseFeePerGas.toString(), | ||
| p75PriorityFee: b.p75PriorityFee.toString(), | ||
| minIncludedPriorityFee: b.minIncludedPriorityFee.toString(), | ||
| minIncludedBlobPriorityFee: b.minIncludedBlobPriorityFee.toString(), | ||
| blockBlobsFull: b.blockBlobsFull, | ||
| includedBlobTxCount: b.includedBlobTxCount, | ||
| includedBlobCount: b.includedBlobCount, | ||
| })), | ||
| }; |
Why do we need these mappings? Why are we casting everything to string here? If it's just for serialization purposes, we have a custom jsonStringify that knows how to deal with bigints properly.
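To make the reviewer's point concrete: a bigint-aware serializer can handle the conversion in one place, so call sites keep native bigint fields instead of repeating `.toString()`. The sketch below is not the project's `jsonStringify` (its exact behavior is not shown in this conversation); `stringifyWithBigints` and the sample values are hypothetical.

```typescript
// Hypothetical helper, NOT the project's jsonStringify: serializes bigint
// values as decimal strings through a JSON.stringify replacer.
function stringifyWithBigints(value: unknown, indent?: number): string {
  return JSON.stringify(
    value,
    // The replacer runs before JSON.stringify would reject a bigint,
    // so converting here avoids the TypeError.
    (_key, v) => (typeof v === 'bigint' ? v.toString() : v),
    indent,
  );
}

// Illustrative window-block entry that keeps native bigint fields (made-up values).
const windowBlock = {
  blockNumber: 21050001n,
  baseFeePerGas: 29000000000n,
  blockBlobsFull: false,
};

const serialized = stringifyWithBigints({ windowBlocks: [windowBlock] });
```

With a helper of this kind the record type can stay bigint-typed end to end, and the string form only appears at the serialization boundary.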
| let l1BlockNumber = 0n; | ||
| try { | ||
| l1BlockNumber = await this.l1TxUtils.getBlockNumber(); | ||
| } catch { | ||
| // ignore - back up without the block number | ||
| } |
| const l1BlockNumber = await this.l1TxUtils.getBlockNumber().catch(() => 0n); |
Sorry for the OCD
| const feeSummary = | ||
| opts?.sharedFeeSummary ?? | ||
| (opts?.captureFeeSummary ? await this.captureFeeEnvironment(opts.targetSlot) : undefined); |
I'd wrap the captureFeeEnvironment call in a try/catch (or append .catch(() => undefined)), so an error while retrieving the fee env does not break the saving of the failed tx.
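A minimal sketch of that suggestion, assuming the fee capture is best-effort enrichment and the record should be saved either way. All names here (`captureFeeEnvironment` as a free function, `backupFailedTx`, `savedRecords`) are hypothetical stand-ins, not the publisher's real members.

```typescript
// Hypothetical stand-ins for the publisher's fee capture and store write.
type FeeSummary = { windowBlocks: unknown[] };

async function captureFeeEnvironment(shouldFail: boolean): Promise<FeeSummary> {
  if (shouldFail) {
    throw new Error('RPC unavailable');
  }
  return { windowBlocks: [] };
}

// In-memory stand-in for the failed-tx store.
const savedRecords: Array<{ failureType: string; feeSummary?: FeeSummary }> = [];

async function backupFailedTx(failureType: string, rpcDown: boolean): Promise<void> {
  // Best-effort enrichment: a failed fee read degrades to `undefined`
  // instead of rejecting and aborting the save below.
  const feeSummary = await captureFeeEnvironment(rpcDown).catch(() => undefined);
  savedRecords.push({ failureType, feeSummary });
}
```

The trade-off is that a swallowed error is invisible unless it is also logged, so a real implementation would probably log inside the catch handler.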
| * When captureFeeSummary is true, captures L1 fee environment and waits for the next | ||
| * mined block (~12s) to record the definitive inclusion threshold before saving. |
I can't find where we're waiting for the next block
| this.failedTxStore = createL1TxFailedStore(config.l1TxFailedStore, this.log); | ||
| // Only retain the gas-price ladder on publishers when we'll actually store failures with it. | ||
| this.captureGasPriceHistory = !!config.l1TxFailedStore; |
I'd argue it's useful to capture gas price history (or the "fee env" as called here) even if you don't have a failed tx store. Outputting that data in logs under a warn (or error) can help diagnosing, even when there's no failed tx store configured.
| export async function captureWindowBlockFees( | ||
| client: ViemClient, | ||
| windowStartS: bigint, | ||
| windowEndS: bigint, | ||
| ): Promise<WindowBlockFees[]> { |
IIRC there's an eth RPC call that returns gas price history. Can you check if we get enough data from it, so we don't have to download every single tx in the window to compute the stats? Maybe that call plus the block headers (ie includeTransactions: false) are good enough?
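The call in question is most likely `eth_feeHistory`, which viem exposes as `getFeeHistory`. The sketch below shows how the 0th and 75th reward percentiles could stand in for `minIncludedPriorityFee` and `p75PriorityFee` without fetching full blocks. The structural types are modeled on viem's action but are assumptions to verify against the real client, and `windowFeesFromHistory` is a hypothetical name. Two caveats: fee-history percentiles are gas-weighted, so they may not match stats computed per transaction, and the call takes block numbers rather than timestamps, so block headers would still be needed to map the window to a block range. It also returns no blob tx counts.

```typescript
// Minimal structural types modeled on viem's `getFeeHistory` action.
// Assumption: the real client exposes this shape; verify before relying on it.
interface FeeHistory {
  oldestBlock: bigint;
  baseFeePerGas: bigint[]; // one entry per block, plus one for the block after the range
  reward?: bigint[][]; // reward[i][j] = priority fee at rewardPercentiles[j] in block i
}

interface FeeHistoryClient {
  getFeeHistory(args: {
    blockCount: number;
    blockNumber: bigint; // newest block of the requested range
    rewardPercentiles: number[];
  }): Promise<FeeHistory>;
}

interface WindowBlockFeeStats {
  blockNumber: bigint;
  baseFeePerGas: bigint;
  minIncludedPriorityFee: bigint;
  p75PriorityFee: bigint;
}

// Hypothetical alternative to downloading full blocks: one fee-history call
// over [firstBlock, lastBlock], asking for the 0th and 75th reward percentiles.
async function windowFeesFromHistory(
  client: FeeHistoryClient,
  firstBlock: bigint,
  lastBlock: bigint,
): Promise<WindowBlockFeeStats[]> {
  const blockCount = Number(lastBlock - firstBlock) + 1;
  const history = await client.getFeeHistory({
    blockCount,
    blockNumber: lastBlock,
    rewardPercentiles: [0, 75],
  });
  const rewards = history.reward ?? [];
  return rewards.map((blockRewards, i) => ({
    blockNumber: history.oldestBlock + BigInt(i),
    baseFeePerGas: history.baseFeePerGas[i] ?? 0n,
    minIncludedPriorityFee: blockRewards[0] ?? 0n, // 0th percentile
    p75PriorityFee: blockRewards[1] ?? 0n, // 75th percentile
  }));
}
```

Whether this is "good enough" depends on how much the blob-specific fields matter for diagnosis, since those would still need block or transaction data.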
Do you think we could move the logic for backing up failed txs and capturing fee env to a separate component? Could be a wrapper of the L1TxUtils, or a dependency, or something completely different, but outside the publisher itself. All this failed tx management is starting to pollute the publisher a lot. No need to do on this PR though.
Why
When a sequencer's propose transaction fails to land, the most common cause is underpricing during an L1 fee spike, but until now the failed-tx store recorded almost nothing to confirm or quantify that. Operators couldn't answer "was I underpriced, and by how much?" from the stored records.
The subtlety is that underpriced txs don't revert; they time out. The publisher escalates the priority fee (a ladder of speed-ups at the same nonce) and eventually gives up. So diagnosing underpricing needs two things side by side: what you paid (the whole escalation ladder, since the intermediate txs are replaced and evicted, so they can't be recovered after the fact) and what it cost to get in (the fee distribution of the actual L1 blocks you were competing for).
This PR records both, so an operator (or a script over the store) can compare their bid against the real inclusion bar of the blocks in their slot's window.
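As a rough illustration of such a script, the sketch below compares the highest priority fee in the ladder against the lowest `minIncludedPriorityFee` across the window blocks. The field names follow the sample record in this description; `diagnoseUnderpricing` and the type names are hypothetical. It deliberately ignores the `maxFeePerGas` cap relative to the base fee, which a fuller check would also consider.

```typescript
// Field names follow the sample record; type and function names are hypothetical.
interface SentGasPrice {
  maxFeePerGas: string; // wei, decimal string
  maxPriorityFeePerGas: string; // wei, decimal string
}

interface WindowBlock {
  blockNumber: string;
  minIncludedPriorityFee: string; // wei, decimal string
  p75PriorityFee: string; // wei, decimal string
}

interface UnderpricingReport {
  topBidPriorityFee: bigint;
  lowestMinIncluded: bigint;
  shortfallToClear: bigint; // 0n when the top bid met the lowest inclusion bar
}

function diagnoseUnderpricing(
  ladder: SentGasPrice[],
  windowBlocks: WindowBlock[],
): UnderpricingReport | undefined {
  // Without both sides of the comparison there is nothing to diagnose.
  if (ladder.length === 0 || windowBlocks.length === 0) {
    return undefined;
  }
  const maxOf = (xs: bigint[]): bigint => xs.reduce((a, b) => (b > a ? b : a));
  const minOf = (xs: bigint[]): bigint => xs.reduce((a, b) => (b < a ? b : a));

  const topBidPriorityFee = maxOf(ladder.map(p => BigInt(p.maxPriorityFeePerGas)));
  const lowestMinIncluded = minOf(windowBlocks.map(b => BigInt(b.minIncludedPriorityFee)));
  const shortfallToClear =
    lowestMinIncluded > topBidPriorityFee ? lowestMinIncluded - topBidPriorityFee : 0n;

  return { topBidPriorityFee, lowestMinIncluded, shortfallToClear };
}
```

Taking the lowest bar across the window gives the most generous reading: a non-zero shortfall means the bid would not have cleared even the cheapest block in the window.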
What changed
- captureWindowBlockFees reads (historically, once per slot) the L1 blocks whose timestamps fall in the target L2 slot's inclusion window and records per-block baseFeePerGas, p75PriorityFee, minIncludedPriorityFee, blob fees, and blockBlobsFull. Historical reads only: no waiting on the chain, so no RPC amplification during a spike.
- L1TxUtils retains the escalating prices (initial send + each speed-up) on tx state and surfaces them on timeout via a new L1TxTimeoutError (a subclass of TimeoutError, so existing instanceof checks are unaffected). The sequencer writes them to the record as sentGasPriceLadder + attempts. Retention is gated by the existing L1_TX_FAILED_STORE flag, so the shared tx-monitor hot path is unchanged when the store is disabled, and no new operator config is needed.
Sample record (timeout/data-<id>.json)
All wei values are decimal strings (gwei = wei ÷ 1e9).
{
  "failureType": "timeout",
  "l1BlockNumber": "21050006",
  "error": { "message": "L1 transaction 0x… timed out", "name": "TimeoutError" },
  "context": { "actions": ["propose"], "slot": 12345, "sender": "0xYourAttester…" },
  "timing": { "targetL2Slot": 12345, "msUntilSlotDeadline": -8000 },
  "gasInfo": {
    "attempts": 2,
    "gasLimit": "2100000",
    "nonce": 812,
    "sentGasPriceLadder": [
      { "maxFeePerGas": "31000000000", "maxPriorityFeePerGas": "1000000000" },
      { "maxFeePerGas": "33000000000", "maxPriorityFeePerGas": "2000000000" }
    ],
    "windowBlocks": [
      {
        "blockNumber": "21050001",
        "baseFeePerGas": "29000000000",
        "minIncludedPriorityFee": "2500000000",
        "p75PriorityFee": "3200000000",
        "blockBlobsFull": false,
        "includedBlobTxCount": 3,
        "includedBlobCount": 5
      },
      {
        "blockNumber": "21050002",
        "baseFeePerGas": "29500000000",
        "minIncludedPriorityFee": "2400000000",
        "p75PriorityFee": "3100000000",
        "blockBlobsFull": false,
        "includedBlobTxCount": 2,
        "includedBlobCount": 4
      }
    ]
  }
}
Reading it: the top of sentGasPriceLadder was 2 gwei; every window block's minIncludedPriorityFee was ~2.4–2.5 gwei (and p75 ~3.1–3.2 gwei), so even the escalated bid was below the minimum that got in anywhere in the slot → underpriced by ~0.5 gwei to clear, ~1.2 to be competitive. msUntilSlotDeadline: -8000 confirms it missed the slot. blockBlobsFull: false rules out blob-space contention (if it were true, the loss would be blob space, not tip).
Testing
- L1TxTimeoutError instanceof TimeoutError contract.
- sendRequests timeout → ladder written to a real file-backed store → record read back off disk.
Follow-up
Capturing the cancellation tx (hash + price) is a planned follow-up. It is sent fire-and-forget, so it needs its own hook rather than a throw-time snapshot.
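For reference, the L1TxTimeoutError contract described under "What changed" (a TimeoutError subclass that carries the ladder) might look roughly like the sketch below. The `TimeoutError` here is a local stand-in for the project's class, and the `gasPriceHistory` field name and `describeFailure` helper are assumptions for illustration, not the PR's actual code.

```typescript
// Local stand-in for the project's TimeoutError.
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TimeoutError';
  }
}

interface GasPrice {
  maxFeePerGas: bigint;
  maxPriorityFeePerGas: bigint;
}

// Hypothetical shape: carries the escalation ladder while remaining a
// TimeoutError, so existing `instanceof TimeoutError` checks still match.
class L1TxTimeoutError extends TimeoutError {
  readonly gasPriceHistory: GasPrice[];

  constructor(message: string, gasPriceHistory: GasPrice[]) {
    super(message);
    this.gasPriceHistory = gasPriceHistory;
  }
}

// Callers that know about the subclass can read the ladder; older callers
// that only check for TimeoutError keep working unchanged.
function describeFailure(err: unknown): string {
  if (err instanceof L1TxTimeoutError) {
    return `timeout after ${err.gasPriceHistory.length} attempt(s)`;
  }
  if (err instanceof TimeoutError) {
    return 'timeout';
  }
  return 'other';
}
```

Leaving `name` as inherited matches the sample record, where the error name is still reported as "TimeoutError".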