test(e2e): anchor multiple_blobs PXE on checkpointed tip to survive cold-start reorg #24446

Closed
spalladino wants to merge 1 commit into merge-train/spartan-v5 from spl/multiple-blobs-checkpoint-reorg

Conversation

@spalladino

Contributor

Problem

single-node/block-building/multiple_blobs flakes in its beforeAll hook with:

Transaction 0x20d9...2607 was dropped. Reason: Tx dropped by P2P node

thrown from waitForTx on the setup deploy TestContract.deploy(wallet).send({ from }). Seen in CI run 1b22551cac4ccb41.

Root cause

Confirmed by comparing the failing run against a passing retry on the same commit:

  • setupBlockProducer defaults the PXE to syncChainTip: 'proposed', so each tx anchors its historical state on the pending (not-yet-checkpointed) chain tip.
  • In the failing run the node cold-started with L2 genesis already inside slot 3, whose checkpoint-proposal-received deadline had already elapsed at boot. The sequencer therefore never assembled a checkpoint proposal for that slot and never published checkpoint 1 to L1.
  • The archiver's orphan-prune (orphanPruneNoProposalTolerance) legitimately removed the uncheckpointed block once past its deadline, pruning the chain back to block 0.
  • The deploy tx's anchor block was pruned out from under it, so the p2p BlockHeaderTxValidator rejected the tx for referencing an unknown block header, the pool deleted it, and waitForTx observed the drop and threw.
  • The passing run cold-started aligned with the slot clock (genesis ~26s later in the cycle), advanced one block per slot, and published each checkpoint to L1 before building the next block, so the deploy landed cleanly and nothing was pruned.

This is a benign cold-start slot-clock race, not a product bug: orphan-pruning an uncheckpointed past-deadline block is correct, and the validator rejecting an unknown anchor is correct. The tx is dropped only because it was anchored on the volatile proposed tip.
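The sequence above can be illustrated with a toy model. This is only a sketch: `ToyChain`, `isAnchorKnown`, and `runScenario` are hypothetical names invented for illustration and do not correspond to the archiver, the p2p validator, or any other code in the repository. It shows why an anchor taken from the proposed tip can disappear after an orphan-prune, while an anchor taken from the checkpointed tip cannot.

```typescript
// Toy model of the race. All names are hypothetical, not the real node API.
type Block = { number: number; checkpointed: boolean };
type TipMode = 'proposed' | 'checkpointed';

class ToyChain {
  // Genesis (block 0) is treated as checkpointed, so a prune never removes it.
  private blocks: Block[] = [{ number: 0, checkpointed: true }];

  // Append a proposed block that has not been checkpointed yet.
  propose(): Block {
    const block = { number: this.blocks.length, checkpointed: false };
    this.blocks.push(block);
    return block;
  }

  // Drop every uncheckpointed block, as an orphan-prune would after the deadline.
  pruneOrphans(): void {
    this.blocks = this.blocks.filter(b => b.checkpointed);
  }

  // Latest block visible under the given sync mode.
  tip(mode: TipMode): Block {
    const visible = mode === 'proposed' ? this.blocks : this.blocks.filter(b => b.checkpointed);
    return visible[visible.length - 1];
  }

  has(blockNumber: number): boolean {
    return this.blocks.some(b => b.number === blockNumber);
  }
}

// Stand-in for the header check: a tx is accepted only if its anchor block is still known.
function isAnchorKnown(chain: ToyChain, anchorBlockNumber: number): boolean {
  return chain.has(anchorBlockNumber);
}

// Returns true if a tx anchored under `mode` survives the prune.
function runScenario(mode: TipMode): boolean {
  const chain = new ToyChain();
  chain.propose(); // block 1 is built, but its checkpoint is never published
  const anchor = chain.tip(mode).number; // the tx picks its anchor here
  chain.pruneOrphans(); // block 1 is removed once past its deadline
  return isAnchorKnown(chain, anchor);
}

const proposedAnchorSurvives = runScenario('proposed'); // false: anchor was block 1
const checkpointedAnchorSurvives = runScenario('checkpointed'); // true: anchor was block 0
```

In this model the prune and the header check both behave correctly; the tx is lost only because of which tip it chose as its anchor.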

Fix

Anchor this test's PXE on the checkpointed tip (pxeOpts: { syncChainTip: 'checkpointed' }), which is what the test used before the single-node consolidation refactor:

  • A checkpointed anchor block always has a proposed checkpoint and is never orphan-pruned, so the anchor is durable by construction rather than merely racing the prune deadline.
  • At cold start (before any checkpoint lands on L1) the checkpointed tip is genesis/block 0, which the orphan-prune never touches, so the setup deploy is safe.
  • multiple_blobs does not assert on freshly-proposed state — it sends its txs, waits for them via waitForTxs (which already waits for CHECKPOINTED status), reads the receipt's block, and checks that the block spans more than one blob. The checkpointed anchor is both sufficient and consistent with the test's own wait semantics.

The change is scoped to this test only; setupBlockProducer's 'proposed' default is left intact for the sibling block-building tests that intentionally exercise the freshly-proposed tip.
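A minimal sketch of how a per-test override can coexist with an unchanged default. The only details taken from this PR are the option name `syncChainTip`, its two values, the `pxeOpts` key, and the `'proposed'` default; the types and the `resolvePxeOpts` helper are assumptions made for illustration, since the real `setupBlockProducer` signature is not shown here.

```typescript
// Hypothetical types and helper. Only `pxeOpts`, `syncChainTip`, and its values come from the PR.
type SyncChainTip = 'proposed' | 'checkpointed';

interface PxeOpts {
  syncChainTip: SyncChainTip;
}

interface SetupOpts {
  pxeOpts?: Partial<PxeOpts>;
}

// The shared default stays 'proposed' for the sibling block-building tests.
const DEFAULT_PXE_OPTS: PxeOpts = { syncChainTip: 'proposed' };

// Merge a test's overrides on top of the shared default.
function resolvePxeOpts(opts: SetupOpts = {}): PxeOpts {
  return { ...DEFAULT_PXE_OPTS, ...opts.pxeOpts };
}

// A sibling test passes nothing and keeps the freshly-proposed tip.
const siblingTestOpts = resolvePxeOpts();

// multiple_blobs opts in to the checkpointed tip for itself only.
const multipleBlobsOpts = resolvePxeOpts({ pxeOpts: { syncChainTip: 'checkpointed' } });
```

Keeping the override at the call site means the default is untouched, which is the scoping described above.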

Note on PR #24428

This flake was flagged on PR #24428 but is not caused by it. #24428 only touches test-side code and none of multiple_blobs.test.ts, single-node/setup.ts, the PXE block synchronizer, or the archiver. The syncChainTip: 'proposed' default that triggers this race was introduced earlier by the single-node consolidation refactor (#24310), so this is a pre-existing v5-line flake.

Verification

  • The fixed test passes locally (block with 3 txs encoding to 2 blobs, as intended).
  • A deterministic local reproduction of the failure is infeasible: it requires an unlucky alignment between node boot and the genesis slot clock, which CI hits intermittently but which did not reproduce in 8 local runs of the original test (local startup is faster and steadier, so it rarely boots past a slot's proposal deadline). The diagnosis rests on the failing-vs-passing CI log comparison plus the source paths above.
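The alignment condition can be written down as simple arithmetic. The sketch below is illustrative only: the `SlotTiming` shape, both function names, and the numeric values are assumptions, because the real slot duration and proposal deadline are not stated in this PR. Times are measured in seconds from the start of slot 0.

```typescript
// Hypothetical timing parameters. The real values are not given in this PR.
interface SlotTiming {
  slotDurationSec: number; // length of one slot
  proposalDeadlineSec: number; // offset into a slot after which no proposal is assembled
}

// Slot index containing the given time.
function slotAt(timeSec: number, timing: SlotTiming): number {
  return Math.floor(timeSec / timing.slotDurationSec);
}

// True if the node boots after the proposal deadline of the slot it lands in.
function bootedPastDeadline(bootSec: number, timing: SlotTiming): boolean {
  const offsetIntoSlot = bootSec % timing.slotDurationSec;
  return offsetIntoSlot >= timing.proposalDeadlineSec;
}

// Illustrative numbers only.
const exampleTiming: SlotTiming = { slotDurationSec: 36, proposalDeadlineSec: 10 };

const unluckyBoot = 3 * 36 + 30; // lands in slot 3, 30s in: past the deadline
const alignedBoot = 3 * 36 + 4; // lands in slot 3, 4s in: before the deadline

const unluckyMisses = bootedPastDeadline(unluckyBoot, exampleTiming); // true
const alignedMisses = bootedPastDeadline(alignedBoot, exampleTiming); // false
```

Under these assumed numbers, whether the first block's checkpoint can still be proposed depends only on where in the slot the boot lands, which is why the failure is intermittent and hard to force locally.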

…old-start reorg

The multiple_blobs setup deploy in beforeAll intermittently fails with 'Tx dropped
by P2P node'. setupBlockProducer defaults the PXE to syncChainTip: 'proposed', so a
tx anchors on the pending (uncheckpointed) tip. When the node cold-starts inside a
slot whose checkpoint-proposal deadline has already elapsed, the sequencer never
proposes that slot's checkpoint to L1 and the archiver orphan-prunes the
uncheckpointed block. The deploy tx's anchor block vanishes in that prune, the p2p
block-header validator rejects it as referencing an unknown block header, and the
pool deletes it, so waitForTx throws.

Anchoring this test's PXE on the checkpointed tip makes the anchor durable by
construction (a checkpointed block always has a proposed checkpoint and is never
orphan-pruned), which matches the test's pre-consolidation behavior and its own
waitForTxs CHECKPOINTED wait semantics. The test only needs its txs mined and
checkpointed, not the freshly-proposed tip.

Fixes a flake seen in CI run 1b22551cac4ccb41.
@AztecBot

AztecBot commented Jul 1, 2026

Collaborator

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/fc39be67489c66ef): yarn-project/scripts/run_test.sh p2p/src/client/test/p2p_client.integration_status_handshake.test.ts (24s) (code: 0) group:e2e-p2p-epoch-flakes

@spalladino added the S-do-not-merge label (Status: Do not merge this PR) Jul 1, 2026
@spalladino

Contributor Author

Closing in favor of #24474

@spalladino spalladino closed this Jul 2, 2026