perf(e2e): warp proven-checkpoint waits, tighten in-process polling, and instrument setup spans by spalladino · Pull Request #24452 · AztecProtocol/aztec-packages

spalladino · 2026-07-02T03:06:42Z

Motivation

Span-instrumented full CI runs (since #24407 landed) now show where the remaining e2e wall-clock goes: a large uninstrumented setup layer, a cluster of proven-checkpoint/epoch waits that burn real time while nothing is being built, and a 1s in-process poll cadence that adds ~0.5-0.9s of dead sleep to nearly every awaited tx. This PR takes the low-risk warp folds and polling wins that are safe today, and adds span instrumentation to the setup layer so the next round can attack the ~5,000s of currently-invisible beforeAll work. Everything multi-node is validated on CI (hence ci-no-fail-fast); the last commit is a deliberate slot-cut experiment kept isolated so it can be reverted on its own.

Changes

poll in-process nodes at 250ms instead of 1s: TestWallet now defaults its send().wait() poll interval to 0.25s, since in-process nodes reach CHECKPOINTED synchronously under automine and are cheap to poll otherwise. The spartan/worker call sites are explicitly restored to the 1s default because they talk to remote JSON-RPC nodes. The e2e-local wait helpers (wait_helpers.ts, waitForProvenChain, the gas-portal block advance, and the L1->L2 message poll) also drop from 1s to 0.25s. This removes ~0.5-0.9s of dead sleep per awaited tx across ~600 tests. Production aztec.js/wallet-sdk defaults are untouched.
warp past the epoch boundary in waitForProvenCheckpoint: after the multi-node block-production fixture stops its sequencers, warp the L1 clock one epoch forward (forward-only, skipped if already proven) so the epoch closes and the fake prover can prove and submit without waiting out the epoch in real time. Targets ~716s of suite-summed wait:proven-checkpoint across proposed_chain (~430s), deploy_and_call_ordering (~143s), and cross_chain_messages (~143s), plus blob_promotion.
warp epoch waits in multi_proof and upload_failed_proof: replace waitUntilEpochStarts with warpToEpochStart in these two 12s-slot proving tests (both passed under warp in the round-1 CI sweep). proof_fails is deliberately left untouched.
register-only TestContract in automine pxe test: the test only calls the noinitcheck private function emit_nullifier, so the contract is registered instead of deployed, dropping a deployment tx and its checkpoint cycle.
compute genesis values on an ephemeral world state: generateGenesisValues spun up a full NativeWorldStateService.tmp (with fsync enabled) in every e2e container just to read one tree root; it now uses the ephemeral backend, which skips fsync. Adds a unit test asserting that tmp and ephemeral produce identical archive roots for a funded-accounts genesis with a non-zero timestamp (this path is consensus-critical: the CLI deploy paths compute the on-chain genesis root through it).
instrument setup-layer deploys and mints with spans: wrap the top-offender setup helpers (fees harness token/FPC deploys, mints, and fee-juice bridge; cross-chain token/bridge deploys and mints; shared auth-registry publish; gas-portal bridge) in testSpan, following the span taxonomy from #24407 (deploy:*, tx:mint, setup:bridge, setup:auth-registry). Zero behavior change (testSpan is a passthrough without TEST_TIMING_FILE); this is the data source for round 4's attack on the ~5,000s of uninstrumented beforeAll work.
cut multi_validator_node slots 36s -> 16s (experiment, last commit): lower aztecSlotDuration from 36 to 16 and blockDurationMs from 6000 to 2000 together (the Ethereum slot stays at 8s). Small expected saving; kept as the final isolated commit so it can be reverted alone if CI shows committee/attestation trouble on this file.
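The poll-interval change is easiest to see with a generic poll loop. The sketch below is hypothetical: `pollUntil`, its options, and its 250ms default are illustrative and are not the TestWallet or aztec.js API. Under the simplifying assumption that a receipt becomes ready at a uniformly random moment between two polls, the caller oversleeps by half the interval on average, about 0.5s at a 1s interval versus about 0.125s at 250ms.

```typescript
// Hypothetical helper (not the aztec.js API): call `check` until it returns a
// value other than undefined, sleeping `intervalMs` between attempts.
interface PollOptions {
  intervalMs?: number; // sleep between polls (sketch default: 250ms, suited to in-process nodes)
  timeoutMs?: number; // give up after this long
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  { intervalMs = 250, timeoutMs = 60_000 }: PollOptions = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== undefined) {
      return result;
    }
    if (Date.now() >= deadline) {
      throw new Error(`pollUntil timed out after ${timeoutMs}ms`);
    }
    // Any readiness that lands during this sleep is only noticed on the next poll.
    await sleep(intervalMs);
  }
}

// Expected oversleep if readiness lands uniformly at random within one poll interval.
const expectedDeadSleepMs = (intervalMs: number): number => intervalMs / 2;
```

The trade-off is more requests per wait, which is cheap against an in-process node and is why call sites that talk to remote JSON-RPC nodes keep the 1s default.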
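The forward-only warp can be sketched as a timestamp calculation. This assumes epochs are a fixed number of fixed-length slots counted from an L1 genesis timestamp; `EpochConfig`, `nextEpochStartTimestamp`, and `warpTargetOrSkip` are hypothetical names, and the actual fixture may compute its target differently (for example, by advancing exactly one epoch duration from the current time).

```typescript
// Hypothetical model of the L2 time layout; all values are in seconds.
interface EpochConfig {
  l1GenesisTime: bigint; // L1 timestamp at which slot 0 starts
  slotDuration: bigint; // seconds per slot, e.g. an aztecSlotDuration-style value
  epochDuration: bigint; // slots per epoch
}

// Timestamp at which the epoch after the one containing `now` begins.
function nextEpochStartTimestamp(now: bigint, cfg: EpochConfig): bigint {
  const epochLength = cfg.slotDuration * cfg.epochDuration;
  // bigint division truncates toward zero, i.e. floors for now >= l1GenesisTime.
  const currentEpoch = (now - cfg.l1GenesisTime) / epochLength;
  return cfg.l1GenesisTime + (currentEpoch + 1n) * epochLength;
}

// Timestamp to warp to, or undefined when no warp should happen:
// skip if already proven, and never move the clock backwards.
function warpTargetOrSkip(now: bigint, alreadyProven: boolean, cfg: EpochConfig): bigint | undefined {
  if (alreadyProven) {
    return undefined;
  }
  const target = nextEpochStartTimestamp(now, cfg);
  return target > now ? target : undefined;
}
```

Closing the epoch immediately lets the prover start as soon as the sequencers stop, instead of idling until real time reaches the boundary.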

One planned item was dropped: an opt-in warp for ChainMonitor.waitUntilL2Slot. All three candidate call sites turned out to cover deliberate real-time building (live-sequencer coordination, inactivity accumulation across an epoch, and the proof-boundary critical window), so the opt-in API would have had no safe callers.

Verification

Locally: full yarn build, yarn format --check, and yarn lint on the touched packages all pass. The new world-state genesis-equivalence unit test passes (tmp and ephemeral roots identical). The automine/pxe.test.ts e2e passes as a smoke test for the register-only change and the 250ms polling. Everything multi-node (the warp folds, the timing cut) is validated on CI. Note the final commit is a deliberate slot-cut experiment that can be reverted in isolation.

Measured impact

Full green CI run of this PR (9b4cc967, CI 1782961631439529) vs the base-proxy full run (d160265b, CI 1782938936852228 — the branch point plus one unrelated one-file test change). Identical test populations: 2051 rows, all passed, in both runs. Sums are across parallel processes, not wall-clock (methodology in the Linear "Times tracking" doc).

Bucket	Base	This PR	Δ
Overall	7h 26m 03s	6h 36m 20s	−49m 43s (−11.1%)
Setup (before-hooks)	2h 14m 31s	1h 45m 44s	−21.4%
— of which setup.ts	42m 53s	34m 13s	−20.2%
Body	5h 05m 14s	4h 45m 03s	−6.6%
Teardown	6m 17s	5m 33s	−11.6%
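The deltas in the table can be rechecked from the h/m/s values. The helpers below (`parseDuration`, `formatDelta`) are illustrative, not part of the timing tooling. The inputs are process-summed durations, so the results are summed savings, not wall-clock savings.

```typescript
// Parse "7h 26m 03s" / "42m 53s" style durations into seconds.
function parseDuration(text: string): number {
  let total = 0;
  for (const [, value, unit] of text.matchAll(/(\d+)([hms])/g)) {
    total += Number(value) * (unit === "h" ? 3600 : unit === "m" ? 60 : 1);
  }
  return total;
}

// Format the change from `base` to `current` as "−49m 43s (−11.1%)"-style text.
function formatDelta(base: string, current: string): string {
  const b = parseDuration(base);
  const delta = parseDuration(current) - b;
  const sign = delta < 0 ? "−" : "+"; // U+2212 minus, matching the table
  const abs = Math.abs(delta);
  const minutes = Math.floor(abs / 60);
  const seconds = String(abs % 60).padStart(2, "0");
  const pct = ((abs / b) * 100).toFixed(1);
  return `${sign}${minutes}m ${seconds}s (${sign}${pct}%)`;
}
```

For example, the Overall row works out to 26,763s versus 23,780s, a 2,983s reduction.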

By mechanism:

Proven-checkpoint warp: wait:proven-checkpoint 14m 12s → 3m 32s (−75%), worst single wait 215s → 79s. Suite deltas match span deltas ~1:1 — proposed_chain −67%, deploy_and_call_ordering −45%, cross_chain_messages −21%. Hard attribution.
Epoch warps: multi_proof −52% (wait:epoch 92s → 40s). upload_failed_proof's warp also worked (35s → 20s) but was masked in the suite total by one-shot proving noise in that run.
250ms polling + ephemeral genesis: ~30–35 min of savings spread across the whole suite. 141 of 152 suites improved, and every setup:env:* span dropped ~16–50% with identical counts (same work, less waiting).
Slot-cut experiment: multi_validator_node 103s → 84s (−18.6%) and passed CI — the experiment survives.
Register-only pxe test: 18.2s → 10.1s.
New setup spans: ~44 min/run of previously untagged setup time is now attributable (setup:auth-registry 16m, setup:bridge 10m, tx:mint 8m, deploy:token 7m, deploy:fpc 2m) — the target list for round 4.
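To make the passthrough behavior and the per-name totals concrete, here is a minimal sketch of a span wrapper that only records timings when TEST_TIMING_FILE is set, plus an aggregation that produces figures like "setup:auth-registry 16m". It is hypothetical: `testSpanSketch`, `totalsByName`, and the record shape are assumptions, and this sketch keeps records in memory rather than writing the timing file, whose format is not described here.

```typescript
interface SpanRecord {
  name: string; // e.g. "setup:auth-registry", "tx:mint", "deploy:token"
  durationMs: number;
}

// In-memory sink standing in for the timing file (hypothetical).
const records: SpanRecord[] = [];

// Passthrough wrapper: with timing disabled it only runs `fn`; with timing
// enabled it also records how long `fn` took, even if `fn` throws.
async function testSpanSketch<T>(name: string, fn: () => Promise<T>): Promise<T> {
  if (!process.env.TEST_TIMING_FILE) {
    return fn();
  }
  const start = Date.now();
  try {
    return await fn();
  } finally {
    records.push({ name, durationMs: Date.now() - start });
  }
}

// Sum durations per span name across all records (process-summed, not wall-clock).
function totalsByName(all: SpanRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const { name, durationMs } of all) {
    totals.set(name, (totals.get(name) ?? 0) + durationMs);
  }
  return totals;
}
```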

11 suites regressed (~8 min total vs ~58 min of improvements), all in untagged real-time slashing/proving bodies whose tagged waits are unchanged — consistent with run-to-run variance, not PR effects (per-suite noise floor between same-day runs is median ~11s / p90 ~52s).
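One way to read the noise-floor figures: take per-suite absolute duration differences between two same-day runs, compute their median and 90th percentile, and treat a regression at or under the p90 as noise. This is an assumed methodology (nearest-rank percentiles, hypothetical helper names); the PR does not state exactly how its floor was computed.

```typescript
// Nearest-rank percentile (p in (0, 100]) of a non-empty list.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Noise floor from two same-day runs, each mapping suite name -> seconds.
function noiseFloor(runA: Map<string, number>, runB: Map<string, number>): { median: number; p90: number } {
  const diffs: number[] = [];
  for (const [suite, secondsA] of runA) {
    const secondsB = runB.get(suite);
    if (secondsB !== undefined) {
      diffs.push(Math.abs(secondsA - secondsB));
    }
  }
  return { median: percentile(diffs, 50), p90: percentile(diffs, 90) };
}

// A suite's regression counts as noise if it does not exceed the p90 floor.
const withinNoise = (regressionSeconds: number, p90: number): boolean => regressionSeconds <= p90;
```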

spalladino added 7 commits July 1, 2026 23:49

perf(e2e): poll in-process nodes at 250ms instead of 1s (edd3bff)
perf(e2e): warp past the epoch boundary in waitForProvenCheckpoint (05feb39)
perf(e2e): warp epoch waits in multi_proof and upload_failed_proof (3d92b11)
perf(e2e): register-only TestContract in automine pxe test (b9c7e95)
perf(world-state): compute genesis values on an ephemeral world state (5ad147f)
test(e2e): instrument setup-layer deploys and mints with spans (5079ac6)
perf(e2e): cut multi_validator_node slots 36s -> 16s (9b4cc96)

spalladino added the ci-no-fail-fast (sets NO_FAIL_FAST in CI so the run is not aborted on the first failure) and S-do-not-merge (Status: Do not merge this PR) labels on Jul 2, 2026

spalladino wants to merge 7 commits into merge-train/spartan-v5 from spl/e2e-speed-up-3
