Skip to content

feat: merge-train/spartan-v5#24454

Queued
AztecBot wants to merge 4 commits into
v5-nextfrom
merge-train/spartan-v5
Queued

feat: merge-train/spartan-v5#24454
AztecBot wants to merge 4 commits into
v5-nextfrom
merge-train/spartan-v5

Conversation

@AztecBot

@AztecBot AztecBot commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

BEGIN_COMMIT_OVERRIDE
fix(sequencer): recover fee-estimation caches from transient L1 read failures (#24426)
test(e2e): fix epoch-2 tip race in proof_fails after-epoch-end test (#24441)
END_COMMIT_OVERRIDE

…failures (#24426)

## Problem

In the docs-examples CI job (`docs/examples/bootstrap.sh execute`),
`example_swap` and `recursive_verification` exhausted all 5 retry
attempts on the same signature during `.send()` fee estimation:

```
Error: An unknown error occurred while executing the contract function "getTimestampForSlot".
  args: (54)   Details: L1 RPC request failed   code: -32702
```

Stack: `sendTx -> completeFeeOptions -> getMinFees ->
aztecNode.getPredictedMinFees / getCurrentMinFees` (over JSON-RPC to the
node). The failing reads are node-side, in
`FeeProviderImpl.computeCurrentMinFees` (`getTimestampForSlot` /
`getManaMinFeeAt`) and `FeePredictor.fetchState`.

Failing run: http://ci.aztec-labs.com/66ef70e6044f13bd (branch
`merge-train/spartan-v5`, commit `0e1827b945`).

## Root cause

`FeeProviderImpl.getCurrentMinFees` and `FeePredictor.getState` cache a
promise keyed on the L1 block number, refreshing only when the block
advances. When the cached computation rejects (a transient `L1 RPC
request failed` blip against the single 1-CPU anvil the docs-examples
job runs), the rejected promise is replayed on every subsequent call at
the same L1 block.

`getCurrentMinFees` was worse: it chained the next computation via
`currentMinFees.then(() => computeCurrentMinFees())`. Once
`currentMinFees` is rejected, `.then(onFulfilled)` propagates the
rejection and never invokes `computeCurrentMinFees` again — so the cache
stays poisoned permanently, regardless of L1 block advancement, until
the node restarts.

Because fee estimation fails before any L1 tx is submitted, no new L1
block is mined by that path, so a single blip permanently wedges
`.send()` on the node. The CI log confirms this: after one transient
failure on `getManaMinFeeAt`, `getTimestampForSlot(54)` failed
identically across all 5 retries of both examples — even though the node
kept mining L1 blocks (148-152) for the examples' own contract deploys
in the same second. This rules out a live node/anvil stall and pinpoints
the poisoned promise cache.

## Fix

- `getCurrentMinFees`: chain off the previous promise's settlement (via
a swallowing `.catch`) rather than its fulfillment, so a prior rejection
no longer short-circuits the new computation; and reset the cached L1
block number when the computation rejects so the next call recomputes at
the same block.
- `getState` (FeePredictor): reset the cached L1 block number on
rejection for the same reason.

Both are idempotent read paths, so recomputing on the next call is safe.

## Testing

- Added unit tests to `fee_provider.test.ts` and `fee_predictor.test.ts`
that fail (red) against the current code and pass (green) with the fix:
a transient rejection followed by a call at the same L1 block must
recompute rather than replay the cached rejection.
- Red/green verified locally: both new tests fail on the unpatched
source and the full `sequencer-client` fee suites pass (13 tests) with
the fix.
- A deterministic docs-examples repro was not feasible (it depends on a
transient L1 RPC blip under CI contention), so the tests target the
poisoning mechanism directly.
spalladino and others added 3 commits July 2, 2026 10:05
…24441)

Fixes a genuine timing flake in
`single-node/proving/proof_fails.parallel.test.ts` › `does not allow
submitting proof after epoch end`, observed in CI run `99ed463db48b9cf2`
on the v5 line (`Expected: 2, Received: 0` at the final epoch
assertion).

This test ran into a situation where the proposer `prune`d the unproven
chain without `propose`ing a new checkpoint, so the check for "has the
last checkpoint been mined in epoch 2" failed, since there was _no
checkpoint at all_ in the entire chain.

This fixes it so we wait until there's actually a checkpoint mined. In
parallel, I'm looking into _why_ the proposer decided to prune without
proposing.

@ludamad ludamad left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Auto-approved

@AztecBot AztecBot added this pull request to the merge queue Jul 2, 2026
Any commits made after this event will not be merged.
@AztecBot

AztecBot commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants