Skip to content

fix(pxe): UNFINALIZED_TAGGING_INDEXES_WINDOW_LEN should be MAX_PRIVATE_LOGS_PER_TX + headroom, not exactly MAX_PRIVATE_LOGS_PER_TX#24437

Open
vezenovm wants to merge 10 commits into
merge-train/fairies-v5from
mv/test-tagging-window-20
Open

fix(pxe): UNFINALIZED_TAGGING_INDEXES_WINDOW_LEN should be MAX_PRIVATE_LOGS_PER_TX + headroom, not exactly MAX_PRIVATE_LOGS_PER_TX#24437
vezenovm wants to merge 10 commits into
merge-train/fairies-v5from
mv/test-tagging-window-20

Conversation

@vezenovm

@vezenovm vezenovm commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #24429. That PR widened UNFINALIZED_TAGGING_INDEXES_WINDOW_LEN from 20 to exactly MAX_PRIVATE_LOGS_PER_TX (64). This PR argues that's still not the right value for the scenario #24429 was trying to fix, and proposes MAX_PRIVATE_LOGS_PER_TX + 20.

Scope note (see "Known limitation" below): this fixes a real, CI-confirmed problem; multiple ordinary pending txs to the same counterparty stacking up before the first is mined. There is a separate, harder problem captured in F-783. Additional separate breakage follow-up on the sender-side when pending logs exceed the window F-784.

Investigation

MAX_PRIVATE_LOGS_PER_TX is a genuine, protocol-enforced bound for ordinary transaction patterns: noir-protocol-circuits constants.nr defines MAX_PRIVATE_LOGS_PER_TX directly as MAX_NOTE_HASHES_PER_TX. A fresh secret's very first ordinary tx can legitimately consume that many indexes before anything is finalized.

But exactly MAX_PRIVATE_LOGS_PER_TX leaves zero headroom for a second unconfirmed tx to the same counterparty once the first one is already near the max. finalized only advances once the sender's own node observes a tx as mined (see syncSenderTaggingIndexes), so any burst of sends to one counterparty before the first is mined can exhaust the window. This is exactly what happened in #24429's bench_build_block failure: #24387's handshake default added extra per-cold-chain overhead, and the bench builds up a backlog of unmined txs by design, so used ran past finalized (which stayed at 0) before window 20 could absorb it.

Confirmed live: reproduced on this branch at bare 20

To settle whether #24429's bump was actually necessary, we reset this branch's constant back to bare 20 and let bench_build_block run to completion in CI rather than relying on the reasoning above alone. It failed the same way (see http://ci.aztec-labs.com/4ed074af4b8ab633 from a4feea1, on the current merge-train/fairies-v5 tip:

ERROR: kv-store:lmdb-v2 pxe-0 Failed to commit transaction: Error: Highest used index 21 is further than window length from the highest finalized index 0.
            Tagging window length 20 is configured too low. Contact the Aztec team
            to increase it!

So this isn't a synthetic worst case: it's a real failure on real CI infrastructure, at index 21, i.e. just one past the window. Not even a large burst of txs.

Red/green evidence

Added sender_tagging_store.test.ts: "allows an ordinary pending tx to stack on a fresh secret already at the MAX_PRIVATE_LOGS_PER_TX floor" — one tx uses the full worst-case 64-log burst from a fresh secret, then a second, ordinary-sized pending tx to the same secret must still be storable before either is mined.

  • Red at bare 20: the first call in that test already throws, with zero other txs involved:
    Highest used index 63 is further than window length from the highest finalized index 0.
    Tagging window length 20 is configured too low.
    
  • Green at MAX_PRIVATE_LOGS_PER_TX + 20: same scenario passes.

This is the key point for review: the bench_build_block failure isn't just "risky under bursty load".20 is mathematically incapable of tolerating even one legitimate worst-case single ordinary tx, independent of what CI on this branch shows either way. CI passing at 20 only means our current test/bench suite doesn't happen to construct a single tx tagging more than 20 logs to one recipient; it was never testing the general safety of the constant, just this specific workload's exposure to it. The live CI failure above is the same underlying gap, just surfaced through a different, real workload instead of a targeted unit test.

We also added tests to pin the window's stacking semantics: a run of window+1 one-log pending txs to one secret fits, while the next tx throws.

Why + 20, not * 2 or another additive constant

Draft-only experiment: #24429 widened this to MAX_PRIVATE_LOGS_PER_TX
alongside adding an address-derived tagging-secret hook to the
client-flows bench. Reverting just the constant, on top of that hook,
isolates whether the widen was needed to pass CI or whether the hook
alone was sufficient.
@vezenovm vezenovm added ci-full Run all master checks. ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure ci-draft Run CI on draft PRs. labels Jul 1, 2026
vezenovm added 2 commits July 1, 2026 12:39
…stant

The reverted comment (copied verbatim from before #24429) still said the
value must be larger than MAX_PRIVATE_LOGS_PER_TX while setting it to 20,
and cited e2e_l1_with_wall_time, which no longer exists (deleted in #24310).
Replace with an accurate description of the real bound and mark the value
explicitly as an unsafe test-only setting.
…OGS_PER_TX + 20

MAX_PRIVATE_LOGS_PER_TX is a hard floor (a single tx can tag that many logs
with one secret), but it alone leaves no headroom for multiple ordinary,
not-yet-mined txs to the same counterparty to stack up before the first is
observed as mined -- the scenario that tripped bench_build_block in #24429.
The +20 margin reuses the team's prior heuristic for that headroom instead
of letting it silently replace the floor, as the pre-#24429 value did.

Also drops the old comment's squashing justification for exceeding
MAX_PRIVATE_LOGS_PER_TX: reconcileTaggingIndexRangesAgainstSurvivingTags
trims the range to surviving indexes post-squash, it doesn't inflate it,
and MAX_NOTE_HASHES_PER_TX matches MAX_PRIVATE_LOGS_PER_TX today, so
squashing can't push a single tx's consumption past the floor.
@vezenovm vezenovm changed the title test(pxe): revert UNFINALIZED_TAGGING_INDEXES_WINDOW_LEN to 20 (draft, do not merge) fix(pxe): UNFINALIZED_TAGGING_INDEXES_WINDOW_LEN should be MAX_PRIVATE_LOGS_PER_TX + headroom, not exactly MAX_PRIVATE_LOGS_PER_TX Jul 1, 2026
Comment thread yarn-project/pxe/src/tagging/constants.ts
Comment thread yarn-project/pxe/src/tagging/constants.ts
vezenovm added 2 commits July 1, 2026 13:02
…ion test

Codex review of #24437 flagged that citing MAX_NOTE_HASHES_PER_TX as
"confirmation" of the MAX_PRIVATE_LOGS_PER_TX floor was weak evidence (the
match looked coincidental). It's not: MAX_PRIVATE_LOGS_PER_TX is defined as
MAX_NOTE_HASHES_PER_TX in noir-protocol-circuits' constants.nr, and more to
the point, PrivateAccumulatedData.private_logs is the tx-wide, pre-squash
accumulator sized to exactly MAX_PRIVATE_LOGS_PER_TX -- every private log
from every call in a tx lands there before reset/squashing ever runs, so no
tx can tag more than that many logs with one secret, squashed or not.

Also add a regression test locking in the actual scenario the +20 margin
exists for: a second, ordinary pending tx to the same secret must still be
storable after a first tx already used a full MAX_PRIVATE_LOGS_PER_TX-sized
burst from a fresh secret, before either is finalized.
vezenovm added 3 commits July 2, 2026 11:50
…ret ceiling

A squash-heavy tx can drive one secret's raw tag index arbitrarily past
MAX_PRIVATE_LOGS_PER_TX, since indexes are reserved at log-emission time,
before squashing is decided, and the kernel's reset loop isn't bounded by
that constant. No fixed window closes this; tracked separately in #24466.
@vezenovm vezenovm requested review from mverzilli, nchamo and nventuro July 2, 2026 17:02
@vezenovm vezenovm marked this pull request as ready for review July 2, 2026 17:02
@vezenovm vezenovm removed the ci-draft Run CI on draft PRs. label Jul 2, 2026
@vezenovm vezenovm requested a review from benesjan July 2, 2026 17:07
@AztecBot

AztecBot commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

\033FLAKED\033 (8;;http://ci.aztec-labs.com/1ca868bf43492353�1ca868bf434923538;;�):  yarn-project/end-to-end/scripts/run_test.sh simple src/single-node/proving/proof_fails.parallel.test.ts "does not allow submitting proof after epoch end" (118s) (code: 0) group:e2e-p2p-epoch-flakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-full Run all master checks. ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants