fix(avm): avoid data race on shared interaction selector writes (port #24456 to v5-next) by fcarreiro · Pull Request #24457 · AztecProtocol/aztec-packages

fcarreiro · 2026-07-02T11:44:48Z

Ports #24456 to v5-next.

What was wrong

The interactions tracegen phase runs every lookup/permutation job concurrently: AvmTraceGenHelper::fill_trace_interactions concatenates all builders' jobs and dispatches them with parallel_for. Lookups whose fine-grained destination selector is a shared column (DST_SELECTOR != outer_dst_selector) can resolve to the same dst_row from different jobs, so multiple threads write the same (selector, row) cell at once.

The previous code did a guarded read-modify-write on that shared cell:

if (DST_SELECTOR != outer_dst_selector && trace.get(DST_SELECTOR, dst_row) != 1) {
    trace.set(DST_SELECTOR, dst_row, 1);
}

Both the get and the non-atomic 32-byte set race against the other threads' writes to the same cell — a data race, i.e. undefined behavior. In practice it produced the right value (every writer stores 1), but it is still UB.

The fix

Add an opt-in use_atomic_limbs flag to TraceContainer::set. When set, the field's four 64-bit limbs are written with relaxed atomic stores — four plain movq on x86-64, no lock and no libatomic call. (A whole-field std::atomic_ref<FF> is not an option on the hot path: at 32 bytes it exceeds the hardware lock-free width and falls back to a locked libatomic call.)

The shared selector write now uses it and drops the guard read:

if (DST_SELECTOR != outer_dst_selector) {
    trace.set(DST_SELECTOR, dst_row, 1, /*use_atomic_limbs=*/true);
}

The write is an unconditional, idempotent atomic store of 1. Because every concurrent writer stores the same value, per-limb atomicity is sufficient — no torn value is possible — so this is data-race-free without the cost of whole-field atomicity. The default (non-atomic) set hot path is untouched and keeps its sparse-column "zero = absent" fast path.

Performance (this PR vs baseline)

Mega bulk AVM tx, full proving, HARDWARE_CONCURRENCY=16. Tracegen stage timings, median of runs (baseline n=3, this PR n=6):

Stage	baseline	this PR
tracegen traces	1,536 ms	1,537 ms
tracegen interactions	350 ms	329 ms
tracegen all	1,917 ms	1,936 ms

No measurable cost — the safety fix is free. The per-limb atomic only fires on the shared selector writes in the interactions stage; the traces stage is byte-for-byte the same hot path as baseline.

The interactions tracegen phase runs every lookup/permutation job via parallel_for (fill_trace_interactions). Lookups that share a destination-selector column can write the same (selector, row) cell from multiple threads, so the previous guarded read-modify-write (get(...) != 1 then set(...)) on that shared cell is a data race (undefined behavior). Add an opt-in use_atomic_limbs flag to TraceContainer::set that stores the field's four limbs with relaxed atomic writes (4 plain movq on x86-64, no lock -- unlike a whole-field std::atomic_ref<FF>, whose 32 bytes exceed the lock-free width). Use it for the shared selector write and drop the guard read: the write is now an unconditional idempotent atomic store of 1. Because every writer stores the same value, per-limb atomicity is sufficient (no torn value is possible), and the hot non-atomic set() path is unchanged.

AztecBot · 2026-07-02T16:45:48Z

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

\033FLAKED\033 (8;;http://ci.aztec-labs.com/812cb13379eef261�812cb13379eef2618;;�):  yarn-project/end-to-end/scripts/run_test.sh simple src/single-node/proving/proof_fails.parallel.test.ts "does not allow submitting proof after epoch end" (117s) (code: 0) group:e2e-p2p-epoch-flakes

fcarreiro requested review from IlyasRidhuan, MirandaWood and jeanmon as code owners July 2, 2026 11:44

fcarreiro enabled auto-merge July 2, 2026 11:46

jeanmon approved these changes Jul 2, 2026

View reviewed changes

fcarreiro added this pull request to the merge queue Jul 2, 2026

github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Jul 2, 2026

fcarreiro added this pull request to the merge queue Jul 2, 2026

Merged via the queue into v5-next with commit 413a8d2 Jul 2, 2026
14 checks passed

fcarreiro deleted the fc/avm-safer-interactions-v5 branch July 2, 2026 17:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(avm): avoid data race on shared interaction selector writes (port #24456 to v5-next)#24457

fix(avm): avoid data race on shared interaction selector writes (port #24456 to v5-next)#24457
fcarreiro merged 1 commit into
v5-nextfrom
fc/avm-safer-interactions-v5

fcarreiro commented Jul 2, 2026

Uh oh!

Uh oh!

AztecBot commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

fcarreiro commented Jul 2, 2026

What was wrong

The fix

Performance (this PR vs baseline)

Uh oh!

Uh oh!

AztecBot commented Jul 2, 2026

Flakey Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants