fix(avm): avoid data race on shared interaction selector writes#24456
Merged
Conversation
The interactions tracegen phase runs every lookup/permutation job via parallel_for (fill_trace_interactions). Lookups that share a destination-selector column can write the same (selector, row) cell from multiple threads, so the previous guarded read-modify-write (get(...) != 1 then set(...)) on that shared cell is a data race (undefined behavior). Add an opt-in use_atomic_limbs flag to TraceContainer::set that stores the field's four limbs with relaxed atomic writes (4 plain movq on x86-64, no lock -- unlike a whole-field std::atomic_ref<FF>, whose 32 bytes exceed the lock-free width). Use it for the shared selector write and drop the guard read: the write is now an unconditional idempotent atomic store of 1. Because every writer stores the same value, per-limb atomicity is sufficient (no torn value is possible), and the hot non-atomic set() path is unchanged.
jeanmon
approved these changes
Jul 2, 2026
| if (BB_UNLIKELY(use_atomic_limbs)) { | ||
| store_per_limb(shard->rows[offset], zero); | ||
| } else { | ||
| shard->rows[offset] = FF::zero(); |
Contributor
There was a problem hiding this comment.
You can use the constant zero.
Contributor
Author
There was a problem hiding this comment.
Hmm yeah, it probably would make sense to use either zero or FF::zero for both. I'll leave it like this because CI is already running but otherwise I'd probably use FF::zero for both (since it gives the compiler more information than a reference).
|
|
||
| // Set the fine grained inner selector if it's not already one. | ||
| if (LookupSettings::DST_SELECTOR != this->outer_dst_selector && | ||
| trace.get(LookupSettings::DST_SELECTOR, dst_row) != 1) { |
Contributor
There was a problem hiding this comment.
Did you remove the second boolean condition because it is not worth it performance-wise or because it would not be thread-safe?
Contributor
Author
There was a problem hiding this comment.
It's not safe even if it's a read.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was wrong
The interactions tracegen phase runs every lookup/permutation job concurrently:
AvmTraceGenHelper::fill_trace_interactionsconcatenates all builders' jobs and dispatches them withparallel_for. Lookups whose fine-grained destination selector is a shared column (DST_SELECTOR != outer_dst_selector) can resolve to the samedst_rowfrom different jobs, so multiple threads write the same(selector, row)cell at once.The previous code did a guarded read-modify-write on that shared cell:
Both the
getand the non-atomic 32-bytesetrace against the other threads' writes to the same cell — a data race, i.e. undefined behavior. In practice it produced the right value (every writer stores1), but it is still UB.The fix
Add an opt-in
use_atomic_limbsflag toTraceContainer::set. When set, the field's four 64-bit limbs are written with relaxed atomic stores — four plainmovqon x86-64, no lock and no libatomic call. (A whole-fieldstd::atomic_ref<FF>is not an option on the hot path: at 32 bytes it exceeds the hardware lock-free width and falls back to a locked libatomic call.)The shared selector write now uses it and drops the guard read:
The write is an unconditional, idempotent atomic store of
1. Because every concurrent writer stores the same value, per-limb atomicity is sufficient — no torn value is possible — so this is data-race-free without the cost of whole-field atomicity. The default (non-atomic)sethot path is untouched and keeps its sparse-column "zero = absent" fast path.Performance (this PR vs baseline)
Mega bulk AVM tx, full proving,
HARDWARE_CONCURRENCY=16. Tracegen stage timings, median of runs (baseline n=3, this PR n=6):next)No measurable cost — the safety fix is free. The per-limb atomic only fires on the shared selector writes in the interactions stage; the traces stage is byte-for-byte the same hot path as baseline.
Fixes https://linear.app/aztec-labs/issue/AVM-276/interactions-tracegen-does-ub-write-to-destination-selector