fix(ci): raise avm check-circuit per-tx timeout to stop e2e_multiple_blobs flakes #24445
Draft
AztecBot wants to merge 1 commit into
Problem
The `AVM Circuit Inputs Collection and Check` workflow (`.github/workflows/avm-circuit-inputs.yml`) has been failing intermittently on `next`, with the `avm-check-circuit` job exiting `124`. Recent runs on `next` alternate pass/fail on unrelated commits:

- feat: merge-train/spartan (#24431)
- feat: add generated aztec-vm-sim package setup
- feat: alert if L1 nodes become unhealthy (#24396)
- feat: deploy new eth nodes

Root cause
`avm_check_circuit_cmds` in `yarn-project/end-to-end/bootstrap.sh` runs every dumped AVM input under a per-item `timeout` (`exec_test` wraps each command in `timeout -v $TIMEOUT`), previously set to `TIMEOUT=30s`. The jobs run via `parallelize` with `--halt now,fail=1`, so a single item hitting its timeout returns exit `124` and fails the entire job.

In both failing runs the offending item was an `e2e_multiple_blobs` transaction (the heaviest AVM circuit in the input set), killed at exactly 32s against the 30s budget:

- `avm_cc_e2e_multiple_blobs_0x019c3966`: `(32s) (code: 124)`
- `avm_cc_e2e_multiple_blobs_0x1e2ad133`: `(32s) (code: 124)`

Every other input passes in ~1–3s. These blob-heavy txs sit right at the 30s boundary and, under the CPU contention of the wide parallel run (max 32 jobs, each `taskset`-ed to all cores), tip past 30s and get SIGTERM'd. This is a boundary timeout flake, not a constraint violation: `check_circuit` was still running (no failed assertion) when killed. The existing code comment already anticipated this exact failure mode.

Fix
Raise the per-item timeout from `30s` to `120s`, giving the heavy-but-finite runs comfortable headroom over their observed ~32s worst case. A genuinely stuck circuit is still caught, just later. This is a nightly/`next` safety-net job, so the higher ceiling has no meaningful cost. The stale comment (claiming all txs are "relatively small") is corrected to describe the real behavior.

Testing
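The exit-code mechanics behind the failure (though not the 32s run itself) can be illustrated with a minimal sketch. It assumes GNU coreutils `timeout`; `run_item` is a hypothetical helper, `sleep 2` is a stand-in for a check-circuit command, and the `1s`/`5s` budgets are illustrative values, not the CI settings:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for one check-circuit item: a 2-second workload
# wrapped in a per-item timeout, similar in spirit to `timeout -v $TIMEOUT`.
run_item() {
  local budget=$1
  timeout -v "$budget" sleep 2
}

# Budget below the runtime: timeout sends SIGTERM and exits with 124.
run_item 1s
rc_short=$?

# Budget above the runtime: the command finishes normally and exits with 0.
run_item 5s
rc_long=$?

echo "short budget: $rc_short, long budget: $rc_long"
# prints "short budget: 124, long budget: 0"
```

The same workload passes or fails depending only on the budget. An item that lands just past its limit shows the same pattern, and no assertion is involved.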
Reproducing the 32s run locally isn't feasible: it requires the EC2 `bb-avm` build and the S3-cached dumped inputs. The diagnosis rests on two independent CI runs failing identically on `e2e_multiple_blobs` at the 30s boundary on unrelated commits. The change is a one-line timeout bump plus a comment; `bash -n` parse is unaffected (the extglob warnings it prints are pre-existing, from the test-glob lines).

Created by claudebox · group: `slackbot`