fix(ci): raise avm check-circuit per-tx timeout to stop e2e_multiple_blobs flakes #24445

Draft
AztecBot wants to merge 1 commit into next from cb/avm-check-circuit-timeout-bump

@AztecBot AztecBot commented Jul 1, 2026

Problem

The AVM Circuit Inputs Collection and Check workflow (.github/workflows/avm-circuit-inputs.yml) has been failing intermittently on next, with the avm-check-circuit job exiting 124. Recent runs on next alternate pass/fail on unrelated commits:

| Run  | Commit                                            | Result     |
|------|---------------------------------------------------|------------|
| 1853 | feat: merge-train/spartan (#24431)                | ❌ failure |
| 1852 | feat: add generated aztec-vm-sim package setup    | ✅ success |
| 1851 | feat: alert if L1 nodes become unhealthy (#24396) | ❌ failure |
| 1850 | feat: deploy new eth nodes                        | ✅ success |

Root cause

avm_check_circuit_cmds in yarn-project/end-to-end/bootstrap.sh runs every dumped AVM input under a per-item timeout (exec_test wraps each command in timeout -v $TIMEOUT), previously set to TIMEOUT=30s. The jobs run via parallelize with --halt now,fail=1, so a single item hitting its timeout returns exit 124 and fails the entire job.
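The exit-124 behavior described above can be reproduced in isolation. This is a minimal sketch, assuming GNU coreutils `timeout`; `run_item` is a hypothetical stand-in for the repo's `exec_test` wrapper, not its actual implementation, and `sleep` stands in for a slow check_circuit run.

```shell
# Hypothetical stand-in for exec_test: run one command under a time budget.
run_item() {
  local budget="$1"
  shift
  timeout -v "$budget" "$@"
}

# A fast item finishes inside its budget and reports its own exit code.
fast_status=0
run_item 5s true || fast_status=$?

# A slow item is sent SIGTERM when the budget expires; timeout then exits 124.
slow_status=0
run_item 1s sleep 5 || slow_status=$?

echo "fast item exit code: $fast_status"
echo "slow item exit code: $slow_status"
```

With a fail-fast runner, that single 124 is enough to fail the whole batch, even though no assertion failed.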

In both failing runs the offending item was an e2e_multiple_blobs transaction — the heaviest AVM circuit in the input set — killed at exactly 32s against the 30s budget:

  • Run 1853: avm_cc_e2e_multiple_blobs_0x019c3966(32s) (code: 124)
  • Run 1851: avm_cc_e2e_multiple_blobs_0x1e2ad133(32s) (code: 124)
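To confirm that the same family of inputs is timing out across runs, the timed-out items can be pulled out of a job log. This sketch assumes the log lines end with the `name(duration) (code: 124)` shape quoted above; real CI logs may carry extra prefixes, so the patterns may need adjusting.

```shell
# Sample lines copied from the two failing runs; a real log would be piped in instead.
log='avm_cc_e2e_multiple_blobs_0x019c3966(32s) (code: 124)
avm_cc_e2e_multiple_blobs_0x1e2ad133(32s) (code: 124)'

# Keep only items that exited 124, then strip the "(NNs) (code: 124)" suffix.
timed_out=$(printf '%s\n' "$log" \
  | grep -E '\(code: 124\)$' \
  | sed -E 's/\([0-9]+s\) \(code: 124\)$//')

printf '%s\n' "$timed_out"
```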

Every other input passes in ~1–3s. These blob-heavy txs sit right at the 30s boundary and, under the CPU contention of the wide parallel run (max 32 jobs, each taskset-ed to all cores), tip past 30s and get SIGTERM'd. This is a boundary timeout flake, not a constraint violation — check_circuit was still running (no failed assertion) when killed. The existing code comment already anticipated this exact failure mode.

Fix

Raise the per-item timeout from 30s to 120s, giving the heavy-but-finite runs comfortable headroom over their observed ~32s worst case. A genuinely stuck circuit is still caught, just later. This is a nightly/next safety-net job, so the higher ceiling has no meaningful cost. The stale comment (claiming all txs are "relatively small") is corrected to describe the real behavior.

Testing

Reproducing the 32s run locally isn't feasible: it requires the EC2 bb-avm build and the S3-cached dumped inputs. The diagnosis rests on two independent CI runs, on unrelated commits, that failed identically on e2e_multiple_blobs at the 30s boundary. The change is a one-line timeout bump plus a comment; the script still parses under bash -n (the extglob warnings it prints are pre-existing, from the test-glob lines).


Created by claudebox · group: slackbot

@AztecBot added labels on Jul 1, 2026: ci-draft (Run CI on draft PRs), ci-no-fail-fast (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure), claudebox (Owned by claudebox; it can push to this PR).