Skip to content

fix(ci): stop merge-train PRs running the same commit multiple times#24433

Open
charlielye wants to merge 1 commit into
nextfrom
ci3-merge-train-dedupe
Open

fix(ci): stop merge-train PRs running the same commit multiple times#24433
charlielye wants to merge 1 commit into
nextfrom
ci3-merge-train-dedupe

Conversation

@charlielye

Copy link
Copy Markdown
Contributor

Problem

A single merge-train commit spawned three concurrent CI runs (observed on PR #24431 for commit b7940093a5 — three pull_request runs, same SHA, same second), each launching its own EC2 box with the identical name.

Root cause

Merge-train PRs are intentionally exempt from CI cancellation so each distinct accumulated train commit gets tested. But the concurrency group keyed on github.run_id:

group: ci3-${{ (startsWith(github.event.pull_request.head.ref, 'merge-train/') && github.run_id) || ... }}

github.run_id is unique per run, so cancel-in-progress never collapsed anything — including duplicate pull_request events for the same commit. When a sub-PR merges into merge-train/spartan, the bot fires a burst at one SHA: a synchronize plus re-applying the ci-no-squash and ci-full-no-test-cache labels (the labeled type is enabled). Each became its own uncancellable run.

Those runs then all built the same instance name (aztec-packages_merge-train_spartan_amd64_x-full-no-test-cache) and raced bootstrap_ec2's same-name reap (a check-then-launch race), so all three boxes survived.

Fix

  • ci3.yml — key the merge-train concurrency group on github.event.pull_request.head.sha instead of github.run_id. Same-commit duplicate events now collapse (cancel-in-progress); distinct commits still run concurrently. Non-merge-train PRs, merge_group, and push are untouched — the changed operand is only selected when the head ref is merge-train/* (for everything else startsWith(...) is false and the expression falls through to merge_group.head_ref || ref_name exactly as before).
  • bootstrap_ec2 — append the commit sha to merge-train instance names, so the distinct commits that do run concurrently get their own boxes instead of colliding / reaping each other. ci.sh's connect helpers match the Name tag by substring, so the sha-free base name from aws_instance_name still resolves them.

Verification

  • Merge a sub-PR into merge-train/spartan; confirm the synchronize + two labeled events collapse to one in-progress ci3 run for that SHA and exactly one instance is requested.
  • Non-regression: push two commits quickly to a normal PR; the older run is cancelled, only the latest proceeds (unchanged).

Merge-train PRs are deliberately exempt from CI cancellation so each distinct
train commit is tested. But the concurrency group keyed on github.run_id (unique
per run), which never collapses anything — so duplicate pull_request events for
the SAME commit (synchronize + the bot re-applying ci-no-squash /
ci-full-no-test-cache labels) each spawned their own run. Observed: 3 identical
CI runs for one commit, each launching an EC2 box with the same name, racing
bootstrap_ec2's same-name reap (a check-then-launch race) so all 3 survived.

- ci3.yml: key the merge-train concurrency group on head.sha, not run_id, so
  same-commit duplicate events collapse while distinct commits still run
  concurrently. Non-merge-train / merge_group / push are untouched (the changed
  operand is only selected for merge-train heads).
- bootstrap_ec2: append the commit sha to merge-train instance names so the
  distinct commits that DO run concurrently get their own boxes instead of
  colliding / reaping each other. Connect helpers match the Name tag by
  substring, so the sha-free base name still resolves them.
@charlielye charlielye requested a review from ludamad July 1, 2026 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants