[MINOR][CI] Extend hard guard that dumps stacks on stalled test forks #2500

Open
janniklinde wants to merge 2 commits into apache:main from janniklinde:ci/component-c-debug

Conversation

@janniklinde

Contributor

No description provided.

Baunsgaard and others added 2 commits June 17, 2026 16:04
Some Java test forks intermittently stall in a way that surefire's own
timeouts never catch, so the job runs until the GitHub Actions cap and is
cancelled with no output to diagnose, and the stall does not reproduce
locally.

Add an outer guard in the docker test entrypoint that watches the test log
for a stall (no new line within a window kept just above the per-fork
surefire timeout) and enforces an absolute runtime ceiling below the job
cap. On either trigger it dumps thread stacks from every JVM in the test
process tree via SIGQUIT (relayed into the job log) plus a jstack file
backup, then force-kills the tree so the job fails fast with stacks instead
of being cancelled empty-handed. Limits are overridable via
SYSDS_TEST_STALL_LIMIT and SYSDS_TEST_MAX_RUNTIME.
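
The guard described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the function names (`process_tree`, `dump_and_kill`, `guard`), their arguments, the polling interval, and the `jstack-<pid>.txt` file name are all hypothetical. It also assumes `pgrep` and `ps` are available in the container and that JVMs show up with the command name `java`. Relaying the SIGQUIT output into the job log is left out.

```shell
#!/usr/bin/env bash
# Sketch only: function names, arguments, and defaults are hypothetical.

# Print a PID followed by all of its descendants, one per line.
process_tree() {
  local pid="$1" child
  echo "$pid"
  for child in $(pgrep -P "$pid"); do
    process_tree "$child"
  done
}

# Dump thread stacks from every JVM under ROOT, then force-kill the tree.
dump_and_kill() {
  local root="$1" dump_dir="$2" pid pids signalled=0
  pids="$(process_tree "$root")"
  for pid in $pids; do
    # Only signal JVMs: SIGQUIT would terminate an ordinary process.
    # Matching on the command name "java" is a heuristic (assumption).
    if [ "$(ps -o comm= -p "$pid" 2>/dev/null)" = "java" ]; then
      kill -QUIT "$pid" 2>/dev/null && signalled=1
      # File backup in case the SIGQUIT output never reaches the job log.
      if command -v jstack >/dev/null 2>&1; then
        jstack "$pid" > "$dump_dir/jstack-$pid.txt" 2>&1 || true
      fi
    fi
  done
  # Give the JVMs a moment to finish writing their dumps.
  if [ "$signalled" -eq 1 ]; then sleep 2; fi
  for pid in $pids; do
    kill -KILL "$pid" 2>/dev/null || true
  done
}

# Watch LOG while ROOT runs. Returns 0 if ROOT exits on its own and 1 if
# the guard fired because of a stall or the runtime ceiling.
guard() {
  local root="$1" log="$2" stall_limit="$3" max_runtime="$4" poll="${5:-10}"
  local start now last_change last_lines lines reason
  start="$(date +%s)"
  last_change="$start"
  last_lines=-1   # the first poll therefore always counts as progress
  while kill -0 "$root" 2>/dev/null; do
    sleep "$poll"
    now="$(date +%s)"
    lines=0
    if [ -f "$log" ]; then lines="$(wc -l < "$log")"; fi
    if [ "$lines" != "$last_lines" ]; then
      last_lines="$lines"
      last_change="$now"
    fi
    reason=""
    if [ $((now - last_change)) -ge "$stall_limit" ]; then
      reason="no new log line for ${stall_limit}s"
    elif [ $((now - start)) -ge "$max_runtime" ]; then
      reason="runtime ceiling of ${max_runtime}s reached"
    fi
    if [ -n "$reason" ]; then
      echo "guard: $reason, dumping stacks and killing tree" >&2
      dump_and_kill "$root" "$(dirname "$log")"
      return 1
    fi
  done
  return 0
}
```

In an entrypoint, the test run would be started in the background with its output written to a log file, followed by a call such as `guard "$!" test.log "$SYSDS_TEST_STALL_LIMIT" "$SYSDS_TEST_MAX_RUNTIME"`. A non-zero return then marks the job as failed. The sketch counts log lines rather than bytes to match the "no new line" wording of the commit message.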

Also set surefire runOrder to alphabetical so a hang reproduces at a stable
class boundary, making the responsible class identifiable from the dumps.

Labels

None yet

Projects

Status: In Progress

2 participants