Add option to let activities heartbeat during worker shutdown #2903

Open
baekgyu-kim wants to merge 1 commit into temporalio:master from baekgyu-kim:2075

Conversation

baekgyu-kim commented Jun 8, 2026

Contributor

What was changed

Added an experimental worker option, WorkerOptions.Builder#setAllowActivityHeartbeatDuringShutdown(boolean) (default false, preserving the existing behavior).

  • When enabled, on a graceful worker shutdown the activity heartbeat executor is shut down only after all outstanding activity tasks have finished executing, so in-flight activities keep heartbeating through the existing mechanism — no separate heartbeat code path.
  • When disabled (default), or whenever shutdownNow is used, the behavior is unchanged: the heartbeat executor is shut down first, so ActivityExecutionContext.heartbeat() throws ActivityWorkerShutdownException.
  • Gated by allowActivityHeartbeatDuringShutdown && !interruptTasks, so non-graceful shutdown (WorkerFactory.shutdownNow) always behaves as if the option were disabled.

Touched: WorkerOptions, SingleWorkerOptions, Worker, SyncActivityWorker.
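
The ordering described above can be sketched with plain JDK executors. This is only an illustration under stated assumptions, not the SDK code: `ShutdownOrderingSketch`, `heartbeatOutcome`, and the two single-thread executors are hypothetical stand-ins for the activity task executor and the heartbeat executor, and a rejected submission stands in for `ActivityWorkerShutdownException`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the shutdown ordering; not the SDK's SyncActivityWorker.
final class ShutdownOrderingSketch {

  // Returns "sent" if the in-flight task could still heartbeat, "rejected" otherwise.
  static String heartbeatOutcome(
      boolean allowActivityHeartbeatDuringShutdown, boolean interruptTasks) {
    ExecutorService taskExecutor = Executors.newSingleThreadExecutor();
    ExecutorService heartbeatExecutor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    try {
      Future<String> activity =
          taskExecutor.submit(
              () -> {
                started.countDown();
                try {
                  release.await();
                } catch (InterruptedException e) {
                  // Interrupted by shutdownNow; still attempt the heartbeat below.
                }
                try {
                  heartbeatExecutor.execute(() -> {}); // stand-in for sending a heartbeat
                  return "sent";
                } catch (RejectedExecutionException e) {
                  return "rejected"; // heartbeat executor was already shut down
                }
              });
      started.await(); // make sure the task is in flight before shutting down

      if (allowActivityHeartbeatDuringShutdown && !interruptTasks) {
        // Graceful and opted in: drain the tasks first, then stop heartbeating.
        taskExecutor.shutdown();
        release.countDown();
        taskExecutor.awaitTermination(5, TimeUnit.SECONDS);
        heartbeatExecutor.shutdown();
      } else {
        // Default, or non-graceful: stop heartbeating first.
        heartbeatExecutor.shutdown();
        if (interruptTasks) {
          taskExecutor.shutdownNow();
        } else {
          taskExecutor.shutdown();
        }
        release.countDown();
        taskExecutor.awaitTermination(5, TimeUnit.SECONDS);
      }
      return activity.get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      taskExecutor.shutdownNow();
      heartbeatExecutor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(heartbeatOutcome(true, false)); // sent
    System.out.println(heartbeatOutcome(false, false)); // rejected
    System.out.println(heartbeatOutcome(true, true)); // rejected
  }
}
```

In this model the only difference between the two branches is which executor is shut down first, which mirrors the "reorder only, no new heartbeat path" intent of the change.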

Why?

  • Before: once a graceful shutdown was requested, heartbeat() threw ActivityWorkerShutdownException for the rest of the awaitTermination grace period. An activity that wanted to run to completion during that window could not refresh its server-side heartbeat deadline → the server timed it out and retried it → duplicate executions, even though the worker deliberately gave the activity time to finish.
  • After (opt-in): with the option enabled, in-flight activities keep heartbeating during the grace period, so each call refreshes the heartbeat deadline and the server does not prematurely time out / retry the activity.
  • Design: reuses the existing heartbeat path by only reordering executor shutdown, with no new heartbeat logic. The default stays false, so existing behavior is preserved, and shutdownNow is intentionally excluded. Originates from issue #2075 (Add the ability to keep heartbeating while the worker is shutting down).
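
To make the before/after reasoning concrete, here is a toy model of a heartbeat deadline. It is a sketch under assumptions, not how the server tracks timeouts: the class `HeartbeatDeadlineSketch`, its methods, and all timing values (10 s timeout, heartbeat every 5 s, shutdown requested at 12 s, activity finishing at 30 s) are invented for illustration.

```java
// Toy model: an activity is considered timed out once more than
// heartbeatTimeoutMillis has passed since its last recorded heartbeat.
final class HeartbeatDeadlineSketch {
  private final long heartbeatTimeoutMillis;
  private long lastHeartbeatMillis;

  HeartbeatDeadlineSketch(long heartbeatTimeoutMillis, long startMillis) {
    this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
    this.lastHeartbeatMillis = startMillis;
  }

  void heartbeat(long nowMillis) {
    lastHeartbeatMillis = nowMillis; // each heartbeat refreshes the deadline
  }

  boolean isTimedOut(long nowMillis) {
    return nowMillis - lastHeartbeatMillis > heartbeatTimeoutMillis;
  }

  // Simulates an activity that tries to heartbeat every 5 s until it finishes at 30 s,
  // with shutdown requested at 12 s. Returns true if it times out before finishing.
  static boolean timesOutBeforeFinishing(boolean heartbeatAllowedDuringShutdown) {
    final long timeout = 10_000, interval = 5_000, shutdownAt = 12_000, finishAt = 30_000;
    HeartbeatDeadlineSketch deadline = new HeartbeatDeadlineSketch(timeout, 0);
    for (long now = interval; now <= finishAt; now += interval) {
      if (deadline.isTimedOut(now)) {
        return true;
      }
      boolean shuttingDown = now >= shutdownAt;
      if (!shuttingDown || heartbeatAllowedDuringShutdown) {
        deadline.heartbeat(now);
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(timesOutBeforeFinishing(false)); // true
    System.out.println(timesOutBeforeFinishing(true)); // false
  }
}
```

With heartbeats blocked after the shutdown request, the last refresh in this model happens at 10 s and the deadline lapses by 25 s, well before the activity would have finished.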

Note: with the option enabled, activities are no longer notified of shutdown via ActivityWorkerShutdownException, so they are expected to complete within the termination grace period on their own. This is documented on the setter.

Checklist

  1. Closes #2075 (Add the ability to keep heartbeating while the worker is shutting down)

  2. How was this tested:

    New HeartbeatDuringWorkerShutdownTest (standalone activity + two-semaphore handshake; gated behind SDKTestWorkflowRule.useExternalService):

    • testHeartbeatingActivityCompletesDuringShutdown: with the option enabled and a graceful shutdown(), the activity heartbeats and runs to completion.
    • testHeartbeatingActivityFailsDuringShutdownNow: with shutdownNow(), the option is ignored and the heartbeat fails the activity.

    WorkerOptionsTest — copy-builder round-trips the new option.

    # the integration tests require an external server
    USE_EXTERNAL_SERVICE=true TEMPORAL_SERVICE_ADDRESS=localhost:7233 \
      ./gradlew :temporal-sdk:test --tests "io.temporal.worker.shutdown.HeartbeatDuringWorkerShutdownTest"
    ./gradlew :temporal-sdk:test --tests "io.temporal.worker.WorkerOptionsTest"
    ./gradlew :temporal-sdk:spotlessJavaCheck
    
  3. Any docs updates needed?

    • The behavior is documented in the WorkerOptions.Builder#setAllowActivityHeartbeatDuringShutdown Javadoc.

maciejdudko left a comment

Contributor

Hi @baekgyu-kim, thank you for your contribution! It's great to see someone taking on these long standing issues. However, this is not the right implementation.

There should be a new worker option to enable heartbeating during shutdown. It should default to disabled, and when disabled, the behavior should be identical to the existing behavior for backward compatibility.

When the option is enabled, the heartbeat behavior should be identical to what happens during a normal heartbeat when the worker is not shutting down. There should be no additional code path that calls sendHeartbeatRequest in a different way; the existing mechanism should be used. The way to achieve that is to modify SyncActivityWorker.shutdown so that heartbeatExecutor.shutdown is only called after all outstanding activity tasks have finished executing.

If you need assistance with the implementation, feel free to reach out on the community Slack: either message me directly or post in the #java-sdk channel.

baekgyu-kim force-pushed the 2075 branch 2 times, most recently from dc7f0cc to 0d8852a on June 11, 2026 23:22
baekgyu-kim requested a review from maciejdudko on June 11, 2026 23:22
@baekgyu-kim

Contributor Author

Hi @maciejdudko,
Thank you for the thoughtful review! I've reworked the PR as suggested.

It now adds an experimental WorkerOptions.Builder#setActivityHeartbeatDuringShutdown option (default false, preserving the existing behavior).
When enabled, the heartbeat executor is shut down only after all outstanding activity tasks have finished executing, so heartbeats go through the existing mechanism without any separate code path.

Whenever you have a chance, I'd appreciate another look. Thanks again!

private PollerBehavior workflowTaskPollersBehavior;
private PollerBehavior activityTaskPollersBehavior;
private PollerBehavior nexusTaskPollersBehavior;
private boolean activityHeartbeatDuringShutdown;

Contributor

The field should be named allowActivityHeartbeatDuringShutdown, the options getter should be named getAllowActivityHeartbeatDuringShutdown, and the builder setter should be named setAllowActivityHeartbeatDuringShutdown. Apply this change consistently throughout the PR.

Contributor Author

Renamed consistently to allowActivityHeartbeatDuringShutdown (field), getAllowActivityHeartbeatDuringShutdown() (getter), and setAllowActivityHeartbeatDuringShutdown(...) (setter) across WorkerOptions, SingleWorkerOptions, and Worker.

return null;
});
CompletableFuture<Void> shutdownFuture;
if (activityHeartbeatDuringShutdown) {

Contributor

When interruptTasks is true (shutdownNow was called instead of shutdown), it should behave as if heartbeat during shutdown was disabled.

Suggested change
- if (activityHeartbeatDuringShutdown) {
+ if (allowActivityHeartbeatDuringShutdown && !interruptTasks) {

Contributor Author

Updated the condition to allowActivityHeartbeatDuringShutdown && !interruptTasks, so shutdownNow now behaves exactly as if the option were disabled.

Comment on lines +380 to +382
* io.temporal.client.ActivityWorkerShutdownException}, unless {@link
* WorkerOptions.Builder#setActivityHeartbeatDuringShutdown(boolean)} is enabled, in which case
* heartbeats keep working until the activity tasks finish executing.<br>

Contributor

shutdownNow behavior stays the same; see the comment in SyncActivityWorker.

Suggested change
- * io.temporal.client.ActivityWorkerShutdownException}, unless {@link
- * WorkerOptions.Builder#setActivityHeartbeatDuringShutdown(boolean)} is enabled, in which case
- * heartbeats keep working until the activity tasks finish executing.<br>
+ * io.temporal.client.ActivityWorkerShutdownException}.<br>

Contributor Author

Reverted. The WorkerFactory.shutdown Javadoc no longer references the option, so shutdownNow keeps its original documented behavior.

Comment on lines +529 to +547
/**
* If enabled, activities can keep heartbeating while the worker is shutting down. The activity
* heartbeat executor is closed only after all outstanding activity tasks have finished
* executing, so {@link io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} behaves
* exactly as it does while the worker is running: heartbeats are throttled and sent to the
* server, which keeps the server from timing the activity out during the {@link
* WorkerFactory#awaitTermination(long, java.util.concurrent.TimeUnit)} grace period.
*
* <p>Note that with this option enabled activities are no longer notified of the worker
* shutdown by an {@link io.temporal.client.ActivityWorkerShutdownException} thrown from {@code
* heartbeat}, so they are expected to complete within the termination grace period on their
* own.
*
* <p>Defaults to false, meaning that after shutdown is requested, {@link
* io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} stops sending heartbeats and
* throws {@link io.temporal.client.ActivityWorkerShutdownException}.
*/
@Experimental
public Builder setActivityHeartbeatDuringShutdown(boolean activityHeartbeatDuringShutdown) {

Contributor

We don't want to document implementation details.

Suggested change
- /**
- * If enabled, activities can keep heartbeating while the worker is shutting down. The activity
- * heartbeat executor is closed only after all outstanding activity tasks have finished
- * executing, so {@link io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} behaves
- * exactly as it does while the worker is running: heartbeats are throttled and sent to the
- * server, which keeps the server from timing the activity out during the {@link
- * WorkerFactory#awaitTermination(long, java.util.concurrent.TimeUnit)} grace period.
- *
- * <p>Note that with this option enabled activities are no longer notified of the worker
- * shutdown by an {@link io.temporal.client.ActivityWorkerShutdownException} thrown from {@code
- * heartbeat}, so they are expected to complete within the termination grace period on their
- * own.
- *
- * <p>Defaults to false, meaning that after shutdown is requested, {@link
- * io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} stops sending heartbeats and
- * throws {@link io.temporal.client.ActivityWorkerShutdownException}.
- */
- @Experimental
- public Builder setActivityHeartbeatDuringShutdown(boolean activityHeartbeatDuringShutdown) {
+ /**
+ * If true, activities can keep heartbeating during graceful worker shutdown (see {@link
+ * io.temporal.worker.WorkerFactory#shutdown WorkerFactory.shutdown}). Defaults to false,
+ * which means that after graceful shutdown is requested, calling {@link
+ * io.temporal.activity.ActivityExecutionContext#heartbeat ActivityExecutionContext.heartbeat}
+ * does not send a heartbeat and instead throws {@link
+ * io.temporal.client.ActivityWorkerShutdownException ActivityWorkerShutdownException}. This
+ * option is ignored by non-graceful shutdown (see {@link
+ * io.temporal.worker.WorkerFactory#shutdownNow WorkerFactory.shutdownNow}).
+ *
+ * <p>Note that with this option enabled, activities are no longer notified of the worker
+ * shutdown by the {@link io.temporal.client.ActivityWorkerShutdownException
+ * ActivityWorkerShutdownException} exception, so they are expected to complete within the
+ * termination grace period on their own.
+ */
+ @Experimental
+ public Builder setAllowActivityHeartbeatDuringShutdown(boolean allowActivityHeartbeatDuringShutdown) {

Contributor Author

Applied your suggested wording and dropped the implementation details.

Comment on lines +57 to +59
WorkflowExecution execution = WorkflowClient.start(workflow::execute);
started.get();
testWorkflowRule.getTestEnvironment().shutdown();

Contributor

There's a race condition here: the shutdown() call can go through before the activity worker receives the task, which prevents the activity from running and makes the test fail.

This feature will be easier to test using a standalone activity. It should work like this:

  1. Test starts activity.
  2. Test blocks on semaphore 1 until activity starts.
  3. Activity signals semaphore 1.
  4. Activity blocks on semaphore 2 until shutdown is triggered.
  5. Test calls shutdown().
  6. Test signals semaphore 2.
  7. Activity heartbeats then returns. (An exception will fail the activity.)
  8. Test calls result() on activity handle to ensure it succeeded. (Failure will throw exception and fail the test.)
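
The handshake in the steps above can be sketched with plain JDK primitives. This is only an illustration of the synchronization pattern, assuming a single-thread `ExecutorService` as a stand-in for the activity worker and a `Future` as a stand-in for the activity handle; `TwoSemaphoreHandshakeSketch` and `activityRanAfterShutdown` are hypothetical names, and no Temporal API is used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the test handshake; the executor plays the role of the worker.
final class TwoSemaphoreHandshakeSketch {

  // Returns what the "activity" observed: whether shutdown had already been
  // requested by the time it was released to do its final work.
  static boolean activityRanAfterShutdown() {
    Semaphore activityStarted = new Semaphore(0); // semaphore 1
    Semaphore shutdownTriggered = new Semaphore(0); // semaphore 2
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      // Step 1: start the activity.
      Future<Boolean> handle =
          worker.submit(
              () -> {
                activityStarted.release(); // step 3: signal that the activity is running
                shutdownTriggered.acquire(); // step 4: wait until shutdown was triggered
                // Step 7: a real activity would heartbeat here, then return.
                return worker.isShutdown();
              });

      activityStarted.acquire(); // step 2: block until the activity has started
      worker.shutdown(); // step 5: graceful shutdown, the running task keeps going
      shutdownTriggered.release(); // step 6: let the activity proceed
      return handle.get(5, TimeUnit.SECONDS); // step 8: fails if the activity failed
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      worker.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(activityRanAfterShutdown()); // true
  }
}
```

Because the test only calls `shutdown()` after acquiring semaphore 1, the task is guaranteed to be in flight, which is what removes the race described in the comment.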

Contributor Author

Thanks for the detailed outline.
Reworked the test to use a standalone activity with the two-semaphore handshake you described: it waits for the activity to start, triggers shutdown, then releases the activity to heartbeat and return.
This removes the race where shutdown() could land before the task was picked up.

* ActivityWorkerShutdownException}.
*/
@Test
public void testHeartbeatingActivityCompletesDuringShutdown()

Contributor

Also add a test case for when shutdownNow is called instead of shutdown.

Contributor Author

Added testHeartbeatingActivityFailsDuringShutdownNow, which calls shutdownNow() and asserts the heartbeat fails the activity instead of letting it complete, confirming the option is ignored for non-graceful shutdown.

baekgyu-kim commented Jun 13, 2026

Contributor Author

Hi @maciejdudko,
Thanks again for your review.
Sorry for making you review this twice, and I really appreciate the detailed feedback.

I've addressed all of your comments:

  • Naming: Renamed consistently to allowActivityHeartbeatDuringShutdown (field, getAllowActivityHeartbeatDuringShutdown() getter, setAllowActivityHeartbeatDuringShutdown(...) setter) across WorkerOptions, SingleWorkerOptions, and Worker.
  • shutdownNow behavior: Changed the condition to allowActivityHeartbeatDuringShutdown && !interruptTasks, so shutdownNow behaves exactly as if the option were disabled (heartbeat executor shut down first).
  • WorkerFactory.shutdown javadoc: Reverted. It no longer references the option, so both shutdown and shutdownNow keep their original documented behavior.
  • WorkerOptions setter javadoc: Replaced with your suggested wording, dropping the implementation details.
  • Test race condition: Reworked into a standalone activity with the two-semaphore handshake you outlined (wait for the activity to start → trigger shutdown → release the activity to heartbeat and return), eliminating the race where shutdown could land before the task is picked up.
  • shutdownNow test case: Added testHeartbeatingActivityFailsDuringShutdownNow, which asserts the heartbeat fails the activity instead of letting it complete.

Thanks again for the careful review!

baekgyu-kim changed the title from "Let activities heartbeat during worker shutdown" to "Add option to let activities heartbeat during worker shutdown" on Jun 13, 2026