Skip to content

Add extstore s3 driver#2907

Draft
cconstable wants to merge 8 commits into
extstore-foundationfrom
extstore-s3-driver
Draft

Add extstore s3 driver#2907
cconstable wants to merge 8 commits into
extstore-foundationfrom
extstore-s3-driver

Conversation

@cconstable

@cconstable cconstable commented Jun 10, 2026

Copy link
Copy Markdown

What changed?

  • Added the experimental S3 external storage driver and AWS SDK v2 adapter at contrib/temporal-payload-storage-s3/src/main/java/io/temporal/payload/storage/s3/
  • Added shared PayloadHasher for drivers to consistently hash payload content.

Why?

  • Enables Temporal payloads to be offloaded to S3 through the external storage driver API.

Checklist

  • Added focused S3 driver tests covering storage, retrieval, key format, error handling, validation, and README examples.

@cconstable cconstable changed the base branch from master to extstore-foundation June 10, 2026 18:59
@cconstable cconstable changed the title extstore s3 driver Add extstore s3 driver Jun 10, 2026
- Missing values (including a missing `run-id`) are encoded as `null`.
- `hex-digest` is lower-case SHA-256 hex (64 characters).

### Percent-encoding rules

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are encoding rules that all temporal s3 drivers should conform to. If we have a better place to put information like this I will move it, otherwise I'll put something similar in the READMEs of the other SDKs.


/** Computes payload hashes shared by external storage drivers. */
@Experimental
public final class PayloadHasher {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any driver we add that creates content-addressable keys will need to hash the content somehow. I opted to put this in a shared place. Happy to rename or move if needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this to the s3 package for now. As it is, it's publicly available to anyone using the SDK, and I think this should be an implementation detail.

@cconstable cconstable marked this pull request as ready for review June 10, 2026 20:21
@cconstable cconstable requested a review from a team as a code owner June 10, 2026 20:21
@cconstable

Copy link
Copy Markdown
Author

The actual changes here are only a few hundred lines. The vast majority of the diff is tests, comments, and java ceremony. storage/s3/S3StorageDriver.java is the big change.


/** Interface for S3 {@link S3StorageDriver} operations: upload, existence check, and download. */
@Experimental
public interface S3Client {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3StorageDriverClient

@@ -0,0 +1,18 @@
package io.temporal.payload.storage.s3;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package io.temporal.payload.storage.s3driver to avoid collisions with the AWS S3 SDK package naming?

* be configured with credentials and a region by the caller.
*/
@Experimental
public final class S3AsyncClientAdapter implements S3Client {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the io.temporal.payload.storage.s3 package have a hard dependency on the AWS S3 SDK libraries, which is different than how we did it for Go and Python, which use a separate module for the client implementation.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll split out to a temporal.payload.storage.s3driver.awssdkv2


/** Computes payload hashes shared by external storage drivers. */
@Experimental
public final class PayloadHasher {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this to the s3 package for now. As it is, it's publicly available to anyone using the SDK, and I think this should be an implementation detail.

*/
final class S3StorageKey {
private static final String KEY_VERSION = "v0";
private static final String PATH_SEGMENT_UNRESERVED = "-_.~$&+:=@";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you validate that these are acceptable characters in S3 object keys?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout. I followed Go's implementation for this but I did some reading and interestingly both Go and Python allow at least one character the S3 docs say to avoid (~) https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html

I'm going to update the Java SDK to diverge from both the existing implementations, update the key encoding "spec" in the README, and then patch the other two SDKs to follow AWS's recommendations. That seems like the correct thing to do here.

* larger payload fails the {@code store} call.
*/
public Builder setMaxPayloadSize(int maxPayloadSize) {
this.maxPayloadSize = maxPayloadSize;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range check here instead of in builder? not sure what's idiomatic in Java

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not either. I looked it up and it seems like putting it in here is more idiomatic. Updating.

* setting both before {@link #build()} is an error.
*/
public Builder setBucket(@Nonnull String bucket) {
this.staticBucket = Objects.requireNonNull(bucket, "bucket");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the staticBucket field and provide an annonymous implementation for the bucket resolver? Eliminates the field and the XOR check.

@cconstable cconstable marked this pull request as draft June 12, 2026 18:27
return withFailureContext(uploadRequest, "upload failed " + location);
})
.thenApply(ignored -> claimFor(bucket, key, hexDigest));
cancelRequestWhenCancelled(claimFuture, inFlightRequest);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had here previously did not properly cancel the upstream client request. Now cancellation should propagate correctly:

  1. client.objectExists(bucket, key); creates a future that eagerly executes.
  2. that is wrapped in a future that provides failure context via withFailureContext
  3. thenCompose chains the client.putObject request
  4. cancelRequestWhenCancelled(claimFuture, inFlightRequest); listens for a cancel on claimFuture and then reaches upstream and cancels the current inFlightRequest.

If there is a more elegant way to do this in Java please let me know. An alternative solution is returning an opaque cancellation token/context but now you've got two things to cancel and that might be confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants