[SYSTEMDS-3946] Enable sending of large (>2GiB) FederatedRequests and… #2496

Open
Biranavan-Parameswaran wants to merge 1 commit into apache:main from Biranavan-Parameswaran:SYSTEMDS-3946-large-federated-requests

@Biranavan-Parameswaran Biranavan-Parameswaran commented Jun 21, 2026

Federated transfers previously failed for payloads above 2GiB because the
size of a single Netty frame is bounded by a 32-bit length field, capping
any request or response at Integer.MAX_VALUE bytes.

This patch adds a streaming chunked codec that splits a large payload into
bounded frames on the sender and reassembles them on the receiver, so the
on-wire size is no longer limited by a single frame. A format detector and
format encoder select the chunked path only when the payload exceeds the
frame limit, leaving the existing small-message path unchanged to avoid
added overhead for the common case.

Adds FederatedMaxPayloadTest to exercise the boundary around the former
2GiB cap.
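
The split-and-reassemble idea described above can be sketched as follows. This is only an illustration and not the codec from this patch: the class `ChunkingSketch` and the methods `split` and `reassemble` are hypothetical names, and the sketch works on an in-memory `byte[]` (which is itself limited to less than 2GiB in Java), whereas a real implementation for larger payloads would have to stream the data.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of bounded-frame chunking; not the PR's codec.
final class ChunkingSketch {

	// Sender side: cut the payload into frames of at most maxFrame bytes.
	static List<byte[]> split(byte[] payload, int maxFrame) {
		if(maxFrame <= 0)
			throw new IllegalArgumentException("maxFrame must be positive");
		List<byte[]> frames = new ArrayList<>();
		int off = 0;
		while(off < payload.length) {
			// The last frame may be shorter than maxFrame.
			int len = Math.min(maxFrame, payload.length - off);
			frames.add(Arrays.copyOfRange(payload, off, off + len));
			off += len;
		}
		return frames;
	}

	// Receiver side: concatenate the frames in arrival order.
	static byte[] reassemble(List<byte[]> frames) {
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		for(byte[] frame : frames)
			out.write(frame, 0, frame.length);
		return out.toByteArray();
	}

	public static void main(String[] args) {
		byte[] payload = new byte[10];
		for(int i = 0; i < payload.length; i++)
			payload[i] = (byte) i;

		List<byte[]> frames = split(payload, 4);
		System.out.println("frames: " + frames.size());
		System.out.println("round trip ok: " + Arrays.equals(payload, reassemble(frames)));
	}
}
```

Because every frame stays below the per-frame limit, the total transfer size depends only on the number of frames and no longer on the width of a single length field.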

… Responses

@github-project-automation github-project-automation Bot moved this to In Progress in SystemDS PR Queue Jun 21, 2026
@Biranavan-Parameswaran Biranavan-Parameswaran marked this pull request as ready for review June 21, 2026 16:15
@ywcb00 ywcb00 self-assigned this Jun 22, 2026

@ywcb00 ywcb00 left a comment

Contributor

Thank you very much for the PR @Biranavan-Parameswaran :)
I left some minor comments in the code. Could you please have a look at them and resolve them when you find the time? Thanks.


static final byte MARKER_LEGACY = 0;
static final byte MARKER_CHUNKED = 1;
static final long STREAM_THRESHOLD = 1536L << 20; // ~1.5 GB: route below this through the legacy object codec

@ywcb00 ywcb00 Jun 22, 2026

Contributor

We should use the regular encoder as long as we can, i.e., up to the largest possible message size. Can we increase this default threshold from 1.5GB to (INT_MAX - 1) bytes?
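
To make the numbers in this suggestion concrete, here is a small sketch comparing the two thresholds. The marker constants and the current threshold value are copied from the snippet above; `PROPOSED_THRESHOLD`, the class `ThresholdSketch`, and the helper `selectMarker` are hypothetical, and the strict "below the threshold" comparison is an assumption based on the code comment, not on the actual encoder logic.

```java
// Hypothetical illustration of the threshold change; not the PR's encoder.
final class ThresholdSketch {
	static final byte MARKER_LEGACY = 0;
	static final byte MARKER_CHUNKED = 1;

	// Value from the PR: 1536 MiB = 1,610,612,736 bytes.
	static final long CURRENT_THRESHOLD = 1536L << 20;
	// Suggested value: Integer.MAX_VALUE - 1 = 2,147,483,646 bytes.
	static final long PROPOSED_THRESHOLD = Integer.MAX_VALUE - 1L;

	// Assumption: payloads strictly below the threshold use the legacy codec.
	static byte selectMarker(long payloadBytes, long threshold) {
		return payloadBytes < threshold ? MARKER_LEGACY : MARKER_CHUNKED;
	}

	public static void main(String[] args) {
		long payload = 1792L << 20; // a 1.75 GiB payload, between both thresholds

		System.out.println("current threshold:  " + CURRENT_THRESHOLD);
		System.out.println("proposed threshold: " + PROPOSED_THRESHOLD);
		System.out.println("marker (current):   " + selectMarker(payload, CURRENT_THRESHOLD));
		System.out.println("marker (proposed):  " + selectMarker(payload, PROPOSED_THRESHOLD));
	}
}
```

Under these assumptions, a payload between roughly 1.5GiB and 2GiB would take the chunked path with the current default but stay on the regular encoder with the proposed one.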

Comment on lines +62 to +64
catch(ExecutionException e) {
Assert.fail("Federated transfer failed: " + e.getMessage());
}

@ywcb00 ywcb00 Jun 22, 2026

Contributor

This catch is redundant: the exception is caught anyway by the catch block directly below.

import io.netty.channel.ChannelPipeline;
import io.netty.handler.codec.ByteToMessageDecoder;

public final class FederatedFormatDetector extends ByteToMessageDecoder {

Contributor

Would it be more intuitive to name this class "FederatedFormatDecoder"?

@github-project-automation github-project-automation Bot moved this from In Progress to In Review in SystemDS PR Queue Jun 22, 2026