feat: Add OSS archive FileIO support #8148

Open
Shekharrajak wants to merge 5 commits into apache:master from Shekharrajak:feature/paimon-5510-oss-archive
@Shekharrajak

Contributor

Ref #5510 (comment)

Purpose

Implements OSS-backed archive, restore, and unarchive operations for Paimon FileIO by mapping StorageType to OSS storage classes and issuing same-key OSS copy/restore requests while preserving common object metadata.

Tests

mvn spotless:apply
mvn -pl paimon-filesystems/paimon-oss-impl -am -Pfast-build -DfailIfNoTests=false -Dtest=OSSArchiveOperationsTest test

@JingsongLi

Contributor

I found one issue with the OSS archive/unarchive implementation.

changeStorageClass always uses a single OSSClient.copyObject(...) call to rewrite the object with a new storage class. Alibaba Cloud OSS documents CopyObject for objects up to 1 GB; larger objects should be copied with multipart copy / UploadPartCopy instead. Paimon data files can exceed 1 GB, so archive / unarchive may fail for large files.

Could we branch on sourceMetadata.getContentLength() and use multipart copy for large objects, preserving the source metadata and setting the target storage class during the multipart upload? The current single-copy path can remain for objects within the supported size.

Reference: https://www.alibabacloud.com/help/en/oss/developer-reference/copy-objects-8
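For illustration only, the size-based branching suggested above could be planned roughly as in the sketch below. It does not call the OSS SDK and is not code from this PR: `CopyPlanner`, `SINGLE_COPY_LIMIT`, `DEFAULT_PART_SIZE`, and `Part` are hypothetical names, the 1 GB limit is assumed to mean 2^30 bytes, and the 256 MiB part size is an arbitrary choice. In a real implementation each planned range would presumably feed one multipart-copy part request, with the target storage class and source metadata set when the multipart upload is initiated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that plans byte ranges for a multipart copy. */
final class CopyPlanner {
    /** Assumed single-request CopyObject limit: 1 GB, taken here as 2^30 bytes. */
    static final long SINGLE_COPY_LIMIT = 1L << 30;

    /** Illustrative part size (256 MiB); an assumption, not a value from the PR. */
    static final long DEFAULT_PART_SIZE = 256L << 20;

    /** One contiguous byte range that would be copied as a single part. */
    static final class Part {
        final int partNumber; // 1-based, as multipart part numbers usually are
        final long offset;    // first byte of the range in the source object
        final long size;      // number of bytes in the range

        Part(int partNumber, long offset, long size) {
            this.partNumber = partNumber;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Returns true when the object is too large for the single-copy path. */
    static boolean needsMultipartCopy(long contentLength) {
        return contentLength > SINGLE_COPY_LIMIT;
    }

    /** Splits [0, contentLength) into consecutive ranges of at most partSize bytes. */
    static List<Part> planParts(long contentLength, long partSize) {
        if (contentLength < 0 || partSize <= 0) {
            throw new IllegalArgumentException("contentLength must be >= 0 and partSize > 0");
        }
        List<Part> parts = new ArrayList<>();
        long offset = 0;
        int partNumber = 1;
        while (offset < contentLength) {
            long size = Math.min(partSize, contentLength - offset);
            parts.add(new Part(partNumber++, offset, size));
            offset += size;
        }
        return parts;
    }
}
```

A caller would check `CopyPlanner.needsMultipartCopy(sourceMetadata.getContentLength())` first and keep the existing single-copy path when it returns false. The service also constrains part sizes and part counts, so the part size would need to be validated against the OSS documentation before use.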
