[core][flink][spark] Support ARRAY<BLOB> blob files #8181

Draft
leaves12138 wants to merge 1 commit into apache:master from leaves12138:codex/list-blob-support

leaves12138 commented Jun 9, 2026

Contributor

Purpose

Support storing top-level ARRAY<BLOB> columns in dedicated blob files, so that one table can use blob file storage for both BLOB and ARRAY<BLOB> fields across the core, Flink, and Spark paths.

Changes

  • Treat BLOB and ARRAY<BLOB> as blob-file fields in schema validation, data-evolution planning, blob file context, and column directive cleanup.
  • Extend the BlobFileFormat reader/writer with an ARRAY<BLOB> payload layout that keeps one blob-file record per table row and stores per-element lengths in a compact tail index.
  • Preserve support for null arrays, null elements, empty arrays, selection reads, descriptor reads, inline reads, and whole-field placeholders.
  • Add Flink catalog conversion and Flink array read conversion so ARRAY blob fields round-trip as ARRAY internally.
  • Add Spark catalog conversion and Spark array/data converters so ARRAY blob fields round-trip as ARRAY internally.
  • Add format-level, table-level, Flink e2e, and Spark e2e coverage for ARRAY<BLOB>.
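
To make the "one record per row, per-element lengths in a tail index" idea concrete, here is a minimal Java sketch. It is not the layout implemented in this PR: the class name `ArrayPayloadSketch`, the fixed 4-byte big-endian integers, the `-1` sentinel for a null element, and the trailing element count are all assumptions chosen for illustration. A null array is assumed to be represented outside the payload (for example as a null record), so only non-null arrays are encoded here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical tail-indexed array payload; NOT the actual BlobFileFormat layout. */
public final class ArrayPayloadSketch {

    // Assumed sentinel: a length of -1 marks a null element.
    private static final int NULL_ELEMENT = -1;

    private ArrayPayloadSketch() {}

    /** Encodes a non-null array as [element bytes...][int length per element][int count]. */
    public static byte[] encode(List<byte[]> elements) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        ByteBuffer tail = ByteBuffer.allocate(4 * (elements.size() + 1));
        for (byte[] element : elements) {
            if (element == null) {
                tail.putInt(NULL_ELEMENT); // null element contributes no data bytes
            } else {
                data.write(element, 0, element.length);
                tail.putInt(element.length);
            }
        }
        tail.putInt(elements.size()); // count goes last so a reader can locate the index
        data.write(tail.array(), 0, tail.capacity());
        return data.toByteArray();
    }

    /** Decodes a payload produced by {@link #encode(List)}. */
    public static List<byte[]> decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int count = buf.getInt(payload.length - 4);
        int indexStart = payload.length - 4 - 4 * count;
        List<byte[]> elements = new ArrayList<>(count);
        int offset = 0; // running offset into the element data region
        for (int i = 0; i < count; i++) {
            int length = buf.getInt(indexStart + 4 * i);
            if (length == NULL_ELEMENT) {
                elements.add(null);
            } else {
                elements.add(Arrays.copyOfRange(payload, offset, offset + length));
                offset += length;
            }
        }
        return elements;
    }
}
```

In this sketch an empty array encodes to just the 4-byte count, which keeps it distinguishable from a null array, and a reader can skip to any element by summing the preceding lengths without scanning the data region.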

Tests

  • JAVA_HOME=/opt/jdk-17.0.2.jdk/Contents/Home mvn -pl paimon-core,paimon-format -am -DskipTests compile
  • JAVA_HOME=/opt/jdk-17.0.2.jdk/Contents/Home mvn -pl paimon-format -am -Pfast-build -DfailIfNoTests=false -Dtest=BlobFileFormatTest test
  • JAVA_HOME=/opt/jdk-17.0.2.jdk/Contents/Home mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=BlobTableTest#testArrayBlobField,ColumnDirectiveUtilsTest#testBlobDirectiveWithArraySourceType test
  • JAVA_HOME=/opt/jdk-17.0.2.jdk/Contents/Home mvn -pl paimon-flink/paimon-flink-common -am -Pfast-build -DfailIfNoTests=false -Dtest=BlobTableITCase#testArrayBlobField test
  • JAVA_HOME=/opt/jdk-17.0.2.jdk/Contents/Home mvn -pl paimon-api,paimon-format,paimon-core,paimon-bundle,paimon-spark/paimon-spark-common,paimon-spark/paimon-spark3-common -am -Pfast-build,spark3 -DskipTests install
  • JAVA_HOME=/opt/jdk-17.0.2.jdk/Contents/Home mvn -pl paimon-spark/paimon-spark-ut -am -Pfast-build,spark3 -DfailIfNoTests=false -DwildcardSuites=org.apache.paimon.spark.sql.BlobTestBase -Dtest=none test

leaves12138 force-pushed the codex/list-blob-support branch 2 times, most recently from 59aba11 to 5012a79, on June 9, 2026 07:45
leaves12138 changed the title from "[core] Support ARRAY<BLOB> blob files" to "[core][flink][spark] Support ARRAY<BLOB> blob files" on Jun 9, 2026
@JingsongLi

Contributor

Let's add Python together.

leaves12138 force-pushed the codex/list-blob-support branch 6 times, most recently from 7b8dcac to 43a6fc4, on June 9, 2026 09:33
leaves12138 force-pushed the codex/list-blob-support branch from 43a6fc4 to e568c48 on June 9, 2026 10:34