Skip to content

Core: V4 write direction wrappers#16936

Open
stevenzwu wants to merge 1 commit into
apache:mainfrom
stevenzwu:v4_write_direction_wrappers
Open

Core: V4 write direction wrappers#16936
stevenzwu wants to merge 1 commit into
apache:mainfrom
stevenzwu:v4_write_direction_wrappers

Conversation

@stevenzwu

Copy link
Copy Markdown
Contributor

No description provided.

* <p>REPLACED files are the prior-state entries of v4 REPLACED/MODIFIED pairs and are not live.
* Returns null for manifest files written by pre-v4 writers.
*/
default Integer replacedFilesCount() {

@stevenzwu stevenzwu Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, public-interface changes should be minimized and delayed. Justification for keeping these two on the interface in this PR:

In-tree consumers (this PR + close follow-ups):

  • ContentEntryAdapters.manifestInfo() (this PR) populates the v4 root manifest's manifest_info struct from manifest.replacedFilesCount() / replacedRowsCount().
  • MergingSnapshotProducer.validateAddedDVs (follow-up phase) filters concurrent data manifests by replacedFilesCount > 0 to detect v4 colocated-DV write conflicts — without it, every concurrent data manifest must be deep-scanned.

Convention. The interface-default pattern matches v3's existing extensions for spec-defined optional fields on ManifestFile: firstRowId (row lineage), keyMetadata (encryption), containsNaN (partition NaN). v4 spec-defined manifest_info counts belong on the same interface in the same shape.

A TrackedFile-based parallel v4 API was considered. Would require rewriting ManifestGroup and every engine that consumes FileScanTask. The cost is disproportionate to the savings.

@stevenzwu stevenzwu Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projected new default methods across the full v4 stack, all spec-defined additions backing v4 manifest_info / content_entry fields:

In this PR:

  • default Integer replacedFilesCount()
  • default Long replacedRowsCount()
  • default int formatVersion()

when root-level manifest DV bitmaps are implemented:

  • default ByteBuffer deletedPositions()
  • default ByteBuffer replacedPositions()

@gaborkaszab gaborkaszab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @stevenzwu ,
I went through mostly for my own understanding. Left some comments, mostly I'm a bit hesitant to expose setting status and sequence numbers directly through the builders. Let me know what you think!

Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ContentEntryAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/TrackedFileBuilder.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/TrackingBuilder.java Outdated
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 5 times, most recently from c78d6b1 to 828e0c7 Compare June 25, 2026 21:35
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch from 828e0c7 to fa86dc1 Compare June 25, 2026 22:18
@github-actions github-actions Bot added the API label Jun 25, 2026
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 5 times, most recently from fa46d3d to 55f5573 Compare June 26, 2026 05:57
* whose tracking values come from outside the writer (v4 manifest references, projections from
* legacy ManifestEntry).
*/
static TrackedFileBuilder explicitTracking(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaborkaszab following up on this comment from your PR, I am adding this new factory method for the new code path in this PR.

@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 8 times, most recently from b5c92a0 to d5c696b Compare June 28, 2026 06:09
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Jun 28, 2026
Per review (gaborkaszab on PR apache#16936): the first if-block existed only
to produce specific error messages distinguishing v3 DVs from v2
position delete files; the actual invariant (only EQUALITY_DELETES
allowed in v4 leaf delete manifests) is enforced by the Preconditions
check that immediately follows. Keep the explanatory comment on why
POSITION_DELETES is rejected (with both context bullets) above the
single invariant check. Update the two POSITION_DELETES rejection tests
to assert on the unified error message.

Generated-by: Claude Code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 2 times, most recently from 0f64967 to bff274c Compare June 28, 2026 22:53
@stevenzwu stevenzwu marked this pull request as ready for review June 29, 2026 01:33
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch 2 times, most recently from 30e188a to 6bb3622 Compare June 29, 2026 05:15
Introduces the Phase 1 foundation for v4 metadata tree writes: TrackedFile
and supporting types are extended with v4-specific shape, and the new
ContentEntryAdapters utility projects legacy ManifestEntry rows and
ManifestFile references into TrackedFile instances for the v4 content_entry
schema.

ManifestFile gains V4_FORMAT_VERSION and LEGACY_FORMAT_VERSION constants and
default formatVersion() / recordCount() accessors. GenericManifestFile gets a
v4 constructor variant that populates these fields at construction time;
pre-v4 callers use the existing 17-arg constructor which delegates with
defaults.

ContentEntryAdapters.fromManifestFile dispatches the record_count source by
the manifest's intrinsic formatVersion: v4+ manifests must carry a non-null
recordCount populated by the writer or reader; pre-v4 manifests have it
summed from the per-status file counts.

TestContentEntryAdapters covers ADDED/EXISTING/DELETED tracking shape, the
unpartitioned and partitioned-projection paths, format_version dispatch,
record_count requirement for v4+, and the rejected status transitions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stevenzwu stevenzwu force-pushed the v4_write_direction_wrappers branch from 6bb3622 to 0535c88 Compare June 29, 2026 05:24
* manifest's snapshot_id
*/
static TrackedFileBuilder data(long newSnapshotId) {
static TrackedFileBuilder data(Long newSnapshotId) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when writing a data file entry to a staged leaf manifest file, the snapshot id is null and will be inherited from the manifest's snapshot id when written to the root manifest file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants