Core: V4 write direction wrappers#16936
Conversation
| * <p>REPLACED files are the prior-state entries of v4 REPLACED/MODIFIED pairs and are not live. | ||
| * Returns null for manifest files written by pre-v4 writers. | ||
| */ | ||
| default Integer replacedFilesCount() { |
There was a problem hiding this comment.
Generally, public-interface changes should be minimized and delayed. Justification for keeping these two on the interface in this PR:
In-tree consumers (this PR + close follow-ups):
ContentEntryAdapters.manifestInfo()(this PR) populates the v4 root manifest'smanifest_infostruct frommanifest.replacedFilesCount()/replacedRowsCount().MergingSnapshotProducer.validateAddedDVs(follow-up phase) filters concurrent data manifests byreplacedFilesCount > 0to detect v4 colocated-DV write conflicts — without it, every concurrent data manifest must be deep-scanned.
Convention. The interface-default pattern matches v3's existing extensions for spec-defined optional fields on ManifestFile: firstRowId (row lineage), keyMetadata (encryption), containsNaN (partition NaN). v4 spec-defined manifest_info counts belong on the same interface in the same shape.
A TrackedFile-based parallel v4 API was considered. Would require rewriting ManifestGroup and every engine that consumes FileScanTask. The cost is disproportionate to the savings.
There was a problem hiding this comment.
Projected new default methods across the full v4 stack, all spec-defined additions backing v4 manifest_info / content_entry fields:
In this PR:
default Integer replacedFilesCount()default Long replacedRowsCount()default int formatVersion()
when root-level manifest DV bitmaps are implemented:
default ByteBuffer deletedPositions()default ByteBuffer replacedPositions()
gaborkaszab
left a comment
There was a problem hiding this comment.
Hey @stevenzwu ,
I went through mostly for my own understanding. Left some comments, mostly I'm a bit hesitant to expose setting status and sequence numbers directly through the builders. Let me know what you think!
c78d6b1 to
828e0c7
Compare
828e0c7 to
fa86dc1
Compare
fa46d3d to
55f5573
Compare
| * whose tracking values come from outside the writer (v4 manifest references, projections from | ||
| * legacy ManifestEntry). | ||
| */ | ||
| static TrackedFileBuilder explicitTracking( |
There was a problem hiding this comment.
@gaborkaszab following up on this comment from your PR, I am adding this new factory method for the new code path in this PR.
b5c92a0 to
d5c696b
Compare
Per review (gaborkaszab on PR apache#16936): the first if-block existed only to produce specific error messages distinguishing v3 DVs from v2 position delete files; the actual invariant (only EQUALITY_DELETES allowed in v4 leaf delete manifests) is enforced by the Preconditions check that immediately follows. Keep the explanatory comment on why POSITION_DELETES is rejected (with both context bullets) above the single invariant check. Update the two POSITION_DELETES rejection tests to assert on the unified error message. Generated-by: Claude Code Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0f64967 to
bff274c
Compare
30e188a to
6bb3622
Compare
Introduces the Phase 1 foundation for v4 metadata tree writes: TrackedFile and supporting types are extended with v4-specific shape, and the new ContentEntryAdapters utility projects legacy ManifestEntry rows and ManifestFile references into TrackedFile instances for the v4 content_entry schema. ManifestFile gains V4_FORMAT_VERSION and LEGACY_FORMAT_VERSION constants and default formatVersion() / recordCount() accessors. GenericManifestFile gets a v4 constructor variant that populates these fields at construction time; pre-v4 callers use the existing 17-arg constructor which delegates with defaults. ContentEntryAdapters.fromManifestFile dispatches the record_count source by the manifest's intrinsic formatVersion: v4+ manifests must carry a non-null recordCount populated by the writer or reader; pre-v4 manifests have it summed from the per-status file counts. TestContentEntryAdapters covers ADDED/EXISTING/DELETED tracking shape, the unpartitioned and partitioned-projection paths, format_version dispatch, record_count requirement for v4+, and the rejected status transitions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6bb3622 to
0535c88
Compare
| * manifest's snapshot_id | ||
| */ | ||
| static TrackedFileBuilder data(long newSnapshotId) { | ||
| static TrackedFileBuilder data(Long newSnapshotId) { |
There was a problem hiding this comment.
when writing a data file entry to a staged leaf manifest file, the snapshot id is null and will be inherited from the manifest's snapshot id when written to the root manifest file.
No description provided.