Skip to content

Add commit info to partition exports table#1832

Open
arthurpassos wants to merge 11 commits into
antalya-26.3from
expand-replicated-partition-exports-columns
Open

Add commit info to partition exports table#1832
arthurpassos wants to merge 11 commits into
antalya-26.3from
expand-replicated-partition-exports-columns

Conversation

@arthurpassos

@arthurpassos arthurpassos commented May 25, 2026

Copy link
Copy Markdown
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add Iceberg metadata file paths and data file paths to the system.replicated_partition_exports to improve observability and debugging. Closes #1832

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@arthurpassos arthurpassos added antalya-26.3 port-antalya PRs to be ported to all new Antalya releases antalya labels May 25, 2026
@arthurpassos

Copy link
Copy Markdown
Collaborator Author

@codex review

@github-actions

github-actions Bot commented May 25, 2026

Copy link
Copy Markdown

Workflow [PR], commit [3e55439]

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a9022d641

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1900 to +1907
LOG_INFO(log,
"Export transaction {} already committed, skipping re-commit",
transaction_id);
return;
/// Empty commit info: paths produced by the original commit are not recoverable
/// here without re-reading the snapshot. ExportPartitionUtils::commit only
/// writes the commit_info znode when this struct is non-empty, so the original
/// committer's record (if any) is preserved.
return {};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return commit info for already-committed recovery path

When isExportPartitionTransactionAlreadyCommitted is true here, the function returns an empty ExportPartitionCommitInfo, which makes ExportPartitionUtils::commit skip writing <export-entry>/commit_info and only flip status to COMPLETED. In the common recovery scenario (snapshot committed previously, then retried before ZooKeeper was updated), this leaves committed_* fields permanently empty and contradicts the new observability behavior/documentation that expects a visible marker for already-committed retries.

Useful? React with 👍 / 👎.

Comment on lines +857 to +858
assert committed_metadata_file == "<committed in a previous run, paths unavailable>", (
f"Expected sentinel in committed_metadata_file for already-committed retry, got: {committed_metadata_file!r}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fix sentinel expectation in post-publish failpoint test

This assertion is inconsistent with the implementation under iceberg_writes_post_publish_throw: that failpoint is ONCE, and the published catch path now returns real file paths (storage_metadata_name, manifest list, manifest file), so committed_metadata_file should be a metadata path, not the "previous run" sentinel. As written, the test will fail despite correct behavior and can block CI.

Useful? React with 👍 / 👎.

@arthurpassos

Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru

Copy link
Copy Markdown

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1832 (Add commit info to partition exports table):

Confirmed defects

Medium: commit_info serialization failure can block COMPLETED transition

Impact: A successful destination commit (Iceberg snapshot or object-storage marker) can remain stuck in PENDING (or require later recovery) if commit-info JSON generation throws before ZooKeeper /status is flipped to COMPLETED.

Anchor: src/Storages/MergeTree/ExportPartitionUtils.cpp / ExportPartitionUtils::commit and ExportReplicatedMergeTreePartitionCommitInfoEntry::toJsonString

Trigger: Any exception thrown by ExportReplicatedMergeTreePartitionCommitInfoEntry::toJsonString (it explicitly enables std::ios::failbit exceptions) while destination_commit_info is non-empty.

Why defect: The new observability side-effect (JSON stringify) is now on the critical path of the state transition; a stringify exception aborts the function before the status update, changing behavior from “commit then mark completed” to “commit then throw”.

Fix direction (short): Make commit_info persistence best-effort: wrap toJsonString + tryMulti in try/catch and always fall back to status-only trySet on any exception.

Regression test direction (short): Add a failpoint (or targeted fault injection) that throws during commit_info serialization and assert the task still reaches COMPLETED.

Evidence

Serialization enables stream exceptions:

std::string toJsonString() const
{
    Poco::JSON::Object json;
    json.set("iceberg_metadata_file", iceberg_metadata_file);
    // ...
    std::ostringstream oss;
    oss.exceptions(std::ios::failbit);
    Poco::JSON::Stringifier::stringify(json, oss);
    return oss.str();
}

toJsonString is called before the status flip in the new “atomic commit_info + COMPLETED” multi-op; an exception here prevents reaching the fallback status-only set:

if (!destination_commit_info.empty())
{
    ExportReplicatedMergeTreePartitionCommitInfoEntry commit_info_entry { /* ... */ };
    const std::string commit_info_path = fs::path(entry_path) / "commit_info";

    Coordination::Requests ops;
    ops.emplace_back(zkutil::makeCreateRequest(commit_info_path, commit_info_entry.toJsonString(), zkutil::CreateMode::Persistent));
    ops.emplace_back(zkutil::makeSetRequest(status_path, completed_name, -1));
    // ...
    if (rc == Coordination::Error::ZOK)
    {
        LOG_INFO(log, "ExportPartition: Marked export as completed and persisted commit_info");
        return;
    }
    // fall through to status-only set on ZNODEEXISTS / other errors
}

Coverage summary

Item Detail
Scope reviewed ZK state machine around export task completion (processed/*, status, new commit_info), commit idempotency behavior for Iceberg and plain object storage, in-memory mirror → system.replicated_partition_exports, and integration tests added/updated in the patch.
Categories failed Exception-safety / partial-update (new JSON serialization on completion critical path).
Categories passed State-transition consistency (status + commit_info via tryMulti with status-only fallback), idempotency signaling (Iceberg “already committed” sentinel; object-storage marker path surfaced even if pre-existing), concurrency/interleaving (peer-written commit_info handled via ZNODEEXISTS → status-only set), best-effort ZK mirroring semantics (poll-time refresh; no extra ZK reads on system-table query).
Assumptions/limits Audit is based on the PR’s public patch (.patch) and static reasoning (no local build/test execution).

### Commit info columns

These columns surface paths produced by the destination storage during commit, so it is possible to inspect what was written without consulting the destination directly:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description for destination_file_paths column is missing.

/// files and reaching this point, the task still completes via the recovery
/// path but commit_info will be absent. Recovering commit_info from the
/// live Iceberg snapshot in that case is a possible future enhancement.
const std::string status_path = fs::path(entry_path) / "status";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI thinks that exception in the block below can breaks commit transaction.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is totally true. here's what can happen:

once we commit to iceberg, we need to mark it as completed in zookeeper. If the code that creates the commit info throws, we don't mark it as completed in zookeeper and it remains in pending state. In the next scheduler tick, we'll try to commit it again, and we'll notice it has already been committed. In that case, we just mark it as completed. It won't remain in pending forever as far as I can tell

@arthurpassos arthurpassos requested a review from ianton-ru May 29, 2026 17:03
ianton-ru
ianton-ru previously approved these changes May 29, 2026

@ianton-ru ianton-ru left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DimensionWieldr

Copy link
Copy Markdown
Collaborator

@arthurpassos Looks like build is still failing. Maybe check IcebergMetadata.cpp?
Build jobs fail with:

IcebergMetadata.cpp:1815: error: use of undeclared identifier 'storage_metadata_name'
  IcebergMetadata.cpp:1817: error: use of undeclared identifier 'storage_manifest_entry_name'
  IcebergMetadata.cpp:1831: error: use of undeclared identifier 'storage_metadata_name'
  IcebergMetadata.cpp:1833: error: use of undeclared identifier 'storage_manifest_entry_name'

@DimensionWieldr

DimensionWieldr commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Failing integration tests (new fails in PR) are fixed in other PRs. Regression fails unrelated.

Manually tested, no issues. (Auto-tests will be added later.)

AI audit returned no major blockers.

LGTM

@DimensionWieldr DimensionWieldr added the verified Approved for release label Jun 1, 2026
@DimensionWieldr DimensionWieldr removed the verified Approved for release label Jun 2, 2026
@DimensionWieldr

Copy link
Copy Markdown
Collaborator

Following integration tests and one stateless test are failing with recent changes (new fails in this PR).

test_export_replicated_mt_partition_to_iceberg/test.py::test_failure_is_logged_in_system_table
test_export_replicated_mt_partition_to_iceberg/test.py::test_inject_short_living_failures
test_export_replicated_mt_partition_to_iceberg/test.py::test_post_publish_exception_preserves_snapshot
test_quorum_inserts_parallel/test.py::test_parallel_quorum_actually_parallel
test_storage_iceberg_with_spark/test_export_partition_iceberg.py::test_idempotency_after_commit_crash
test_storage_iceberg_with_spark/test_export_partition_iceberg_catalog.py::test_catalog_idempotent_retry
test_export_replicated_mt_partition_to_iceberg/test.py::test_export_task_timeout_kills_stuck_pending_task
test_export_replicated_mt_partition_to_iceberg/test.py::test_failure_is_logged_in_system_table
test_export_replicated_mt_partition_to_iceberg/test.py::test_inject_short_living_failures
test_export_replicated_mt_partition_to_iceberg/test.py::test_post_publish_exception_preserves_snapshot

00157_cache_dictionary

@svb-alt svb-alt added the roadmap Key features and improvements for Antalya project label Jun 7, 2026
@DimensionWieldr

Copy link
Copy Markdown
Collaborator

PR #1832 CI Triage: "Add commit info to partition exports table"

PR: #1832
Title: Add commit info to partition exports table
Branch: expand-replicated-partition-exports-columns
Head SHA: 3e554398c3b8d10b967064e58dc126c51b4dfb01
Author: Arthur Passos (@arthurpassos)

Files Changed

  • src/Storages/System/StorageSystemReplicatedPartitionExports.cpp / .h — new columns: committed_metadata_file, committed_manifest_list, committed_manifest_file, committed_marker_file
  • src/Storages/ExportReplicatedMergeTreePartitionManifest.hExportReplicatedMergeTreePartitionCommitInfoEntry struct and commit_info ZK znode schema
  • src/Storages/ExportReplicatedMergeTreePartitionTaskEntry.h — in-memory commit_info mirror
  • src/Storages/MergeTree/ExportPartitionManifestUpdatingTask.cpp / .h — read/persist commit_info; surface commit paths in getPartitionExportsInfo
  • src/Storages/MergeTree/ExportPartitionUtils.cpp — write commit_info znode atomically with COMPLETED status
  • src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp / .h — return sentinel commit info for already-committed idempotent retries
  • src/Storages/ObjectStorage/StorageObjectStorage.cpp / .hExportPartitionCommitInfo plumbing
  • tests/integration/test_export_replicated_mt_partition_to_iceberg/test.py — assertions on new system table columns
  • tests/integration/test_export_replicated_mt_partition_to_object_storage/test.py — assertions on committed_marker_file, exception_count
  • tests/integration/test_storage_iceberg_with_spark/test_export_partition_iceberg.py — idempotency sentinel assertion
  • tests/integration/test_storage_iceberg_with_spark/test_export_partition_iceberg_catalog.py — catalog idempotency sentinel assertion
  • docs/en/antalya/partition_export.md — documentation for new columns

Scope of change: Adds commit-time observability columns to system.replicated_partition_exports, persisting Iceberg metadata/manifest paths and plain-object-storage commit marker paths in a new commit_info ZooKeeper znode. No unrelated subsystems are modified.


Summary

Category Count Tests
PR-caused regression 9 Partition-export system table columns not populated (see below)
Pre-existing flaky 2 03443_shared_storage_snapshots, test_move_after_processing AzureQueue
Targeted-shard pollution 1 test_quorum_inserts_parallel (table default.r already exists)

Conclusion: 9 of 12 unique failing tests are caused by this PR. The new system.replicated_partition_exports columns (committed_metadata_file, committed_marker_file, exception_count, last_exception_per_replica) are empty/zero in scenarios where the PR's own integration tests expect them to be populated. Not safe to merge until fixed.


Failure-by-Failure Analysis

1. Integration tests (amd_asan, targeted) — 60/130 failures

This shard runs only tests touched by the PR. Failures fall into two groups:

1a. PR partition-export tests (PR-caused) — 50 failures across 7 test functions

All assert that new system.replicated_partition_exports columns are populated after export operations. Every failure shows empty string or zero:

Test Assertion Got
test_failure_is_logged_in_system_table exception_count > 0 0
test_inject_short_living_failures exception_count >= 1 0
test_post_publish_exception_preserves_snapshot committed_metadata_file non-empty *.metadata.json path ''
test_export_task_timeout_kills_stuck_pending_task "timed out" in last_exception_per_replica ''
test_idempotency_after_commit_crash committed_metadata_file == "<committed in a previous run, paths unavailable>" ''
test_catalog_idempotent_retry same sentinel ''
test_export_partition_file_already_exists_policy committed_marker_file contains commit_2020_ ''
  • Verdict: PR-caused regression — the new observability columns are not being surfaced. commit_info is either not persisted to ZK, not read back by ExportPartitionManifestUpdatingTask::poll, or not propagated into getPartitionExportsInfo. The Iceberg idempotency sentinel ("<committed in a previous run, paths unavailable>") is returned by IcebergMetadata::commitImportPartitionTransactionImpl but never appears in the system table.

1b. test_quorum_inserts_parallel/test.py::test_parallel_quorum_actually_parallel — 9 failures (retry variants)

Error:

Code: 57. DB::Exception: Table default.r already exists.
  • Runs in the targeted shard alongside PR tests but is unrelated to partition exports
  • Classic test-isolation failure: leftover table from a prior run in the same CI container
  • Verdict: Targeted-shard pollution / pre-existing flaky — not caused by PR code changes

2. Partition-export integration tests in full integration shards (PR-caused)

The same 7 PR-owned test functions also fail across the broader integration shards:

Shard Failures
Integration tests (amd_asan, db disk, old analyzer, 1/6) 1/665 — test_export_partition_file_already_exists_policy
Integration tests (amd_asan, db disk, old analyzer, 3/6) 3/988 — iceberg: test_failure_is_logged_in_system_table, test_inject_short_living_failures, test_post_publish_exception_preserves_snapshot
Integration tests (amd_asan, db disk, old analyzer, 4/6) 2/1171 — spark: test_idempotency_after_commit_crash, test_catalog_idempotent_retry
Integration tests (amd_binary, 1/5) 3/834 — object storage: test_failure_is_logged_in_system_table, test_inject_short_living_failures, test_export_partition_file_already_exists_policy
Integration tests (amd_binary, 4/5) 4/810 — iceberg: all four failing tests
Integration tests (amd_binary, 5/5) 2/1571 — spark idempotency tests
Integration tests (arm_binary, distributed plan, 1/4) 3/1146 — iceberg: three failing tests
Integration tests (arm_binary, distributed plan, 2/4) 2/1607 — spark idempotency tests

All show the same empty-column pattern. Verdict: PR-caused regression (same root cause as §1a).

Likely fix areas:

  1. ExportPartitionUtils.cpp — verify commit_info znode is created in the COMPLETED multi-op
  2. ExportPartitionManifestUpdatingTask.cpp — verify readCommitInfo is called on the COMPLETED hot path and that getPartitionExportsInfo maps commit_info fields
  3. ExportPartitionManifestUpdatingTask.cpp — verify readLastExceptionPerReplica populates last_exception_per_replica (feeds both last_exception_per_replica column and exception_count)
  4. Iceberg idempotent-retry path — ensure the "<committed in a previous run, paths unavailable>" sentinel from IcebergMetadata.cpp is written into commit_info and surfaced

3. Stateless tests (arm_asan, azure, sequential, 2/2)03443_shared_storage_snapshots FAIL


4. Integration tests (arm_binary, distributed plan, 4/4)test_move_after_processing[another_bucket-AzureQueue] FAIL

Error:

assert 0 == (5 * 10)
  where 0 = int('0\n')
  where '0\n' = query("SELECT sum(rows_processed) FROM system.azure_queue_log
                       WHERE table = 'move_after_processing_unordered_AzureQueue_etwswt'")

Recommended Next Steps

  1. Fix commit_info persistence/surfacing — debug why committed_metadata_file and committed_marker_file remain empty after successful commits and idempotent retries
  2. Fix exception observability — debug why last_exception_per_replica and exception_count stay empty/zero when failpoints inject transient errors
  3. Re-run targeted integration shard after fixes to confirm all 7 PR-owned test functions pass
  4. Ignore for merge decision: 03443_shared_storage_snapshots, AzureQueue test, quorum-inserts parallel test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.3 port-antalya PRs to be ported to all new Antalya releases roadmap Key features and improvements for Antalya project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants