Add commit info to partition exports table#1832
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a9022d641
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| LOG_INFO(log, | ||
| "Export transaction {} already committed, skipping re-commit", | ||
| transaction_id); | ||
| return; | ||
| /// Empty commit info: paths produced by the original commit are not recoverable | ||
| /// here without re-reading the snapshot. ExportPartitionUtils::commit only | ||
| /// writes the commit_info znode when this struct is non-empty, so the original | ||
| /// committer's record (if any) is preserved. | ||
| return {}; |
There was a problem hiding this comment.
Return commit info for already-committed recovery path
When isExportPartitionTransactionAlreadyCommitted is true here, the function returns an empty ExportPartitionCommitInfo, which makes ExportPartitionUtils::commit skip writing <export-entry>/commit_info and only flip status to COMPLETED. In the common recovery scenario (snapshot committed previously, then retried before ZooKeeper was updated), this leaves committed_* fields permanently empty and contradicts the new observability behavior/documentation that expects a visible marker for already-committed retries.
Useful? React with 👍 / 👎.
| assert committed_metadata_file == "<committed in a previous run, paths unavailable>", ( | ||
| f"Expected sentinel in committed_metadata_file for already-committed retry, got: {committed_metadata_file!r}" |
There was a problem hiding this comment.
Fix sentinel expectation in post-publish failpoint test
This assertion is inconsistent with the implementation under iceberg_writes_post_publish_throw: that failpoint is ONCE, and the published catch path now returns real file paths (storage_metadata_name, manifest list, manifest file), so committed_metadata_file should be a metadata path, not the "previous run" sentinel. As written, the test will fail despite correct behavior and can block CI.
Useful? React with 👍 / 👎.
|
@codex review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
AI audit note: This review comment was generated by AI (gpt-5.3-codex). Audit update for PR #1832 (Add commit info to partition exports table): Confirmed defectsMedium: commit_info serialization failure can block COMPLETED transitionImpact: A successful destination commit (Iceberg snapshot or object-storage marker) can remain stuck in Anchor: Trigger: Any exception thrown by Why defect: The new observability side-effect (JSON stringify) is now on the critical path of the state transition; a stringify exception aborts the function before the status update, changing behavior from “commit then mark completed” to “commit then throw”. Fix direction (short): Make commit_info persistence best-effort: wrap Regression test direction (short): Add a failpoint (or targeted fault injection) that throws during commit_info serialization and assert the task still reaches EvidenceSerialization enables stream exceptions: std::string toJsonString() const
{
Poco::JSON::Object json;
json.set("iceberg_metadata_file", iceberg_metadata_file);
// ...
std::ostringstream oss;
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(json, oss);
return oss.str();
}
if (!destination_commit_info.empty())
{
ExportReplicatedMergeTreePartitionCommitInfoEntry commit_info_entry { /* ... */ };
const std::string commit_info_path = fs::path(entry_path) / "commit_info";
Coordination::Requests ops;
ops.emplace_back(zkutil::makeCreateRequest(commit_info_path, commit_info_entry.toJsonString(), zkutil::CreateMode::Persistent));
ops.emplace_back(zkutil::makeSetRequest(status_path, completed_name, -1));
// ...
if (rc == Coordination::Error::ZOK)
{
LOG_INFO(log, "ExportPartition: Marked export as completed and persisted commit_info");
return;
}
// fall through to status-only set on ZNODEEXISTS / other errors
}Coverage summary
|
| ### Commit info columns | ||
|
|
||
| These columns surface paths produced by the destination storage during commit, so it is possible to inspect what was written without consulting the destination directly: | ||
|
|
There was a problem hiding this comment.
Description for destination_file_paths column is missing.
| /// files and reaching this point, the task still completes via the recovery | ||
| /// path but commit_info will be absent. Recovering commit_info from the | ||
| /// live Iceberg snapshot in that case is a possible future enhancement. | ||
| const std::string status_path = fs::path(entry_path) / "status"; |
There was a problem hiding this comment.
AI thinks that exception in the block below can breaks commit transaction.
There was a problem hiding this comment.
I don't think that is totally true. here's what can happen:
once we commit to iceberg, we need to mark it as completed in zookeeper. If the code that creates the commit info throws, we don't mark it as completed in zookeeper and it remains in pending state. In the next scheduler tick, we'll try to commit it again, and we'll notice it has already been committed. In that case, we just mark it as completed. It won't remain in pending forever as far as I can tell
|
@arthurpassos Looks like build is still failing. Maybe check IcebergMetadata.cpp? |
|
Failing integration tests (new fails in PR) are fixed in other PRs. Regression fails unrelated. Manually tested, no issues. (Auto-tests will be added later.) AI audit returned no major blockers. LGTM |
|
Following integration tests and one stateless test are failing with recent changes (new fails in this PR). |
PR #1832 CI Triage: "Add commit info to partition exports table"PR: #1832 Files Changed
Scope of change: Adds commit-time observability columns to Summary
Conclusion: 9 of 12 unique failing tests are caused by this PR. The new Failure-by-Failure Analysis1.
|
| Test | Assertion | Got |
|---|---|---|
test_failure_is_logged_in_system_table |
exception_count > 0 |
0 |
test_inject_short_living_failures |
exception_count >= 1 |
0 |
test_post_publish_exception_preserves_snapshot |
committed_metadata_file non-empty *.metadata.json path |
'' |
test_export_task_timeout_kills_stuck_pending_task |
"timed out" in last_exception_per_replica |
'' |
test_idempotency_after_commit_crash |
committed_metadata_file == "<committed in a previous run, paths unavailable>" |
'' |
test_catalog_idempotent_retry |
same sentinel | '' |
test_export_partition_file_already_exists_policy |
committed_marker_file contains commit_2020_ |
'' |
- Verdict: PR-caused regression — the new observability columns are not being surfaced.
commit_infois either not persisted to ZK, not read back byExportPartitionManifestUpdatingTask::poll, or not propagated intogetPartitionExportsInfo. The Iceberg idempotency sentinel ("<committed in a previous run, paths unavailable>") is returned byIcebergMetadata::commitImportPartitionTransactionImplbut never appears in the system table.
1b. test_quorum_inserts_parallel/test.py::test_parallel_quorum_actually_parallel — 9 failures (retry variants)
Error:
Code: 57. DB::Exception: Table default.r already exists.
- Runs in the targeted shard alongside PR tests but is unrelated to partition exports
- Classic test-isolation failure: leftover table from a prior run in the same CI container
- Verdict: Targeted-shard pollution / pre-existing flaky — not caused by PR code changes
2. Partition-export integration tests in full integration shards (PR-caused)
The same 7 PR-owned test functions also fail across the broader integration shards:
| Shard | Failures |
|---|---|
Integration tests (amd_asan, db disk, old analyzer, 1/6) |
1/665 — test_export_partition_file_already_exists_policy |
Integration tests (amd_asan, db disk, old analyzer, 3/6) |
3/988 — iceberg: test_failure_is_logged_in_system_table, test_inject_short_living_failures, test_post_publish_exception_preserves_snapshot |
Integration tests (amd_asan, db disk, old analyzer, 4/6) |
2/1171 — spark: test_idempotency_after_commit_crash, test_catalog_idempotent_retry |
Integration tests (amd_binary, 1/5) |
3/834 — object storage: test_failure_is_logged_in_system_table, test_inject_short_living_failures, test_export_partition_file_already_exists_policy |
Integration tests (amd_binary, 4/5) |
4/810 — iceberg: all four failing tests |
Integration tests (amd_binary, 5/5) |
2/1571 — spark idempotency tests |
Integration tests (arm_binary, distributed plan, 1/4) |
3/1146 — iceberg: three failing tests |
Integration tests (arm_binary, distributed plan, 2/4) |
2/1607 — spark idempotency tests |
All show the same empty-column pattern. Verdict: PR-caused regression (same root cause as §1a).
Likely fix areas:
ExportPartitionUtils.cpp— verifycommit_infoznode is created in the COMPLETED multi-opExportPartitionManifestUpdatingTask.cpp— verifyreadCommitInfois called on the COMPLETED hot path and thatgetPartitionExportsInfomapscommit_infofieldsExportPartitionManifestUpdatingTask.cpp— verifyreadLastExceptionPerReplicapopulateslast_exception_per_replica(feeds bothlast_exception_per_replicacolumn andexception_count)- Iceberg idempotent-retry path — ensure the
"<committed in a previous run, paths unavailable>"sentinel fromIcebergMetadata.cppis written intocommit_infoand surfaced
3. Stateless tests (arm_asan, azure, sequential, 2/2) — 03443_shared_storage_snapshots FAIL
- Report: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1832&sha=3e554398c3b8d10b967064e58dc126c51b4dfb01&name_0=PR&name_1=Stateless%20tests%20%28arm_asan%2C%20azure%2C%20sequential%2C%202%2F2%29
- Result mismatch (not timeout): expected
With snapshot sharing disabled: 0, gotAll iterations has been failed. Unable to test snapshot sharing disabled. Last result: 1 - Shared storage snapshots test — no relation to partition exports or
system.replicated_partition_exports - Same failure pattern observed in PR Fix cluster functions with hive partitioning #1863 and other unrelated PRs
- Verdict: Pre-existing flaky test — timing-sensitive shared-storage-snapshot test
4. Integration tests (arm_binary, distributed plan, 4/4) — test_move_after_processing[another_bucket-AzureQueue] FAIL
Error:
assert 0 == (5 * 10)
where 0 = int('0\n')
where '0\n' = query("SELECT sum(rows_processed) FROM system.azure_queue_log
WHERE table = 'move_after_processing_unordered_AzureQueue_etwswt'")- AzureQueue log shows 0 rows processed — queue message delivery/timing issue
- No relation to partition exports
- Identical failure in PR Fix cluster functions with hive partitioning #1863
- Verdict: Pre-existing flaky test — AzureQueue processing timing
Recommended Next Steps
- Fix commit_info persistence/surfacing — debug why
committed_metadata_fileandcommitted_marker_fileremain empty after successful commits and idempotent retries - Fix exception observability — debug why
last_exception_per_replicaandexception_countstay empty/zero when failpoints inject transient errors - Re-run targeted integration shard after fixes to confirm all 7 PR-owned test functions pass
- Ignore for merge decision:
03443_shared_storage_snapshots, AzureQueue test, quorum-inserts parallel test
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add Iceberg metadata file paths and data file paths to the
system.replicated_partition_exportsto improve observability and debugging. Closes #1832Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: