Skip to content

Cherry-pick: Native CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP#1896

Open
zvonand wants to merge 9 commits into
antalya-26.3from
releasy/port/pr-98670-4e7968
Open

Cherry-pick: Native CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP#1896
zvonand wants to merge 9 commits into
antalya-26.3from
releasy/port/pr-98670-4e7968

Conversation

@zvonand

@zvonand zvonand commented Jun 8, 2026

Copy link
Copy Markdown
Member

Cherry-picked from ClickHouse#98670.

Native CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP


Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Supports CREATE TABLE, CREATE TABLE … AS source, and DROP TABLE for DataLakeCatalog; optional catalog-side data purge can be triggered ( new database_iceberg_purge_on_drop setting).

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Workflow [PR], commit [e9faf9c]

@zvonand zvonand added antalya-26.3 port-antalya PRs to be ported to all new Antalya releases labels Jun 8, 2026
@alsugiliazova alsugiliazova linked an issue Jun 9, 2026 that may be closed by this pull request
@alsugiliazova

Copy link
Copy Markdown
Member

#1897

@alsugiliazova

Copy link
Copy Markdown
Member

#1898

@alsugiliazova alsugiliazova added the verified-with-issues Verified by QA and issues found. label Jun 11, 2026
zvonand and others added 2 commits June 12, 2026 01:33
- Port \`IcebergPathResolver::reverseResolve\` from upstream (predates the
  PR upstream but was missing on this branch).
- Handle the Altinity-only \`S3_TABLES\` catalog type in
  \`getLocationSchemeForTableCreation\` (it is always S3-backed).
- Drop the \`unique_key\` check in \`DatabaseDataLake::createTable\`:
  \`ASTStorage\` has no \`UNIQUE KEY\` clause on this branch.
- \`getInMemoryMetadataPtr\` does not take a context argument on this
  branch.

ClickHouse#98670

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@zvonand zvonand force-pushed the releasy/port/pr-98670-4e7968 branch from 3983fab to 3ac9ae4 Compare June 11, 2026 23:48
@alsugiliazova

Copy link
Copy Markdown
Member

AI audit note: This review comment was generated by AI (claude-4.6-opus).

Audit update for PR #1896 (Native CREATE TABLE / DROP TABLE for DataLakeCatalog)

Confirmed defects

Medium: GlueCatalog orphans metadata file on CreateTable API failure

  • Impact: Stale v1.metadata.json left on S3 if Glue CreateTable call fails after the file was written.
  • Anchor: src/Databases/DataLake/GlueCatalog.cpp / createTable (the new write-then-register block)
  • Trigger: GlueCatalog::createTable writes metadata to S3 successfully, but the subsequent glue_client->CreateTable(request) returns an error (e.g. AlreadyExistsException, permissions, network timeout).
  • Why defect: The S3 write is not rolled back on Glue API failure; repeated retries or re-creates at the same location may find the orphaned file and misinterpret it.
  • Fix direction: Wrap the Glue API call in try/catch and removeObjectIfExists the written metadata file on failure (mirroring the cleanup pattern added in Utils.cpp).
  • Regression test direction: Mock Glue CreateTable to fail, verify no metadata file remains on S3.

Medium: reverseResolve produces malformed path when storage_path does not start with table_root

  • Impact: Corrupted metadata-log entry in Iceberg metadata JSON, potentially causing metadata reads to fail on subsequent operations.
  • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergPath.h / reverseResolve
  • Trigger: storage_path is an absolute/full URI (e.g. starts with s3://bucket/...) while table_root is a relative path — the starts_with check fails and the fallback concatenates table_location + storage_path producing a doubled URI.
  • Why defect: The else-branch blindly prepends table_location without checking if storage_path is already absolute, which can produce s3://bucket/table/s3://bucket/table/metadata/v1.metadata.json.
  • Fix direction: Add a check: if storage_path already starts with table_location, return it as-is without prepending.
  • Regression test direction: Unit test reverseResolve with an absolute URI storage_path that doesn't match table_root.

Low: S3Tables catalog silently accepts the CREATE TABLE location-resolution path but throws a confusing error later

  • Impact: User gets "createTable is not implemented" from the catalog layer rather than a clear rejection that S3Tables doesn't support CREATE TABLE from ClickHouse.
  • Anchor: src/Databases/DataLake/DatabaseDataLake.cpp / getLocationSchemeForTableCreation (S3_TABLES case returns "s3") + ICatalog::createTable base throwing NOT_IMPLEMENTED
  • Trigger: User runs CREATE TABLE on a database backed by S3Tables catalog type.
  • Why defect: getLocationSchemeForTableCreation succeeds for S3_TABLES, location is constructed, metadata generated, then catalog->createTable throws a generic NOT_IMPLEMENTED error rather than an early, specific rejection.
  • Fix direction: Either add S3_TABLES to the "throw BAD_ARGUMENTS" cases in getLocationSchemeForTableCreation, or override createTable in S3TablesCatalog with a clear message.
  • Regression test direction: Integration test: attempt CREATE TABLE on an S3Tables database and assert a user-friendly error.

Coverage summary

Category Status
Scope reviewed InterpreterCreateQuery engine-less path, InterpreterDropQuery ON CLUSTER check, InterpreterAlterQuery ON CLUSTER check, DatabaseDataLake::createTable, DatabaseDataLake::dropTable, GlueCatalog::createTable/dropTable, RestCatalog::createTable/dropTable/namespaceToJSONArray/encodeNamespaceForURI, S3TablesCatalog::dropTable, ICatalog::storageTypeToScheme, constructTableLocation, IcebergMetadata::createInitial metadata filename, MetadataGenerator::generateNextMetadata parent_snapshot/metadata-log changes, IcebergWrites previous_metadata_file_path tracking, Mutations.cpp reverseResolve + writeMetadataFiles, Utils.cpp cleanup-on-failure, IcebergPath::reverseResolve, StorageObjectStorage::alter/checkAlterIsPossible/drop
Categories failed Partial-update rollback (GlueCatalog metadata write), path resolution edge case (reverseResolve)
Categories passed Concurrency/thread-safety, memory lifetime, exception safety (DDL guard RAII), integer overflow (N/A), ON CLUSTER rejection, namespace encoding, setting rename/alias, DROP purge semantics, metadata-log correctness, metadata filename UUID, test coverage
Assumptions/limits Static analysis only; GlueCatalog AWS interactions not runtime-tested; Mutations.cpp retry logic assumed correct based on pre-existing patterns

@DimensionWieldr

Copy link
Copy Markdown
Collaborator

There's an export partition regression fail that looks like it's caused by this PR.

Under ice catalog, exporting multiple partitions in one ALTER does some concurrent writes. Exports are marked as COMPLETED but DataLakeCatalog read-back is short on rows. The same scenarios pass on baseline Antalya and under no_catalog (direct S3). The regression lines up with the PR’s Iceberg commit changes in IcebergWrites.cpp / MetadataGenerator.cpp: when one ALTER fires several parallel exports to the same REST-catalog table, later commits can land without chaining onto earlier snapshots’ manifests, so only the latest partition’s data is visible even though every export succeeded.

@zvonand

zvonand commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

@DimensionWieldr how it could've happened? this PR is not yet merged, and the latest run has not yet finished build jobs (there are no regression results yet)

@DimensionWieldr

DimensionWieldr commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

@DimensionWieldr how it could've happened? this PR is not yet merged, and the latest run has not yet finished build jobs (there are no regression results yet)

Sorry I should've specified. This is a fail from CI on this PR, from the previous commit. I was asked to take a look at export partition fails earlier today. Maybe the new CI run for the most recent commit will look better.

@alsugiliazova

Copy link
Copy Markdown
Member

AI audit note: This review comment was generated by AI (claude-4.6-opus).

Audit update for PR #1896 — commit 7a5ddbf (Native CREATE TABLE / DROP TABLE for DataLakeCatalog)

Confirmed defects

Medium: Missing null-safety for current-snapshot-id in IcebergStorageSink::initializeMetadata

  • Impact: Exception/crash when the first INSERT targets a table whose initial metadata has "current-snapshot-id": null (Iceberg v2 convention for tables created by external tools like Spark/PyIceberg).
  • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp line 1207
  • Trigger: Insert into a table created externally with Iceberg v2 metadata where current-snapshot-id is JSON null rather than -1.
  • Why defect: Mutations.cpp:379 was correctly updated to include !metadata->isNull(Iceberg::f_current_snapshot_id), but the same check in IcebergWrites.cpp was not updated — getValue<Int64> on a null Poco JSON value throws.
  • Fix direction: Add && !metadata->isNull(Iceberg::f_current_snapshot_id) to the guard at line 1207 of IcebergWrites.cpp.
  • Regression test direction: Create a table externally with "current-snapshot-id": null, then INSERT from ClickHouse.
  • Note: This is pre-existing on the base branch, but the PR fixed it in Mutations.cpp and should fix it here too for consistency.

Low: S3Tables catalog silently accepts the CREATE TABLE location-resolution path but throws a confusing error later

  • Impact: User gets "createTable is not implemented" from the catalog layer rather than a clear rejection that S3Tables doesn't support CREATE TABLE from ClickHouse.
  • Anchor: src/Databases/DataLake/DatabaseDataLake.cpp / getLocationSchemeForTableCreation (S3_TABLES case returns "s3") + ICatalog::createTable base throwing NOT_IMPLEMENTED
  • Trigger: User runs CREATE TABLE on a database backed by S3Tables catalog type.
  • Why defect: getLocationSchemeForTableCreation succeeds for S3_TABLES, location is constructed, metadata generated, then catalog->createTable throws a generic NOT_IMPLEMENTED error rather than an early, specific rejection.
  • Fix direction: Either add S3_TABLES to the "throw BAD_ARGUMENTS" cases in getLocationSchemeForTableCreation, or override createTable in S3TablesCatalog with a clear message.
  • Regression test direction: Attempt CREATE TABLE on an S3Tables database and assert a user-friendly error.

Resolved from previous audit (fixed in commit 7a5ddbf)

  1. GlueCatalog orphans metadata file on CreateTable API failure — Fixed with SCOPE_EXIT + registered flag. Metadata file is now cleaned up if the Glue API call fails.
  2. reverseResolve produces malformed path — Fixed by adding a table_location prefix check before the fallback path.
  3. IcebergMetadata::createInitial partial-update — Additionally fixed: SCOPE_EXIT now cleans up both the metadata file and version-hint if catalog registration fails.

Regarding the reported export partition regression (DimensionWieldr's comment)

Traced the parallel-export flow through IcebergStorageSink::initializeMetadatagenerateNextMetadatagenerateManifestList:

  • use_previous_snapshots defaults to true in the write path.
  • The new guard new_snapshot->has(Iceberg::f_parent_snapshot_id) only blocks manifest chaining for initial snapshots (parent_snapshot == -1). For exports on tables with existing data, parent IS set and previous manifests ARE correctly included.
  • On CAS/version-hint conflict, the retry re-reads metadata (previous_metadata_file_path updated, metadata object re-read), then parent_snapshot correctly points to the winning snapshot whose manifest list already includes all prior data.
  • The Utils.cpp cleanup (removing stale metadata files on conflict) prevents the resolver from picking uncommitted metadata as latest.

Conclusion: The code changes in MetadataGenerator.cpp and IcebergWrites.cpp do not introduce a mechanism that would lose manifest entries from concurrent exports. The regression may be pre-existing or related to a different code path not modified by this PR.

Coverage summary

Category Status
Scope reviewed All 36 changed files; focused re-analysis on IcebergWrites.cpp retry/concurrency path, MetadataGenerator.cpp parent-snapshot logic, generateManifestList chaining, GlueCatalog cleanup, IcebergMetadata::createInitial cleanup, reverseResolve fix
Categories failed Null-safety (IcebergWrites.cpp current-snapshot-id), UX (S3Tables CREATE TABLE error message)
Categories passed Partial-update rollback (fixed), path resolution (fixed), concurrency/manifest-chaining (correct), memory lifetime, exception safety, ON CLUSTER rejection, namespace encoding, setting rename/alias, DROP purge semantics, metadata-log correctness, test coverage
Assumptions/limits Static analysis only; export partition regression analyzed by code tracing only, not runtime-reproduced; Poco JSON isNull vs has semantics assumed standard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya-26.3 port-antalya PRs to be ported to all new Antalya releases verified-with-issues Verified by QA and issues found.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wrong metadata json when creating iceberg table from clickhouse directly Creating Iceberg table with decimal column is not supported

4 participants