Skip to content

<fix>[storage]: support ceph multi image cache pool#4174

Open
zstack-robot-2 wants to merge 1 commit into
4.8.0from
sync/shan.wu/support-multi-imagecache-pool@@2
Open

<fix>[storage]: support ceph multi image cache pool#4174
zstack-robot-2 wants to merge 1 commit into
4.8.0from
sync/shan.wu/support-multi-imagecache-pool@@2

Conversation

@zstack-robot-2

Copy link
Copy Markdown
Collaborator

支持 Ceph 多 ImageCache pool 策略,避免目标 root pool 已作为 ImageCache pool 时继续依赖默认 ImageCache pool 触发跨 pool clone。

验证要求:仅在 root@172.20.13.237:/root/zstack-workspace/zstack/ 执行编译和改动 case。

sync from gitlab !10106

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

该 PR 为 Ceph 主存储引入镜像缓存池选择策略(Default/PreferVolumePool/PreferExistingCache),实现按策略选池、bits 校验与失效记录清理,支持跨池或从 Ceph 备份存储拷贝重建缓存,增加相关消息/配置校验与大量集成测试。

Changes

Ceph 镜像缓存池策略与选择

Layer / File(s) Summary
消息契约和配置验证
header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java, sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
新增 DownloadVolumeTemplateToPrimaryStorageMsg.targetVolumeInstallUrl 字段与 getter/setter;新增 CephImageCachePoolStrategy 枚举;为 Ceph 全局配置、API 与 SDK 参数添加允许值校验(含 ImageCache/三策略值)。
缓存池选择与位检测逻辑
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
新增从 ceph:// 解析 pool 的选池工具、selectImageCache 与 bits 校验/失效记录删除流程;支持在目标池缺失时枚举其它池查找 sourceCache。
拷贝/下载执行与签名约束
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
重构 DownloadToCache.doDownload/prepareInSelectedPool:支持指定目标池与 sourceCache,容量预分配约束为目标 pool,新增 copyFromImageCache 与 copyFromCephBackupStorage 路径,并在下载任务签名中包含目标池名。
缓存删除与清理收窄
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java, plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
新增基于 installPath 的 deleteImageCacheOnPrimaryStorage 重载(保留 snapshotPath),并收窄 createShadowImageCacheVOs 的查询到 installUrl like 'ceph://%',为空时提前返回。
测试基础设施与集成用例
testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy, test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
CephPrimaryStoragePoolSpec 增加 isCreate 可控参数;集成测试新增 ImageCache/root-only pool、扩展备份镜像源,并增加大量验证用例覆盖三种策略、缺失 bits 的跨池复制、缓存清理、reimage 与自定义磁盘用例。

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

我是兔子轻跳场,池间寻路把门闯,
bits 若散去我去抄,跨池拷贝接新光,
策略三选稳又忙,删除讯息也跟上,
镜像回家路径亮堂。 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed PR 标题清晰准确地概括了主要变更:支持 Ceph 多 ImageCache pool 的功能,与所有变更内容高度相关。
Description check ✅ Passed PR 描述与变更内容相关,说明了支持多 ImageCache pool 策略的目的和同步来源。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/shan.wu/support-multi-imagecache-pool@@2

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`:
- Around line 5406-5416: This bypass in the reinit path calls
removeImageCacheRecord(...) outside the DownloadToCache serialized image/pool
flow (selectImageCache -> checkImageCacheBits -> removeImageCacheRecord),
risking double release via releaseAvailableCapacity(); either remove this direct
call and let stale-cache cleanup stay inside
DownloadToCache.prepareInSelectedPool()/DownloadToCache.download() (so reinit
uses the same serialized path), or make the deletion idempotent: perform a
conditional delete by primary key and only call releaseAvailableCapacity() when
the delete actually removed a row (check rows-affected) to avoid
double-counting; update the checkImageCacheBits -> success branch accordingly.
- Around line 2175-2179: The DefaultImageCachePool branch in
CephPrimaryStorageBase currently sets selection.selectedCache = caches.get(0)
which can pick a cache from a non-default pool; change the logic in that branch
(where listImageCaches(imageUuid) is used and selection.selectedPoolName is set
via getDefaultImageCachePoolName()) to filter the returned caches for entries
whose pool name equals getDefaultImageCachePoolName(), then set
selection.selectedCache to that filtered result's first element (or null if
none) so only a cache from the default image cache pool is chosen.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 1119-1120: The code filters vm2.allVolumes by comparing the whole
VolumeInventory object to vm2.rootVolumeUuid (string), which always evaluates
true and causes the root volume to be deleted; change the deletion to iterate
only data volumes or compare the volume's uuid field. For example, replace the
vm2.allVolumes.stream().filter {...}.forEach(...) block with a loop over
vm2.dataVolumeUuids calling deleteVolume(uuid) or, if you must use allVolumes,
filter with it.uuid != vm2.rootVolumeUuid so only non-root data volumes are
deleted; ensure you call deleteVolume with the volume UUID.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a46c175c-4713-4568-ac6c-766cc4d298a0

📥 Commits

Reviewing files that changed from the base of the PR and between 12bb2b7 and bc941be.

⛔ Files ignored due to path filters (1)
  • conf/globalConfig/ceph.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/support-multi-imagecache-pool@@2 branch 2 times, most recently from e8bd978 to ee05508 Compare June 8, 2026 05:38

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`:
- Around line 2204-2208: removeImageCacheRecord currently always calls
osdHelper.releaseAvailableCapacity after issuing deletions, causing
double-release when concurrent cleaners target the same stale cache; change the
deletion to be conditional/atomic and only call releaseAvailableCapacity when
the DB delete actually removed the cache. Specifically, in
removeImageCacheRecord (and the other similar call sites that use
SQL.New(ImageCacheVolumeRefVO.class)...delete() and dbf.remove(cache)), replace
the unconditional dbf.remove/delete sequence with an atomic delete that returns
an affected-row count (or use an optimistic check like delete where id = ? and
version = ?), and only invoke
osdHelper.releaseAvailableCapacity(cache.getInstallUrl(), cache.getSize()) when
the delete count == 1 (i.e. this caller performed the real deletion); apply the
same change to the other identical blocks that use ImageCacheVolumeRefVO delete
+ dbf.remove to make removal idempotent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 69068e55-4d62-4c96-98e4-ba38582bb7a3

📥 Commits

Reviewing files that changed from the base of the PR and between bc941be and e8bd978.

⛔ Files ignored due to path filters (1)
  • conf/globalConfig/ceph.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
✅ Files skipped from review due to trivial changes (1)
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
🚧 Files skipped from review as they are similar to previous changes (6)
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy (1)

1122-1122: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

复用 dataVolumeUuids 变量,避免根卷被误删。

Line 1122 的 filter 闭包参数名为 uuid,但实际接收的是 VolumeInventory 对象(因为在 allVolumes 上执行 stream)。条件 uuid != vm2.rootVolumeUuid 实际上是将 VolumeInventory 对象与 String 比较,这个比较恒为 true。结果是根卷也会进入 deleteVolume(it.uuid),而 Line 1121 已经通过 deleteVm(vm2.uuid) 删除了 VM(包括根卷),可能导致重复删除问题。

建议直接使用 Line 1118 已收集的 dataVolumeUuids

建议修改
         deleteVm(vm2.uuid)
-        vm2.allVolumes.stream().filter { uuid -> uuid != vm2.rootVolumeUuid }.forEach { it -> deleteVolume(it.uuid) }
+        dataVolumeUuids.forEach { volumeUuid -> deleteVolume(volumeUuid) }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`
at line 1122, The current stream/filter over vm2.allVolumes uses a parameter
named uuid but actually iterates VolumeInventory objects, so the comparison
"uuid != vm2.rootVolumeUuid" is always true and causes the root volume to be
deleted again; replace that deletion logic to iterate over the already-collected
dataVolumeUuids (from the earlier variable dataVolumeUuids) and call
deleteVolume(uuid) for each, ensuring the root volume removed by
deleteVm(vm2.uuid) is not deleted twice; update the code around vm2, allVolumes,
rootVolumeUuid, deleteVolume, dataVolumeUuids and deleteVm accordingly.
🧹 Nitpick comments (1)
test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy (1)

625-757: ⚖️ Poor tradeoff

建议拆分过长的测试方法。

该测试方法长达 132 行,远超一屏显示范围,降低了代码可读性和可维护性。根据编码规范"单个 if 代码块不宜超过一屏显示",建议将复杂逻辑拆分为独立的辅助方法:

  • 提取陈旧缓存准备逻辑(lines 639-700)为 prepareStaleAndNonDefaultCache()
  • 提取模拟器配置逻辑(lines 701-712)为 configureCheckBitsSimulatorForStaleCache()
  • 保持测试主流程简洁,专注于场景验证
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`
around lines 625 - 757, The test method
testPreferExistingCacheStrategySkipsStaleDefaultPoolCache is too long—extract
the stale-cache setup and simulator setup into helpers: move the logic that
finds/creates staleDefaultCache, nonDefaultCache, creates staleRef and persists
them into a new private helper prepareStaleAndNonDefaultCache() and move the
env.simulator / env.afterSimulator / env.preSimulator configuration into a new
private helper configureCheckBitsSimulatorForStaleCache(); then replace the
inlined blocks in testPreferExistingCacheStrategySkipsStaleDefaultPoolCache with
calls to prepareStaleAndNonDefaultCache() and
configureCheckBitsSimulatorForStaleCache(), keeping all referenced variables
(primaryStorage, image2, NEW_ROOT_POOL_NAME, defaultImageCachePoolName, vm, env,
cloneCmd) accessible or returned as needed.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Line 1122: The current stream/filter over vm2.allVolumes uses a parameter
named uuid but actually iterates VolumeInventory objects, so the comparison
"uuid != vm2.rootVolumeUuid" is always true and causes the root volume to be
deleted again; replace that deletion logic to iterate over the already-collected
dataVolumeUuids (from the earlier variable dataVolumeUuids) and call
deleteVolume(uuid) for each, ensuring the root volume removed by
deleteVm(vm2.uuid) is not deleted twice; update the code around vm2, allVolumes,
rootVolumeUuid, deleteVolume, dataVolumeUuids and deleteVm accordingly.

---

Nitpick comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 625-757: The test method
testPreferExistingCacheStrategySkipsStaleDefaultPoolCache is too long—extract
the stale-cache setup and simulator setup into helpers: move the logic that
finds/creates staleDefaultCache, nonDefaultCache, creates staleRef and persists
them into a new private helper prepareStaleAndNonDefaultCache() and move the
env.simulator / env.afterSimulator / env.preSimulator configuration into a new
private helper configureCheckBitsSimulatorForStaleCache(); then replace the
inlined blocks in testPreferExistingCacheStrategySkipsStaleDefaultPoolCache with
calls to prepareStaleAndNonDefaultCache() and
configureCheckBitsSimulatorForStaleCache(), keeping all referenced variables
(primaryStorage, image2, NEW_ROOT_POOL_NAME, defaultImageCachePoolName, vm, env,
cloneCmd) accessible or returned as needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c7c3cddb-cbbd-416a-b398-726b560bf90d

📥 Commits

Reviewing files that changed from the base of the PR and between e8bd978 and ee05508.

⛔ Files ignored due to path filters (1)
  • conf/globalConfig/ceph.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
✅ Files skipped from review due to trivial changes (1)
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
🚧 Files skipped from review as they are similar to previous changes (6)
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/support-multi-imagecache-pool@@2 branch 4 times, most recently from 0d731e7 to d4b49bd Compare June 8, 2026 06:36

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy (1)

1088-1092: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

这里的过滤条件仍然恒为 true

Line 1092 的闭包参数实际是 VolumeInventory,不是 UUID 字符串,所以 uuid != vm2.rootVolumeUuid 会一直成立,根卷也会被送进 deleteVolume()。前一行已经 deleteVm(vm2.uuid) 了,这里直接复用 dataVolumeUuids 即可。

建议修改
         deleteVm(vm2.uuid)
-        vm2.allVolumes.stream().filter { uuid -> uuid != vm2.rootVolumeUuid }.forEach { it -> deleteVolume(it.uuid) }
+        dataVolumeUuids.forEach { volumeUuid -> deleteVolume(volumeUuid) }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`
around lines 1088 - 1092, The filter in the final stream is comparing the
closure parameter (a VolumeInventory object) to vm2.rootVolumeUuid, so the
condition is always true and the root volume may be deleted; change the deletion
to use the previously computed dataVolumeUuids (from dataVolumeUuids) or adjust
the closure to compare the object's uuid property (it.uuid !=
vm2.rootVolumeUuid) so only non-root volumes are passed to deleteVolume; locate
this in the block around vm2.allVolumes, deleteVm(vm2.uuid) and
deleteVolume(...) and update the filter to use dataVolumeUuids or it.uuid for
correct filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 955-992: The test sets CephGlobalConfig.IMAGE_CACHE_POOL_STRATEGY
to PreferVolumePool before creating/stopping the VM but only enters the
try/finally around reimageVmInstance, so failures in
createVmInstance/stopVmInstance leak the global config and test VM; move the try
immediately after updating IMAGE_CACHE_POOL_STRATEGY and include restoring the
config in finally, plus ensure the test VM is deleted (or stopped/removed) in
that finally (reference IMAGE_CACHE_POOL_STRATEGY, createVmInstance,
stopVmInstance, reimageVmInstance, clean()/env.delete()) so the strategy is
always reverted and the VM cleaned up even on intermediate failures.
- Around line 420-447: The test is asserting against the default root pool
instead of the default image-cache pool; change it to fetch the default image
cache pool token (use
CephSystemTags.DEFAULT_CEPH_PRIMARY_STORAGE_IMAGE_CACHE_POOL.getTokenByResourceUuid(primaryStorage.uuid,
CephSystemTags.DEFAULT_CEPH_PRIMARY_STORAGE_IMAGE_CACHE_POOL_TOKEN)) into a
variable like defaultImageCachePoolName and assert
cloneCmd.srcPath.contains(defaultImageCachePoolName) (and remove/replace the
current defaultRootPoolName usage); this aligns the test with
CephPrimaryStorageBase.selectImageCache() which picks the default image cache
pool.

---

Duplicate comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 1088-1092: The filter in the final stream is comparing the closure
parameter (a VolumeInventory object) to vm2.rootVolumeUuid, so the condition is
always true and the root volume may be deleted; change the deletion to use the
previously computed dataVolumeUuids (from dataVolumeUuids) or adjust the closure
to compare the object's uuid property (it.uuid != vm2.rootVolumeUuid) so only
non-root volumes are passed to deleteVolume; locate this in the block around
vm2.allVolumes, deleteVm(vm2.uuid) and deleteVolume(...) and update the filter
to use dataVolumeUuids or it.uuid for correct filtering.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2f6883db-3c8b-43e9-8926-fc9d4f55708a

📥 Commits

Reviewing files that changed from the base of the PR and between ee05508 and 0d731e7.

⛔ Files ignored due to path filters (1)
  • conf/globalConfig/ceph.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
✅ Files skipped from review due to trivial changes (1)
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
🚧 Files skipped from review as they are similar to previous changes (7)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/support-multi-imagecache-pool@@2 branch 4 times, most recently from 277df65 to e057ddb Compare June 8, 2026 07:31

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy (2)

1006-1043: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

try/finally 仍然包得太晚,且 new_root_pool_vm 没有回收。

IMAGE_CACHE_POOL_STRATEGY 在 Line 1008 就被改成了 PreferVolumePool,但 try/finally 从 Line 1031 才开始;createVmInstance()stopVmInstance() 失败时会把全局配置泄漏到后续用例。与此同时,这个方法在成功路径下也没有销毁 new_root_pool_vm

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`
around lines 1006 - 1043, Move the
CephGlobalConfig.IMAGE_CACHE_POOL_STRATEGY.updateValue call so you set the
strategy immediately before the operations and wrap all subsequent work
(createVmInstance, stopVmInstance, reimageVmInstance) in a try/finally that
restores the config; in the finally also ensure the created VM new_root_pool_vm
is cleaned up (call the project's VM deletion API used elsewhere, e.g.
destroyVmInstance/deleteVmInstance with new_root_pool_vm.uuid) so the global
config is always restored and the VM is always removed even if
createVmInstance/stopVmInstance/reimageVmInstance fails.

1139-1143: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

这里的过滤条件仍然在拿卷对象和根卷 UUID 比较。

Line 1143 的 filter 输入是 VolumeInventory,不是 String,所以条件会恒为真,根卷也会进入 deleteVolume()。这里直接复用 Line 1139 已经算好的 dataVolumeUuids 就可以了。

最小修改
-        vm2.allVolumes.stream().filter { uuid -> uuid != vm2.rootVolumeUuid }.forEach { it -> deleteVolume(it.uuid) }
+        dataVolumeUuids.forEach { volumeUuid -> deleteVolume(volumeUuid) }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`
around lines 1139 - 1143, The filter in the delete loop is comparing
VolumeInventory objects to vm2.rootVolumeUuid, causing the root volume to slip
through; replace the stream over vm2.allVolumes with the already computed
dataVolumeUuids from the earlier line and iterate those UUIDs when calling
deleteVolume so only non-root data volumes are deleted (use
dataVolumeUuids.forEach { uuid -> deleteVolume(uuid) } and keep
deleteVm(vm2.uuid) as-is).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 373-423: The temporary VM created in
testPreferVolumePoolImageCacheStrategy (imageCachePoolVm from createVmInstance)
is destroyed/expunged inside the try block and can leak if assertions fail; move
the destroyVmInstance and expungeVmInstance calls into the finally (or ensure
they run in the same finally that resets
CephGlobalConfig.IMAGE_CACHE_POOL_STRATEGY), declare imageCachePoolVm outside
the try, and guard the cleanup with a null check on imageCachePoolVm before
calling destroyVmInstance/expungeVmInstance; apply the same pattern to the other
test cases flagged (the other similar createVmInstance blocks).

---

Duplicate comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 1006-1043: Move the
CephGlobalConfig.IMAGE_CACHE_POOL_STRATEGY.updateValue call so you set the
strategy immediately before the operations and wrap all subsequent work
(createVmInstance, stopVmInstance, reimageVmInstance) in a try/finally that
restores the config; in the finally also ensure the created VM new_root_pool_vm
is cleaned up (call the project's VM deletion API used elsewhere, e.g.
destroyVmInstance/deleteVmInstance with new_root_pool_vm.uuid) so the global
config is always restored and the VM is always removed even if
createVmInstance/stopVmInstance/reimageVmInstance fails.
- Around line 1139-1143: The filter in the delete loop is comparing
VolumeInventory objects to vm2.rootVolumeUuid, causing the root volume to slip
through; replace the stream over vm2.allVolumes with the already computed
dataVolumeUuids from the earlier line and iterate those UUIDs when calling
deleteVolume so only non-root data volumes are deleted (use
dataVolumeUuids.forEach { uuid -> deleteVolume(uuid) } and keep
deleteVm(vm2.uuid) as-is).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5299aeb3-e89b-4ac7-b113-34108450e965

📥 Commits

Reviewing files that changed from the base of the PR and between 0d731e7 and cc1cff0.

⛔ Files ignored due to path filters (1)
  • conf/globalConfig/ceph.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
✅ Files skipped from review due to trivial changes (1)
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
🚧 Files skipped from review as they are similar to previous changes (7)
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/support-multi-imagecache-pool@@2 branch 11 times, most recently from 419a01a to 819bbd0 Compare June 9, 2026 06:49
}

sql = "select c from ImageCacheVO c where c.id in (:ids)";
sql = "select c from ImageCacheVO c where c.id in (:ids) and c.installUrl like 'ceph://%'";

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from jin.ma:

防御的代价是迷惑,筛选只做一次

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/support-multi-imagecache-pool@@2 branch from 755d3d7 to 819bbd0 Compare June 9, 2026 16:04
PROBLEM:
Ceph primary storage previously assumed one image cache pool for one primary storage. When a VM root volume is created on another pool, the root volume may be cloned from the image cache snapshot in the default pool. Some customer Ceph clusters do not support cross-pool RBD clone, so VM creation, reimage, and change image can fail when the selected volume pool differs from the image cache pool.

SOLUTION:
Add a Ceph image cache pool strategy. The default strategy keeps the old behavior and uses the default image cache pool. The volume-pool strategy selects the target root volume pool when that physical pool also has the ImageCache role. Otherwise it falls back to the default image cache pool. The existing-cache strategy reuses an existing cache first, preferring the default pool cache and then another valid cache record.

Image cache preparation now selects the target cache pool before capacity allocation. If the selected pool already has a valid cache snapshot, the existing cache is reused. If the selected cache record is stale, only that cache record and its refs are removed and the flow retries. If another pool has a valid cache for the same image, the snapshot is copied into the selected pool with RBD cp/deep cp. The normal create-snapshot and protect-snapshot flow then creates a new cache in the selected pool. If no usable cache exists, the image is downloaded through the existing backup storage flow. Same-Ceph backup storage is copied into the selected pool when needed to avoid later cross-pool clone.

The final root volume operation still clones from the prepared image cache snapshot. The data-copy step is only used to materialize the cache in the selected pool. Snapshot reuse images keep their original volumeSnapshotReuse cache record and do not fall back to ordinary backup-storage download. Reinit can still select existing cache records by image UUID when the source ImageVO has already been deleted.

TEST:
Verified on root@172.20.13.237 under /root/zstack-workspace/zstack. Ran the changed Ceph integration case through TestCaseStabilityTest: CephPrimaryStorageVolumePoolsCase. Result: BUILD SUCCESS. Tests run: 1, Failures: 0, Errors: 0.

The case covers default strategy compatibility, volume-pool strategy, existing-cache strategy, same-Ceph backup storage copy to selected pool, stale selected cache cleanup, copying cache from another pool, image cache cleanup with refs, reimage, and multi-ImageCache-pool setup in the common Ceph primary storage spec.

Resolves: ZSTAC-62608

Change-Id: Ie498c8af87c1a0248429cc81980b135725894778
@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/support-multi-imagecache-pool@@2 branch from 819bbd0 to 0b6aa3b Compare June 9, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants