<fix>[storage]: support ceph multi image cache pool#4175
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough实现 Ceph 主存储镜像缓存池策略:新增消息字段与配置校验、实现按目标卷安装路径选池并支持跨池复制、重构缓存删除入口,并补充大量集成测试覆盖多种策略与场景。 ChangesCeph 镜像缓存池策略实现
Estimated code review effort:
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`:
- Around line 2175-2179: The DefaultImageCachePool branch currently picks
caches.get(0) which may not belong to the default pool or be a ceph:// record;
change the logic in CephPrimaryStorageBase where getImageCachePoolStrategy() ==
CephImageCachePoolStrategy.DefaultImageCachePool to first call
listImageCaches(imageUuid) and then select the ImageCacheVO whose pool name
equals getDefaultImageCachePoolName() and whose installUrl (or similar URL field
on ImageCacheVO) starts with "ceph://" (or otherwise identifies a Ceph record);
set selection.selectedPoolName = getDefaultImageCachePoolName() and
selection.selectedCache = that filtered cache (or null if none) instead of
caches.get(0) so we never pick a cache from the wrong pool or a non-ceph record.
- Around line 2204-2208: removeImageCacheRecord currently causes double capacity
release when concurrent tasks delete the same ImageCacheVO (same imageUuid)
across pools; make the delete idempotent or serialized: acquire a DB-level lock
or check delete affected rows before calling releaseAvailableCapacity.
Specifically, when handling ImageCacheVO in removeImageCacheRecord (and the
similar blocks around the other locations that call
osdHelper.releaseAvailableCapacity, SQL.New(...).delete(), and
dbf.remove(cache)), first lock or re-fetch the ImageCacheVO row for update (or
attempt the delete and inspect the affected row count) and only call
osdHelper.releaseAvailableCapacity if this invocation actually performed the
delete (affected rows > 0); alternatively serialize by imageUuid so only one
task removes/releases for that image. Ensure the SQL delete of
ImageCacheVolumeRefVO and dbf.remove(cache) are guarded by the same check/lock
to make the whole operation atomic and idempotent.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Line 1120: The filter closure on vm2.allVolumes incorrectly treats the closure
parameter as a String; change the predicate to compare the VolumeInventory's
uuid to vm2.rootVolumeUuid (e.g., use a parameter name like vol and test
vol.uuid != vm2.rootVolumeUuid) so only non-root volumes are passed to
deleteVolume; locate the usage of vm2.allVolumes, the filter closure,
vm2.rootVolumeUuid and the deleteVolume(it.uuid) call and update the predicate
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6027df32-a8d3-4612-83f7-fbaba4be9989
⛔ Files ignored due to path filters (1)
conf/globalConfig/ceph.xmlis excluded by!**/*.xml
📒 Files selected for processing (9)
header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javasdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.javatest/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovytestlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
22874b7 to
36c210f
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`:
- Around line 2238-2252: The method
shouldCopyCephBackupStorageImageToSelectedPool currently returns false for
CephImageCachePoolStrategy.DefaultImageCachePool causing images from
same-cluster Ceph backup storage to be left in the backup pool; change the logic
in shouldCopyCephBackupStorageImageToSelectedPool so that when
getImageCachePoolStrategy() == CephImageCachePoolStrategy.DefaultImageCachePool
you still compare the backupStoragePoolName (from getPoolNameFromCephInstallUrl)
with selectedImageCachePoolName and, if they differ and
hasImageCachePoolRole(selectedImageCachePoolName) is true and the backup storage
fsid matches getSelf().getFsid() (via dbf.findByUuid(CephBackupStorageVO)),
return true to force copying into the default image cache pool; preserve
existing early-return checks for blank names and role checks and keep references
to getImageCachePoolStrategy, getPoolNameFromCephInstallUrl,
hasImageCachePoolRole, and getSelf().getFsid() so doDownload's reserved capacity
for selectedImageCachePoolName is honored.
- Around line 5377-5394: The reimage flow reuses the raw installUrl from
selection.selectedCache but checkImageCacheBits() validates
ImageCacheUtil.getImageCachePath(cache); to fix, when selection.selectedCache is
used set installUrl to ImageCacheUtil.getImageCachePath(selection.selectedCache)
(same normalization used in createVolumeFromTemplate()) before calling
trigger.next() or proceeding to clone; update the same pattern in the similar
block around lines handling downloadImageToCache and the block at 5401-5421 so
cached paths and checks remain consistent (refer to selectImageCache,
checkImageCacheBits, ImageCacheUtil.getImageCachePath, installUrl,
downloadImageToCache, createVolumeFromTemplate).
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Around line 955-986: The IMAGE_CACHE_POOL_STRATEGY is changed before
createVmInstance but the try/finally only surrounds reimageVmInstance, so
failures in createVmInstance or stopVmInstance leak global state and leave the
created VM around; update testReimageVmAndAllocatePool to wrap the whole
sequence from ensureNewRootPoolIsImageCachePool through
stopVmInstance/reimageVmInstance in a single try/finally, restoring
CephGlobalConfig.IMAGE_CACHE_POOL_STRATEGY.updateValue(...) in the finally and
also ensure the created VM (new_root_pool_vm) is cleaned up there (call the
appropriate delete/destroy/stop helper for new_root_pool_vm) so both the config
and VM are always reverted even on earlier failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8a4a11e8-39c6-41da-bf54-fb427b9cea27
⛔ Files ignored due to path filters (1)
conf/globalConfig/ceph.xmlis excluded by!**/*.xml
📒 Files selected for processing (9)
header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javasdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.javatest/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovytestlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
🚧 Files skipped from review as they are similar to previous changes (6)
- plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
- sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
- header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
- testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
0d731e7 to
d4b49bd
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java (1)
5374-5416:⚠️ Potential issue | 🟠 Major | ⚡ Quick winreimage 这里要复用标准化后的 image cache 路径。
checkImageCacheBits()在 Line 5374 和 Line 2190 校验的是ImageCacheUtil.getImageCachePath(...),但 Line 5378/Line 5415 仍把原始installUrl传给后续 clone。这样一旦缓存记录里的快照路径与实际可复用路径不一致,reimage 会在预检通过后仍然从错误路径 clone;同文件createVolumeFromTemplate()也已经统一走ImageCacheUtil.getImageCachePath(...)了。🔧 建议修改
- installUrl = selection.selectedCache.getInstallUrl(); + installUrl = ImageCacheUtil.getImageCachePath(selection.selectedCache.toInventory()); trigger.next(); @@ - installUrl = ((DownloadVolumeTemplateToPrimaryStorageReply) reply).getImageCache().getInstallUrl(); + installUrl = ImageCacheUtil.getImageCachePath( + ((DownloadVolumeTemplateToPrimaryStorageReply) reply).getImageCache()); trigger.next();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java` around lines 5374 - 5416, The code assigns raw installUrl from selection.selectedCache and from DownloadVolumeTemplateToPrimaryStorageReply but checkImageCacheBits() validates using ImageCacheUtil.getImageCachePath(...), so normalize both values to the standardized cache path before using them: in the success branch of checkImageCacheBits() (where selection.selectedCache.getInstallUrl() is used) and in the DownloadVolumeTemplateToPrimaryStorageMsg callback (where reply.getImageCache().getInstallUrl() is used) replace the direct assignment with ImageCacheUtil.getImageCachePath(...) (passing the original install URL and the same parameters used by checkImageCacheBits), ensuring reimage reuses the standardized cache path similar to createVolumeFromTemplate().test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy (1)
988-988:⚠️ Potential issue | 🟡 Minor | 💤 Low value变量命名不符合 lowerCamelCase 规范。
VolumeInstallPath应改为volumeInstallPath。建议修复
- String VolumeInstallPath = Q.New(VolumeVO.class).select(VolumeVO_.installPath).eq(VolumeVO_.uuid, new_root_pool_vm.rootVolumeUuid).findValue() - assert VolumeInstallPath.contains(NEW_ROOT_POOL_NAME) + String volumeInstallPath = Q.New(VolumeVO.class).select(VolumeVO_.installPath).eq(VolumeVO_.uuid, new_root_pool_vm.rootVolumeUuid).findValue() + assert volumeInstallPath.contains(NEW_ROOT_POOL_NAME)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy` at line 988, Rename the local variable VolumeInstallPath to follow lowerCamelCase (volumeInstallPath) where it's declared in the query using Q.New(VolumeVO.class).select(...).findValue() and update all subsequent usages in the method/test (references to VolumeInstallPath) to volumeInstallPath to keep naming consistent with Java conventions.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In
`@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java`:
- Around line 5374-5416: The code assigns raw installUrl from
selection.selectedCache and from DownloadVolumeTemplateToPrimaryStorageReply but
checkImageCacheBits() validates using ImageCacheUtil.getImageCachePath(...), so
normalize both values to the standardized cache path before using them: in the
success branch of checkImageCacheBits() (where
selection.selectedCache.getInstallUrl() is used) and in the
DownloadVolumeTemplateToPrimaryStorageMsg callback (where
reply.getImageCache().getInstallUrl() is used) replace the direct assignment
with ImageCacheUtil.getImageCachePath(...) (passing the original install URL and
the same parameters used by checkImageCacheBits), ensuring reimage reuses the
standardized cache path similar to createVolumeFromTemplate().
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy`:
- Line 988: Rename the local variable VolumeInstallPath to follow lowerCamelCase
(volumeInstallPath) where it's declared in the query using
Q.New(VolumeVO.class).select(...).findValue() and update all subsequent usages
in the method/test (references to VolumeInstallPath) to volumeInstallPath to
keep naming consistent with Java conventions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6e19a5b4-7861-48ec-8678-ff6186fcb3bb
⛔ Files ignored due to path filters (1)
conf/globalConfig/ceph.xmlis excluded by!**/*.xml
📒 Files selected for processing (9)
header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javasdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.javatest/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovytestlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
✅ Files skipped from review due to trivial changes (1)
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
🚧 Files skipped from review as they are similar to previous changes (6)
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
- header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
- plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
- sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
- plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCacheCleaner.java
- testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
755d3d7 to
819bbd0
Compare
PROBLEM: Ceph primary storage previously assumed one image cache pool for one primary storage. When a VM root volume is created on another pool, the root volume may be cloned from the image cache snapshot in the default pool. Some customer Ceph clusters do not support cross-pool RBD clone, so VM creation, reimage, and change image can fail when the selected volume pool differs from the image cache pool. SOLUTION: Add a Ceph image cache pool strategy. The default strategy keeps the old behavior and uses the default image cache pool. The volume-pool strategy selects the target root volume pool when that physical pool also has the ImageCache role. Otherwise it falls back to the default image cache pool. The existing-cache strategy reuses an existing cache first, preferring the default pool cache and then another valid cache record. Image cache preparation now selects the target cache pool before capacity allocation. If the selected pool already has a valid cache snapshot, the existing cache is reused. If the selected cache record is stale, only that cache record and its refs are removed and the flow retries. If another pool has a valid cache for the same image, the snapshot is copied into the selected pool with RBD cp/deep cp. The normal create-snapshot and protect-snapshot flow then creates a new cache in the selected pool. If no usable cache exists, the image is downloaded through the existing backup storage flow. Same-Ceph backup storage is copied into the selected pool when needed to avoid later cross-pool clone. The final root volume operation still clones from the prepared image cache snapshot. The data-copy step is only used to materialize the cache in the selected pool. Snapshot reuse images keep their original volumeSnapshotReuse cache record and do not fall back to ordinary backup-storage download. Reinit can still select existing cache records by image UUID when the source ImageVO has already been deleted. TEST: Verified on root@172.20.13.237 under /root/zstack-workspace/zstack. Ran the changed Ceph integration case through TestCaseStabilityTest: CephPrimaryStorageVolumePoolsCase. Result: BUILD SUCCESS. Tests run: 1, Failures: 0, Errors: 0. The case covers default strategy compatibility, volume-pool strategy, existing-cache strategy, same-Ceph backup storage copy to selected pool, stale selected cache cleanup, copying cache from another pool, image cache cleanup with refs, reimage, and multi-ImageCache-pool setup in the common Ceph primary storage spec. Resolves: ZSTAC-62608 Change-Id: Ie498c8af87c1a0248429cc81980b135725894778
819bbd0 to
0b6aa3b
Compare
PROBLEM:
Ceph primary storage previously assumed one image cache pool for one primary storage. When a VM root volume is created on another pool, the root volume may be cloned from the image cache snapshot in the default pool. Some customer Ceph clusters do not support cross-pool RBD clone, so VM creation, reimage, and change image can fail when the selected volume pool differs from the image cache pool.
SOLUTION:
Add a Ceph image cache pool strategy. The default strategy keeps old behavior and uses the default image cache pool. The volume-pool strategy selects the target root volume pool when that physical pool also has the ImageCache role, otherwise it falls back to the default image cache pool. The existing-cache strategy reuses an existing cache first, preferring the default pool cache and then another valid cache record.
Image cache preparation selects the target cache pool before capacity allocation. If the selected pool already has a valid cache snapshot, it is reused. If the selected cache record is stale, only that cache record and its refs are removed and the flow retries. If another pool has a valid cache for the same image, the snapshot is copied into the selected pool with RBD cp/deep cp, then the normal create-snapshot and protect-snapshot flow creates a new cache in the selected pool. If no usable cache exists, the image is downloaded through the existing backup storage flow. Same-Ceph backup storage is copied into the selected pool when needed to avoid later cross-pool clone.
TEST:
Verified on root@172.20.13.237 under /root/zstack-workspace/zstack. Ran the changed Ceph integration case through TestCaseStabilityTest: CephPrimaryStorageVolumePoolsCase. Result: BUILD SUCCESS. Tests run: 1, Failures: 0, Errors: 0.
The case covers default strategy compatibility, volume-pool strategy, existing-cache strategy, stale selected cache cleanup, copying cache from another pool, image cache cleanup with refs, remove cached image, reimage, and multi-ImageCache-pool setup in the common Ceph primary storage spec.
sync from gitlab !10138