[VMware to KVM] Cleanup leftover migrated volumes in case of migration failures #13151

Open
nvazquez wants to merge 2 commits into apache:main from shapeblue:423-vmw-kvm-cleanup-migrated-disks-failure

Conversation

@nvazquez

Contributor

Description

This PR adds a cleanup mechanism for migrated volumes that are left over when a VMware to KVM migration fails unexpectedly.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@nvazquez

Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 39.14894% with 143 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.08%. Comparing base (5893ba5) to head (fbe7ac6).
⚠️ Report is 45 commits behind head on main.

Files with missing lines Patch % Lines
...urce/wrapper/LibvirtBaseConvertCommandWrapper.java 52.14% 63 Missing and 15 partials ⚠️
...rtCleanupConvertedInstanceDisksCommandWrapper.java 4.16% 23 Missing ⚠️
...gent/api/CleanupConvertedInstanceDisksCommand.java 0.00% 20 Missing ⚠️
.../apache/cloudstack/vm/UnmanagedVMsManagerImpl.java 20.00% 18 Missing and 2 partials ⚠️
.../LibvirtImportConvertedInstanceCommandWrapper.java 33.33% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13151      +/-   ##
============================================
- Coverage     18.09%   18.08%   -0.01%     
+ Complexity    16723    16721       -2     
============================================
  Files          6037     6040       +3     
  Lines        542580   542643      +63     
  Branches      66427    66432       +5     
============================================
- Hits          98155    98137      -18     
- Misses       433399   433481      +82     
+ Partials      11026    11025       -1     
Flag Coverage Δ
uitests 3.51% <ø> (ø)
unittests 19.24% <39.14%> (-0.01%) ⬇️

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17829

@nvazquez

Contributor Author

@blueorangutan test

@blueorangutan

@nvazquez a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-16076)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 4252 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr13151-t16076-kvm-ol8.zip
Smoke tests completed. 9 look OK, 142 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
runTest Error 0.00 test_2fa.py
runTest Error 0.00 test_account_access.py
runTest Error 0.00 test_accounts.py
runTest Error 0.00 test_affinity_groups_projects.py
runTest Error 0.00 test_affinity_groups.py
runTest Error 0.00 test_annotations.py
runTest Error 0.00 test_async_job.py
runTest Error 0.00 test_attach_multiple_volumes.py
runTest Error 0.00 test_backup_recovery_dummy.py
runTest Error 0.00 test_backup_recovery_nas.py
runTest Error 0.00 test_backup_recovery_veeam.py
runTest Error 0.00 test_certauthority_root.py
runTest Error 0.00 test_cluster_drs.py
runTest Error 0.00 test_console_endpoint.py
runTest Error 0.00 test_create_list_domain_account_project.py
runTest Error 0.00 test_create_network.py
runTest Error 0.00 test_deploy_vgpu_enabled_vm.py
runTest Error 0.00 test_deploy_virtio_scsi_vm.py
runTest Error 0.00 test_deploy_vm_extra_config_data.py
runTest Error 0.00 test_deploy_vm_iso.py
runTest Error 0.00 test_deploy_vm_iso_uefi.py
runTest Error 0.00 test_deploy_vm_root_resize.py
runTest Error 0.00 test_deploy_vms_in_parallel.py
runTest Error 0.00 test_deploy_vms_with_varied_deploymentplanners.py
runTest Error 0.00 test_deploy_vm_with_userdata.py
runTest Error 0.00 test_diagnostics.py
runTest Error 0.00 test_direct_download.py
runTest Error 0.00 test_disk_offerings.py
runTest Error 0.00 test_disk_provisioning_types.py
runTest Error 0.00 test_domain_disk_offerings.py
runTest Error 0.00 test_domain_network_offerings.py
runTest Error 0.00 test_domain_service_offerings.py
runTest Error 0.00 test_domain_vpc_offerings.py
runTest Error 0.00 test_enable_account_settings_for_domain.py
runTest Error 0.00 test_events_resource.py
runTest Error 0.00 test_extension_custom_action_lifecycle.py
runTest Error 0.00 test_extension_custom.py
runTest Error 0.00 test_extension_lifecycle.py
runTest Error 0.00 test_gateway_on_shared_networks.py
runTest Error 0.00 test_global_acls.py
runTest Error 0.00 test_global_settings.py
runTest Error 0.00 test_guest_os.py
runTest Error 0.00 test_guest_vlan_range.py
runTest Error 0.00 test_host_control_state.py
runTest Error 0.00 test_hostha_simulator.py
runTest Error 0.00 test_host_ping.py
runTest Error 0.00 test_image_store_object_migration.py
runTest Error 0.00 test_import_unmanage_volumes.py
runTest Error 0.00 test_internal_lb.py
runTest Error 0.00 test_ipv4_routing.py
runTest Error 0.00 test_ipv6_infra.py
runTest Error 0.00 test_iso.py
runTest Error 0.00 test_kubernetes_clusters.py
runTest Error 0.00 test_kubernetes_supported_versions.py
runTest Error 0.00 test_list_accounts.py
runTest Error 0.00 test_list_disk_offerings.py
runTest Error 0.00 test_list_domains.py
runTest Error 0.00 test_list_hosts.py
runTest Error 0.00 test_login.py
runTest Error 0.00 test_list_ids_parameter.py
runTest Error 0.00 test_list_service_offerings.py
runTest Error 0.00 test_list_storage_pools.py
runTest Error 0.00 test_list_volumes.py
runTest Error 0.00 test_loadbalance.py
runTest Error 0.00 test_metrics_api.py
runTest Error 0.00 test_migration.py
runTest Error 0.00 test_ms_maintenance_and_safe_shutdown.py
runTest Error 0.00 test_multipleips_per_nic.py
runTest Error 0.00 test_nested_virtualization.py
runTest Error 0.00 test_network_acl.py
runTest Error 0.00 test_network_ipv6.py
runTest Error 0.00 test_network_permissions.py
runTest Error 0.00 test_network.py
runTest Error 0.00 test_public_ip_range.py
runTest Error 0.00 test_nic_adapter_type.py
runTest Error 0.00 test_nic.py
runTest Error 0.00 test_non_contigiousvlan.py
runTest Error 0.00 test_nonstrict_affinity_group.py
runTest Error 0.00 test_outofbandmanagement_nestedplugin.py
runTest Error 0.00 test_outofbandmanagement.py
runTest Error 0.00 test_over_provisioning.py
runTest Error 0.00 test_password_server.py
runTest Error 0.00 test_persistent_network.py
runTest Error 0.00 test_portable_publicip.py
runTest Error 0.00 test_portforwardingrules.py
runTest Error 0.00 test_primary_storage.py
runTest Error 0.00 test_primary_storage_scope.py
runTest Error 0.00 test_privategw_acl_ovs_gre.py
runTest Error 0.00 test_privategw_acl.py
runTest Error 0.00 test_projects.py
runTest Error 0.00 test_purge_expunged_vms.py
runTest Error 0.00 test_pvlan.py
runTest Error 0.00 test_quarantined_ips.py
runTest Error 0.00 test_regions.py
runTest Error 0.00 test_register_userdata.py
runTest Error 0.00 test_reset_configuration_settings.py
runTest Error 0.00 test_reset_vm_on_reboot.py
runTest Error 0.00 test_resource_accounting.py
runTest Error 0.00 test_resource_detail.py
runTest Error 0.00 test_resource_names.py
runTest Error 0.00 test_restore_vm.py
runTest Error 0.00 test_router_dhcphosts.py
runTest Error 0.00 test_router_dns.py
runTest Error 0.00 test_router_dnsservice.py
runTest Error 0.00 test_routers_iptables_default_policy.py
runTest Error 0.00 test_routers_network_ops.py
runTest Error 0.00 test_routers.py
runTest Error 0.00 test_scale_vm.py
runTest Error 0.00 test_secondary_storage.py
runTest Error 0.00 test_service_offerings.py
runTest Error 0.00 test_set_sourcenat.py
runTest Error 0.00 test_sharedfs_lifecycle.py
runTest Error 0.00 test_snapshots.py
runTest Error 0.00 test_ssl_offloading.py
runTest Error 0.00 test_ssvm.py
runTest Error 0.00 test_storage_policy.py
runTest Error 0.00 test_systemvm_userdata.py
runTest Error 0.00 test_templates.py
runTest Error 0.00 test_update_security_group.py
runTest Error 0.00 test_usage_events.py
runTest Error 0.00 test_usage.py
runTest Error 0.00 test_vm_autoscaling.py
runTest Error 0.00 test_vm_deployment_planner.py
runTest Error 0.00 test_vm_life_cycle.py
runTest Error 0.00 test_vm_lifecycle_unmanage_import.py
runTest Error 0.00 test_vm_lifecycle_unmanage_kvm_import.py
runTest Error 0.00 test_vm_lifecycle_with_snapshot_or_volume.py
runTest Error 0.00 test_vm_schedule.py
runTest Error 0.00 test_vm_snapshot_kvm.py
runTest Error 0.00 test_vm_snapshots.py
runTest Error 0.00 test_vm_strict_host_tags.py
runTest Error 0.00 test_vnf_templates.py
runTest Error 0.00 test_volumes.py
runTest Error 0.00 test_vpc_conserve_mode.py
runTest Error 0.00 test_vpc_ipv6.py
runTest Error 0.00 test_vpc_redundant.py
runTest Error 0.00 test_vpc_router_nics.py
runTest Error 0.00 test_vpc_vpn.py
runTest Error 0.00 test_webhook_delivery.py
runTest Error 0.00 test_webhook_lifecycle.py
runTest Error 0.00 test_host_maintenance.py
runTest Error 0.00 test_hostha_kvm.py

@sureshanaparti sureshanaparti requested a review from Copilot May 13, 2026 07:28

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a best-effort cleanup path for temporary converted volumes when VMware→KVM import fails, and refactors KVM conversion wrappers to share common helper logic.

Changes:

  • Trigger cleanup of temporary converted disks on import failures/timeouts.
  • Introduce CleanupConvertedInstanceDisksCommand and a corresponding KVM resource wrapper to delete leftover conversion artifacts.
  • Refactor LibvirtImportConvertedInstanceCommandWrapper to reuse a new shared LibvirtBaseConvertCommandWrapper.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

File Description
server/src/main/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImpl.java Sends a new cleanup command to remove temporary converted disks when import fails.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtImportConvertedInstanceCommandWrapper.java Refactors wrapper to inherit shared convert/import helpers and updates cleanup call signature.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtCleanupConvertedInstanceDisksCommandWrapper.java Adds KVM-side handler that locates and deletes temporary conversion disks (and XML when present).
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtBaseConvertCommandWrapper.java Introduces shared helper methods previously embedded in the import wrapper.
core/src/main/java/com/cloud/agent/api/CleanupConvertedInstanceDisksCommand.java Adds agent command to request cleanup of converted disks by store + prefix.
core/src/main/java/com/cloud/agent/api/CleanupConvertedInstanceDisksAnswer.java Adds a new Answer type for the cleanup command (currently empty).
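
The table above describes a KVM-side handler that locates and deletes temporary conversion disks by store and prefix. As a rough illustration only, the sketch below deletes regular files in a directory whose names start with a given prefix. `ConvertedDiskCleaner` and `deleteByPrefix` are hypothetical names, and direct `java.nio` file access is an assumption; the wrapper in this PR may work quite differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the code from this PR. */
public class ConvertedDiskCleaner {

    /**
     * Deletes regular files directly under dir whose names start with prefix.
     * Best effort: a file that cannot be deleted is reported and skipped.
     * Returns the names of the files that were actually deleted.
     */
    public static List<String> deleteByPrefix(Path dir, String prefix) throws IOException {
        // An empty prefix would match every file, so refuse it outright.
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        List<String> deleted = new ArrayList<>();
        // Filter by plain string prefix rather than a glob, so special
        // characters in the prefix are not interpreted as wildcards.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(
                dir, entry -> entry.getFileName().toString().startsWith(prefix))) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue; // leave directories and special files alone
                }
                try {
                    Files.delete(file);
                    deleted.add(file.getFileName().toString());
                } catch (IOException e) {
                    System.err.println("Could not delete " + file + ": " + e.getMessage());
                }
            }
        }
        return deleted;
    }
}
```

Rejecting an empty prefix is a deliberate guard in this sketch: with prefix-based deletion, an empty or overly short prefix is the easiest way to remove unrelated files from a shared conversion location.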

Comment thread core/src/main/java/com/cloud/agent/api/CleanupConvertedInstanceDisksAnswer.java Outdated
@nvazquez

Contributor Author

@blueorangutan package

@blueorangutan

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@harikrishna-patnala harikrishna-patnala left a comment

Member

code LGTM. @nvazquez please check if we can add some unit tests here.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17841

@DaanHoogland

Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-16113)

@blueorangutan

[SF] Trillian test result (tid-16155)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 58065 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr13151-t16155-kvm-ol8.zip
Smoke tests completed. 145 look OK, 6 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_DeployVmAntiAffinityGroup_in_project Error 72.96 test_affinity_groups_projects.py
test_DeployVmAntiAffinityGroup Error 6.88 test_affinity_groups.py
test_03_deploy_and_scale_kubernetes_cluster Failure 29.17 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 0.09 test_kubernetes_clusters.py
test_12_test_deploy_cluster_different_offerings_per_node_type Failure 73.55 test_kubernetes_clusters.py
test_01_non_strict_host_anti_affinity Failure 83.94 test_nonstrict_affinity_group.py
test_02_non_strict_host_affinity Error 30.58 test_nonstrict_affinity_group.py
ContextSuite context=TestMigrateVMStrictTags>:setup Error 0.00 test_vm_strict_host_tags.py
test_hostha_enable_ha_when_host_in_maintenance Error 303.06 test_hostha_kvm.py

@blueorangutan

[SF] Trillian Build Failed (tid-16175)

@RosiKyu

RosiKyu commented Jun 16, 2026

Collaborator

@blueorangutan test

@blueorangutan

@RosiKyu a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-16334)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 56192 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr13151-t16334-kvm-ol8.zip
Smoke tests completed. 150 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_isolate_network_FW_PF_default_routes_egress_true Failure 119.69 test_routers_network_ops.py

@RosiKyu RosiKyu self-assigned this Jun 17, 2026

import com.cloud.agent.api.to.DataStoreTO;

public class CleanupConvertedInstanceDisksCommand extends Command {

Contributor

Can you add a Javadoc stating the intent of this command? I suppose it cleans up disks for a non-existent VM with name vmVolumesPrefix, but it would be nice to make that explicit.

Comment on lines +2245 to +2260
if (cleanupConvertedDisks) {
    logger.debug("Cleaning up the converted disks for the VM {} through " +
            "the conversion host {}", sourceVM, convertHost.getName());
    CleanupConvertedInstanceDisksCommand cleanupCommand =
            new CleanupConvertedInstanceDisksCommand(temporaryConvertLocation, convertedDisksPrefix);
    try {
        Answer cleanupAnswer = agentManager.send(convertHost.getId(), cleanupCommand);
        if (!cleanupAnswer.getResult()) {
            logger.warn("Failed to cleanup the converted disks for the VM {} through " +
                    "the conversion host {}: {}", sourceVM, convertHost.getName(), cleanupAnswer.getDetails());
        }
    } catch (AgentUnavailableException | OperationTimedoutException e) {
        logger.error("Error cleaning up converted disks for VM {} through the conversion host {}",
                sourceVM, convertHost.getName(), e);
    }
}

Contributor

new method?
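
One possible reading of this suggestion is to move the quoted block into its own method, so the call site shrinks to a single guarded call. The sketch below shows that shape under stated assumptions: `CleanupExtractionSketch`, `AgentSender`, the nested `Answer` interface and the in-memory `log` list are hypothetical stand-ins for `agentManager`, the CloudStack `Answer` class and the logger, used only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of extracting the cleanup block into a method. */
public class CleanupExtractionSketch {

    /** Stand-in for the agent answer; only the two accessors used in the quoted block. */
    public interface Answer {
        boolean getResult();
        String getDetails();
    }

    /** Stand-in for agentManager.send(hostId, command). */
    public interface AgentSender {
        Answer send(long hostId, String location, String prefix) throws Exception;
    }

    private final AgentSender agentSender;

    /** Collects log lines in memory instead of using a real logger. */
    public final List<String> log = new ArrayList<>();

    public CleanupExtractionSketch(AgentSender agentSender) {
        this.agentSender = agentSender;
    }

    /**
     * Extracted method: best effort, so a failed or unreachable cleanup is
     * logged and never propagated to the caller.
     */
    public void cleanupConvertedDisks(String sourceVM, long hostId, String hostName,
                                      String location, String prefix) {
        log.add("DEBUG cleaning up converted disks for VM " + sourceVM
                + " through conversion host " + hostName);
        try {
            Answer answer = agentSender.send(hostId, location, prefix);
            if (!answer.getResult()) {
                log.add("WARN failed to clean up converted disks: " + answer.getDetails());
            }
        } catch (Exception e) {
            // The quoted code catches AgentUnavailableException and
            // OperationTimedoutException; plain Exception stands in for them here.
            log.add("ERROR error cleaning up converted disks: " + e.getMessage());
        }
    }
}
```

With a method like this, the original site would reduce to `if (cleanupConvertedDisks) { cleanupConvertedDisks(...); }`, and the method could be unit tested separately from the import flow.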

6 participants