<fix>[kvm]: ZSTAC-81344 backport libvirt TLS to 4.8.38 (@@3)#4196
<fix>[kvm]: ZSTAC-81344 backport libvirt TLS to 4.8.38 (@@3)#4196ZStack-Robot wants to merge 8 commits into
Conversation
Add libvirt.tls.enabled GlobalConfig and useTls field in MigrateVmCmd to support TLS-encrypted libvirt connections for migration and V2V. Resolves: ZSTAC-81343 Change-Id: I391fa36c0dd63c25c5d85d102bc3579c8eb3d685 (cherry picked from commit 3feb9e9)
Resolves: ZSTAC-83696 Change-Id: I368cc5af8fb3d553bedc3be5d031015719e68ddc (cherry picked from commit c85fab3)
…TTP for TLS cert update Resolves: ZSTAC-83696 Change-Id: I4a5f404e51f12487b61bdf8f990bbc490cafeeee (cherry picked from commit 659c296)
… redeploy Resolves: ZSTAC-84446 Change-Id: I8af5e3887a5bad286b43dda00c874c9de999e1cb (cherry picked from commit 2697ef2)
Resolves: ZSTAC-84446 Change-Id: I9bed31c0cefddd6ed11f59cd13e36eb1c2abc029 (cherry picked from commit ad8a4c5)
br_conn_all_ns (169.254.64.1, mevoco.py CONNECT_ALL_NETNS_BR_OUTER_IP) is created on the host only when the first flat L3 network lands, AFTER the add-host ansible playbook has generated the TLS cert. The cert SAN therefore lacks this IP, and on every subsequent reconnect the check-tls-certs flow detects the gap and triggers an ansible redeploy plus libvirtd/kvmagent restart, breaking PID stability. This IP is host-internal (used only for VM <-> root-netns userdata / pushgateway / kvmagent HTTP) and has no business in a libvirtd TLS cert. Add EXCLUDED_INTERNAL_IPS as a single source of truth used by both the check flow (buildIpList) and the deploy flow (unionTlsCertIps) so the two flows can never disagree, regardless of when the bridge appears. host_plugin.fact() on the agent side is unchanged. Resolves: ZSTAC-84446 Change-Id: I10821d8c68190f2cbc8a0679d19ed053916d9184 (cherry picked from commit a544065)
Normalize the cherry-picked Groovy test to LF so git diff --check passes before creating the 4.8.38 backport MR. Resolves: ZSTAC-81344 Change-Id: I2ba5c110e21cf613c209a9ac49883b843fc26514
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough该 PR 为 KVM 增加 libvirt TLS 迁移支持:新增全局配置、CA 初始化与持久化、主机 IP 探测与证书 SAN 校验、在重连/部署流程中计算 tlsCertIps、在迁移命令中传递 TLS 标志与源主机管理 IP,并附带集成与单元测试覆盖。 ChangesLibvirt TLS 迁移
Sequence Diagram(s)主机连接时 TLS 证书检查流程: sequenceDiagram
participant KVMHostConnectFlow
participant SshShell
participant KVMHostUtils
participant KVMHost
participant openssl
KVMHostConnectFlow->>KVMHostUtils: collectHostIps(sshShell, hostUuid, mgmtIp)
KVMHostUtils->>SshShell: execute("ip -4 -o addr show")
SshShell->>openssl: 返回 IP 地址输出
KVMHostUtils->>KVMHostUtils: buildIpList(mgmtIp, output, mnVip)
KVMHostUtils-->>KVMHostConnectFlow: 返回逗号分隔 IP 列表
KVMHostConnectFlow->>SshShell: execute("openssl x509 -text -noout -in servercert.pem")
SshShell-->>KVMHostConnectFlow: 返回证书文本(含 SAN)
KVMHostConnectFlow->>KVMHost: parseSanIps(sanOutput)
KVMHost-->>KVMHostConnectFlow: 返回 SAN 中的 IP 集合
KVMHostConnectFlow->>KVMHostConnectFlow: 校验 SAN IP 是否包含收集到的 IP
迁移命令 TLS 配置流程: sequenceDiagram
participant MigrateVmFlow
participant KVMGlobalConfig
participant KVMHostUtils
participant KVMAgentCommands
MigrateVmFlow->>KVMGlobalConfig: LIBVIRT_TLS_ENABLED.getValue()
KVMGlobalConfig-->>MigrateVmFlow: true/false
MigrateVmFlow->>KVMHostUtils: shouldForceTlsRedeploy(needDeploy, allowRestart, isNewAdded)
KVMHostUtils-->>MigrateVmFlow: true/false
alt TLS启用 且 重启允许
MigrateVmFlow->>KVMAgentCommands: cmd.setUseTls(true)
MigrateVmFlow->>KVMAgentCommands: cmd.setSrcHostManagementIp(srcIp)
else TLS禁用 或 重启禁用
MigrateVmFlow->>KVMAgentCommands: cmd.setUseTls(false)
end
MigrateVmFlow->>KVMAgentCommands: 发送迁移命令
预估代码审查工作量🎯 4 (Complex) | ⏱️ ~60 minutes 诗歌
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 3093-3094: The call that sets cmd.setUseTls currently only checks
RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE for self.getUuid(), which can enable TLS
even if the other peer isn't ready; update the logic so useTls is true only when
both endpoints are TLS-ready (e.g., check KVMGlobalConfig.LIBVIRT_TLS_ENABLED
plus the per-host RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE for both srcHostUuid
and dstHostUuid), or factor out a hostTlsReady(hostUuid) helper (used by
connectHook()) and call it for both srcHostUuid and dstHostUuid before setting
cmd.setUseTls.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java`:
- Around line 409-410: The current code in KVMHostFactory uses two separate
JsonLabel.createIfAbsent calls for LIBVIRT_TLS_CA_KEY and
LIBVIRT_TLS_PRIVATE_KEY (variables caInv and keyInv), which can race under
concurrent HA starts; fix by persisting the CA and private key atomically
instead of two independent writes: either serialize both values into a single
JSON object and call one createIfAbsent for a single label (store CA+key
together), or wrap the pair of createIfAbsent operations in a
global/cluster-wide lock or a DB transaction so they cannot interleave; update
references to JsonLabel.createIfAbsent, LIBVIRT_TLS_CA_KEY,
LIBVIRT_TLS_PRIVATE_KEY and the caInv/keyInv creation site in KVMHostFactory
accordingly.
- Around line 419-420: The catch in KVMHostFactory that currently swallows
exceptions when initializing the libvirt TLS CA (logger.warn("Failed to
initialize libvirt TLS CA", e)) must not allow the system to keep starting with
TLS enabled; update the handler in the method where libvirt TLS CA is
initialized (KVMHostFactory initialization block) to either (a) fail fast by
rethrowing a RuntimeException (or call System.exit/Service startup abort) when
kvm.libvirt.tls.enabled is true, or (b) explicitly disable the TLS feature by
setting the internal tlsEnabled flag to false and emit a clear ERROR-level log
and a visible alert/metric; ensure you reference the kvm.libvirt.tls.enabled
config, the tls initialization code path in KVMHostFactory, and any internal
field (e.g., tlsEnabled / libvirtTlsCaInitialized) so the change prevents silent
startup with TLS claimed enabled.
In `@test/src/test/java/org/zstack/test/kvm/KVMHostUtilsTest.java`:
- Line 20: The test method names in KVMHostUtilsTest use underscores and violate
the project's lowerCamelCase Java naming convention (e.g.,
filtersZsSuffixIface_consistentWithHostFact); rename these methods to
lowerCamelCase (for example filtersZsSuffixIfaceConsistentWithHostFact) and
update every reference and any `@Test` annotations accordingly; ensure you rename
all other test methods in the same file that contain underscores to their
lowerCamelCase equivalents and run tests to verify no broken references remain.
- Around line 138-142: detected is a CSV string and using deployIps.contains(ip)
risks substring matches; split detected on ',' into elements, trim each element,
and assert exact membership (e.g. build a List<String> expected =
Arrays.stream(detected.split(",")).map(String::trim).collect(Collectors.toList())
and then use Assert.assertTrue(deployIps.containsAll(expected)) or iterate and
assert deployIps.contains(trimmedIp)); apply the same exact-match check for
"10.0.0.7" in KVMHostUtilsTest referencing the detected and deployIps variables.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3af3cd9a-1e3c-4186-9c0f-eb7b7ce7cd79
📒 Files selected for processing (9)
plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHost.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostDeployArguments.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.javatest/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/LibvirtTlsMigrateCase.groovytest/src/test/java/org/zstack/test/kvm/KVMHostUtilsTest.java
| cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class) | ||
| && rcf.getResourceConfigValue(KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE, self.getUuid(), Boolean.class)); |
There was a problem hiding this comment.
同时校验源端和目的端的 TLS 就绪条件。
Line 3094 这里只读取了 self.getUuid() 上的 RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE,但同一个方法里实际发起迁移的宿主机会在 srcHostUuid / dstHostUuid 之间切换,而且证书部署与 libvirtd 重启策略也是按 host 维度在 connectHook() 里生效的。这样一来,只要两端配置不一致,就可能在对端尚未具备 TLS 迁移条件时仍把 useTls 置为 true,直接把跨主机迁移打到 TLS 路径上并失败。这里至少要同时检查 srcHostUuid 和 dstHostUuid,或者复用一个统一的 “host TLS ready” 判定。
💡 建议修改
- cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class)
- && rcf.getResourceConfigValue(KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE, self.getUuid(), Boolean.class));
+ boolean srcTlsReady = rcf.getResourceConfigValue(
+ KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE,
+ srcHostUuid, Boolean.class);
+ boolean dstTlsReady = rcf.getResourceConfigValue(
+ KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE,
+ dstHostUuid, Boolean.class);
+ cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class)
+ && srcTlsReady
+ && dstTlsReady);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3093 -
3094, The call that sets cmd.setUseTls currently only checks
RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE for self.getUuid(), which can enable TLS
even if the other peer isn't ready; update the logic so useTls is true only when
both endpoints are TLS-ready (e.g., check KVMGlobalConfig.LIBVIRT_TLS_ENABLED
plus the per-host RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE for both srcHostUuid
and dstHostUuid), or factor out a hostTlsReady(hostUuid) helper (used by
connectHook()) and call it for both srcHostUuid and dstHostUuid before setting
cmd.setUseTls.
| JsonLabelInventory caInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY, ca); | ||
| JsonLabelInventory keyInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key); |
There was a problem hiding this comment.
避免把 CA 证书和私钥拆成两个独立的 createIfAbsent() 写入。
Line 409-410 在 HA 并发启动下有竞态:两个 MN 可以先各自生成不同的本地 CA,然后分别抢先写入 libvirtTLSCA 和 libvirtTLSPrivateKey,最终把不匹配的证书/私钥组合持久化下来,后续主机证书签发会直接失效。这里需要把证书和私钥作为一个原子单元持久化,或者在初始化阶段加全局锁/事务,避免跨 key 交叉写入。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 409
- 410, The current code in KVMHostFactory uses two separate
JsonLabel.createIfAbsent calls for LIBVIRT_TLS_CA_KEY and
LIBVIRT_TLS_PRIVATE_KEY (variables caInv and keyInv), which can race under
concurrent HA starts; fix by persisting the CA and private key atomically
instead of two independent writes: either serialize both values into a single
JSON object and call one createIfAbsent for a single label (store CA+key
together), or wrap the pair of createIfAbsent operations in a
global/cluster-wide lock or a DB transaction so they cannot interleave; update
references to JsonLabel.createIfAbsent, LIBVIRT_TLS_CA_KEY,
LIBVIRT_TLS_PRIVATE_KEY and the caInv/keyInv creation site in KVMHostFactory
accordingly.
| } catch (Exception e) { | ||
| logger.warn("Failed to initialize libvirt TLS CA", e); |
There was a problem hiding this comment.
TLS CA 初始化失败时不要只记日志后继续启动。
当前 kvm.libvirt.tls.enabled 默认开启,这里只打 warning 会让管理节点在没有可用 CA 的情况下继续启动,直到后续 host reconnect / migrate 才以更隐蔽的方式失败。至少在 TLS 启用时应 fail fast,或者显式关闭该能力并给出可见告警。
💡 建议修改
} catch (Exception e) {
- logger.warn("Failed to initialize libvirt TLS CA", e);
+ if (KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class)) {
+ throw new CloudRuntimeException("Failed to initialize libvirt TLS CA", e);
+ }
+ logger.warn("Failed to initialize libvirt TLS CA while TLS is disabled", e);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } catch (Exception e) { | |
| logger.warn("Failed to initialize libvirt TLS CA", e); | |
| } catch (Exception e) { | |
| if (KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class)) { | |
| throw new CloudRuntimeException("Failed to initialize libvirt TLS CA", e); | |
| } | |
| logger.warn("Failed to initialize libvirt TLS CA while TLS is disabled", e); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 419
- 420, The catch in KVMHostFactory that currently swallows exceptions when
initializing the libvirt TLS CA (logger.warn("Failed to initialize libvirt TLS
CA", e)) must not allow the system to keep starting with TLS enabled; update the
handler in the method where libvirt TLS CA is initialized (KVMHostFactory
initialization block) to either (a) fail fast by rethrowing a RuntimeException
(or call System.exit/Service startup abort) when kvm.libvirt.tls.enabled is
true, or (b) explicitly disable the TLS feature by setting the internal
tlsEnabled flag to false and emit a clear ERROR-level log and a visible
alert/metric; ensure you reference the kvm.libvirt.tls.enabled config, the tls
initialization code path in KVMHostFactory, and any internal field (e.g.,
tlsEnabled / libvirtTlsCaInitialized) so the change prevents silent startup with
TLS claimed enabled.
| } | ||
|
|
||
| @Test | ||
| public void filtersZsSuffixIface_consistentWithHostFact() { |
There was a problem hiding this comment.
测试方法命名不符合 lowerCamelCase 规范。
当前多个测试方法名使用下划线(_),与仓库 Java 命名规范不一致,建议统一改为 lowerCamelCase(例如 filtersZsSuffixIfaceConsistentWithHostFact)。
As per coding guidelines,**/*.java 中“方法名、参数名、成员变量和局部变量:使用 lowerCamelCase 风格”。
Also applies to: 72-72, 96-96, 103-103, 113-113, 120-120, 127-127, 134-134, 145-145, 156-156, 164-164, 169-169, 177-177, 182-182, 187-187
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/src/test/java/org/zstack/test/kvm/KVMHostUtilsTest.java` at line 20, The
test method names in KVMHostUtilsTest use underscores and violate the project's
lowerCamelCase Java naming convention (e.g.,
filtersZsSuffixIface_consistentWithHostFact); rename these methods to
lowerCamelCase (for example filtersZsSuffixIfaceConsistentWithHostFact) and
update every reference and any `@Test` annotations accordingly; ensure you rename
all other test methods in the same file that contain underscores to their
lowerCamelCase equivalents and run tests to verify no broken references remain.
Source: Coding guidelines
| for (String ip : detected.split(",")) { | ||
| Assert.assertTrue(deployIps.contains(ip)); | ||
| } | ||
| Assert.assertTrue(deployIps.contains("10.0.0.7")); | ||
| } |
There was a problem hiding this comment.
IP 断言使用 contains 存在子串误判风险。
这里在 CSV 字符串上做 contains,可能把非完整 IP 的子串误判为命中,建议按逗号拆分后做精确匹配。
建议修改
- for (String ip : detected.split(",")) {
- Assert.assertTrue(deployIps.contains(ip));
- }
- Assert.assertTrue(deployIps.contains("10.0.0.7"));
+ java.util.Set<String> deploySet = new java.util.LinkedHashSet<>();
+ for (String ip : deployIps.split(",")) {
+ deploySet.add(ip.trim());
+ }
+ for (String ip : detected.split(",")) {
+ Assert.assertTrue(deploySet.contains(ip.trim()));
+ }
+ Assert.assertTrue(deploySet.contains("10.0.0.7"));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/src/test/java/org/zstack/test/kvm/KVMHostUtilsTest.java` around lines
138 - 142, detected is a CSV string and using deployIps.contains(ip) risks
substring matches; split detected on ',' into elements, trim each element, and
assert exact membership (e.g. build a List<String> expected =
Arrays.stream(detected.split(",")).map(String::trim).collect(Collectors.toList())
and then use Assert.assertTrue(deployIps.containsAll(expected)) or iterate and
assert deployIps.contains(trimmedIp)); apply the same exact-match check for
"10.0.0.7" in KVMHostUtilsTest referencing the detected and deployIps variables.
Use the 4.8.38 gray version for libvirt TLS migration fields added by this backport. Resolves: ZSTAC-81344 Change-Id: Id7de3f0d4a26edec9e661f97893f87dc523fff10
Summary
Backport the libvirt TLS control-plane work to 4.8.38 for ZSTAC-81344. This switches cross-host libvirt access away from plaintext TCP exposure and adds host certificate detection/update logic.
Changes
kvm.libvirt.tls.enabledand migration TLS command fields.br_conn_all_nsIP.Covered Issues
Testing
git diff --check upstream/4.8.38..cherry-pick/ZSTAC-81343-tls-4.8.38mvn -pl plugin/kvm -am -DskipTests compileResolves: ZSTAC-81344
sync from gitlab !10153