ZSTAC-84112 Add candidate decision diagnostics#4178
Conversation
Add candidate decision records for VM host selection APIs. Propagate diagnostics through allocator dry-run and failure opaque. Tests: TestCandidateDecisionRecorder CandidateDecisionApiCase Resolves: ZSTAC-84112 Change-Id: I16e4005c172f50ac8949b56408f60b3aba813aa1
|
Warning
|
| 层 / 文件 | 摘要 |
|---|---|
候选决策数据模型和常量定义 header/src/main/java/org/zstack/header/candidate/* |
新增11个候选决策相关的数据模型类和常量接口:CandidateDecisionContext(请求上下文)、CandidateDecisionRecorder<T>(泛型决策记录器)、CandidateDecisionResult(决策结果)、CandidateDecisionSummary(汇总统计)、CandidateRecord<T>(候选记录)、CandidateRejectReason(拒绝原因)及常量定义(CandidateTypes、CandidateReasonCodes、CandidateCategories、CandidateReasonDetails等)。 |
分配链基础设施与决策记录集成 header/src/main/java/org/zstack/header/allocator/*, compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java |
HostAllocatorChain集成CandidateDecisionRecorder,在分配流程初始化、Flow运行前后、失败处理和完成时进行决策生命周期管理。HostAllocatorSpec和AllocateHostMsg新增候选决策上下文字段。AbstractHostAllocatorFlow提供pass()、reject()等决策方法的受保护助手。HostAllocatorTrigger接口新增决策相关方法签名。 |
各分配流程中的候选决策记录 compute/src/main/java/org/zstack/compute/allocator/*Flow.java |
五个分配Flow(AvoidHostAllocatorFlow、DesignatedHostAllocatorFlow、FilterFlow、HostCapacityAllocatorFlow、HostStateAndHypervisorAllocatorFlow)在isCandidateDecisionEnabled()开启时,改为逐主机遍历并显式调用pass()或reject()记录决策,而不是单纯过滤并返回列表。 |
VM候选查询API定义 header/src/main/java/org/zstack/header/vm/APIGetVm*Msg/Reply.java |
新增三组API:APIGetVmCreationCandidatesMsg/Reply、APIGetVmMigrationCandidatesMsg/Reply、APIGetVmStartingCandidatesMsg/Reply,分别用于查询VM创建、迁移和启动时的可用主机候选及决策汇总。旧API(APIGetCandidateZonesClustersHostsForCreatingVmMsg等)标记为@Deprecated。 |
VM实例候选查询逻辑实现 compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java, compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java |
VmInstanceBase新增APIGetVmStartingCandidatesMsg和APIGetVmMigrationCandidatesMsg的处理方法,利用dry-run的分配流程收集决策结果。VmInstanceManagerImpl新增handle(APIGetVmCreationCandidatesMsg)构造分配消息并返回候选决策结果。AbstractVmInstance为新API消息类型添加状态机白名单。 |
消息和规格对象扩展 header/src/main/java/org/zstack/header/vm/*Msg.java, header/src/main/java/org/zstack/header/allocator/* |
CreateVmInstanceMsg、StartVmInstanceMsg、MigrateVmMsg、GetVmMigrationTargetHostMsg、InstantiateNewCreatedVmInstanceMsg等消息类新增candidateDecisionContext字段,使上下文能沿分配链传递。VmInstanceSpec新增相同字段以支持规格对象的上下文承载。 |
VM分配流程中的候选决策上下文设置 compute/src/main/java/org/zstack/compute/vm/VmAllocateHost*Flow.java, compute/src/main/java/org/zstack/compute/vm/InstantiateVmFromNewCreatedStruct.java |
各分配Flow在构建DesignatedAllocateHostMsg时,从VmInstanceSpec读取candidateDecisionContext并传递给分配消息,支持下游分配链感知这是"查询"而非"真实分配"。 |
主机候选集合构建 compute/src/main/java/org/zstack/compute/allocator/HostCandidateUniverseBuilder.java |
新增HostCandidateUniverseBuilder用于基于规格条件(zone/cluster/host)和权限(管理员/非管理员)构建初始主机候选集合,支持多字段排序。 |
候选决策系统测试 test/src/test/groovy/org/zstack/test/integration/kvm/hostallocator/CandidateDecisionApiCase.groovy, test/src/test/java/org/zstack/test/candidate/TestCandidateDecisionRecorder.java |
新增集成测试CandidateDecisionApiCase验证迁移/创建/启动候选API在避免列表和主机禁用场景下的行为;单元测试TestCandidateDecisionRecorder验证决策记录器的构建、汇总统计和约束检查。 |
时序图
sequenceDiagram
participant Client as API客户端
participant API as VmInstanceManagerImpl
participant Chain as HostAllocatorChain
participant Flows as Allocator Flows
participant Recorder as CandidateDecisionRecorder
participant Result as CandidateDecisionResult
Client->>API: APIGetVmCreationCandidatesMsg
API->>Recorder: 新建CandidateDecisionRecorder
API->>Chain: 执行分配链(dry-run)
Chain->>Chain: 初始化候选全集
Chain->>Flows: 运行各分配Flow
Flows->>Recorder: reject(hostUuid, reason)
Flows->>Recorder: pass(hostUuid)
Chain->>Recorder: 标记最终候选/汇总
Chain->>Result: build()返回CandidateDecisionResult
Result->>API: 候选记录+汇总统计
API->>Client: APIGetVmCreationCandidatesReply
🎯 3 (Moderate) | ⏱️ ~25 minutes
🐰 Whiskers twitching with glee,
Candidate decisions bloom like trees,
Each host gets its tale,
Why passed or why failed,
Transparency's our guarantee! 🌲✨
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | 标题准确反映了拉取请求的核心功能:添加候选决策诊断功能,与changeset中新增的完整候选决策模型、记录器、汇总及拒绝原因码相符。 |
| Description check | ✅ Passed | 描述充分说明了拉取请求的主要变更内容:包括添加候选决策模型、在VM创建/启动/迁移路径中记录主机候选决策,以及在主机分配器Spring配置中注册主机候选宇宙构建器,且包含验证信息。 |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
- Create stacked PR
- Commit on current branch
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
sync/xinhao.huang/fix/ZSTAC-84112-candidate-decision-aios
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ast-grep (0.43.0)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
Comment @coderabbitai help to get the list of available commands and usage tips.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)
936-944:⚠️ Potential issue | 🟠 Major | ⚡ Quick win启动候选诊断在无可用主机场景下被提前丢掉了。
这里对
NO_AVAILABLE_HOST仍然返回失败 reply,并且没有把 allocator 错误 opaque 里的candidateDecisionResult回填到GetVmStartingCandidateClustersHostsReply。上层handle(APIGetVmStartingCandidatesMsg)看到失败后会直接返回错误,因此“所有主机都被拒绝”的核心诊断场景拿不到candidates/summary/truncated。建议把这个分支改成可返回空主机列表但保留诊断结果的成功路径,或至少显式透传candidateDecisionResult。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 936 - 944, In VmInstanceBase.fail(ErrorCode) (the fail method in class VmInstanceBase) don't convert HostAllocatorError.NO_AVAILABLE_HOST into a plain failure; instead treat it as a successful reply with empty host/cluster lists but preserve and forward the allocator diagnostic (the candidateDecisionResult) from the allocator error opaque into the GetVmStartingCandidateClustersHostsReply so APIGetVmStartingCandidatesMsg handler can see candidates/summary/truncated; specifically, in the NO_AVAILABLE_HOST branch set reply.setHostInventories(new ArrayList<>()) and reply.setClusterInventories(new ArrayList<>()), set reply.setSuccess(true) (or avoid reply.setError), extract the candidateDecisionResult from errorCode.getOpaque(...) (or errorCode.getDetails/opaque map used by allocators) and populate the corresponding diagnostic field on GetVmStartingCandidateClustersHostsReply before bus.reply(msg, reply).
🧹 Nitpick comments (10)
compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java (1)
430-448: ⚡ Quick win新旧创建候选 API 的校验逻辑已重复,建议抽取公共校验方法。
两个
validate(...)方法内容基本一致,后续规则变更容易出现一边修一边漏的问题。建议提炼成一个私有通用方法并分别调用。Also applies to: 450-468
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java` around lines 430 - 448, Two validate(...) methods in VmInstanceApiInterceptor duplicate the same VM creation checks; extract the shared logic into a single private helper and call it from both validators. Create a private method (e.g., validateVmCreationCandidateFields) that accepts the message type (or a common supertype) and moves the instanceOfferingUuid vs cpu/memory checks and the ImageVO ISO -> rootDiskSize/rootDiskOfferingUuid checks (using dbf.findByUuid and throwing the same ApiMessageInterceptionException/OperationFailureException with the same messages). Replace the duplicated blocks in the existing validate(APIGetCandidateZonesClustersHostsForCreatingVmMsg msg) and the other validate(...) method (lines ~450-468) with a call to this new helper so behavior and exception texts remain identical.compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java (2)
895-916: 💤 Low value两个
newHostCandidateDecisionContext重载方法存在重复逻辑。两个方法几乎相同,仅
APICreateVmInstanceMsg版本额外设置了hostUuid。可考虑抽取公共逻辑:♻️ 建议的重构方案
+private void fillCommonHostCandidateContext(CandidateDecisionContext ctx, + String zoneUuid, String clusterUuid, String imageUuid, + String instanceOfferingUuid, String rootDiskOfferingUuid, + List<String> dataDiskOfferingUuids) { + ctx.getRequestScope().put("zoneUuid", zoneUuid); + ctx.getRequestScope().put("clusterUuid", clusterUuid); + ctx.getRequestScope().put("imageUuid", imageUuid); + ctx.getRequestScope().put("instanceOfferingUuid", instanceOfferingUuid); + ctx.getRequestScope().put("rootDiskOfferingUuid", rootDiskOfferingUuid); + ctx.getRequestScope().put("dataDiskOfferingUuids", dataDiskOfferingUuids); +} private CandidateDecisionContext newHostCandidateDecisionContext(APICreateVmInstanceMsg msg) { CandidateDecisionContext ctx = CandidateDecisionContext.fromApiMessage(msg, CandidateTypes.HOST); - ctx.getRequestScope().put("zoneUuid", msg.getZoneUuid()); - ctx.getRequestScope().put("clusterUuid", msg.getClusterUuid()); + fillCommonHostCandidateContext(ctx, msg.getZoneUuid(), msg.getClusterUuid(), + msg.getImageUuid(), msg.getInstanceOfferingUuid(), + msg.getRootDiskOfferingUuid(), msg.getDataDiskOfferingUuids()); ctx.getRequestScope().put("hostUuid", msg.getHostUuid()); - // ... rest of the puts return ctx; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java` around lines 895 - 916, Both newHostCandidateDecisionContext(APICreateVmInstanceMsg) and newHostCandidateDecisionContext(APIGetVmCreationCandidatesMsg) duplicate the same request-scope population logic; extract the common population into a single private helper (e.g., private CandidateDecisionContext populateHostCandidateContext(CandidateDecisionContext ctx, <common message base type> msg) or private CandidateDecisionContext newHostCandidateDecisionContextBase(ApiMessage msg)), call it from both newHostCandidateDecisionContext(APICreateVmInstanceMsg) and newHostCandidateDecisionContext(APIGetVmCreationCandidatesMsg), and keep the extra ctx.getRequestScope().put("hostUuid", msg.getHostUuid()) only in the APICreateVmInstanceMsg overload; update references to CandidateDecisionContext.fromApiMessage(...) to create the ctx before passing to the helper.
782-865: ⚡ Quick win建议抽取公共逻辑以减少代码重复。
handle(APIGetVmCreationCandidatesMsg)与handle(APIGetCandidateZonesClustersHostsForCreatingVmMsg)(lines 671-780) 存在约80行几乎相同的代码,包括 image/instanceOffering/diskOffering 处理、DesignatedAllocateHostMsg 构建等逻辑。建议将公共部分抽取为一个私有方法,例如:
private DesignatedAllocateHostMsg buildDesignatedAllocateHostMsgForCreation( String imageUuid, String zoneUuid, String clusterUuid, String instanceOfferingUuid, Integer cpuNum, Long memorySize, String rootDiskOfferingUuid, Long rootDiskSize, List<String> dataDiskOfferingUuids, List<String> l3NetworkUuids, List<String> systemTags) { // 公共逻辑 }两个 handler 方法可以复用此方法,仅在各自的特定逻辑(如 CandidateDecisionContext、accountUuid)和响应处理上有所区别。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java` around lines 782 - 865, The two handlers handle(APIGetVmCreationCandidatesMsg) and handle(APIGetCandidateZonesClustersHostsForCreatingVmMsg) duplicate ~80 lines (image/InstanceOffering/DiskOffering processing and DesignatedAllocateHostMsg construction); extract that shared logic into a private helper like buildDesignatedAllocateHostMsgForCreation(...) that accepts imageUuid, zoneUuid, clusterUuid, instanceOfferingUuid, cpuNum, memorySize, rootDiskOfferingUuid, rootDiskSize, dataDiskOfferingUuids, l3NetworkUuids, systemTags and returns a fully populated DesignatedAllocateHostMsg (including ImageInventory, cpu/memory, disk size aggregation, L3s, vm stub, system tags, allocator strategy, dryRun/listAllHosts flags and required backup storage selection); update both handle(...) methods to call this helper, then set their specific pieces (CandidateDecisionContext, accountUuid/session, VmOperation or any response-specific fields) before sending the message.header/src/main/java/org/zstack/header/candidate/CandidateReasonDetails.java (1)
31-33: ⚡ Quick win添加 null 参数校验。
方法
checkAllowed(String key, Object value)未对key参数进行空值校验。如果传入null,ALLOWED_KEYS.contains(key)可能返回false并触发断言失败,但错误信息会误导为"键不在白名单中"而非"键为空"。建议显式检查并提供清晰的错误提示。
🛡️ 建议的空值校验
public static void checkAllowed(String key, Object value) { + if (key == null) { + throw new IllegalArgumentException("candidate reason detail key cannot be null"); + } DebugUtils.Assert(ALLOWED_KEYS.contains(key), String.format("candidate reason detail key[%s] is not allowed", key)); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/candidate/CandidateReasonDetails.java` around lines 31 - 33, checkAllowed currently calls DebugUtils.Assert(ALLOWED_KEYS.contains(key), ...) without validating key; add an explicit null check at the start of CandidateReasonDetails.checkAllowed(String key, Object value) to assert key != null (using DebugUtils.Assert or equivalent) with a clear message like "candidate reason detail key is null", then keep the existing ALLOWED_KEYS.contains(key) assertion to validate membership; reference the method name checkAllowed, the ALLOWED_KEYS collection and DebugUtils.Assert when making the change.header/src/main/java/org/zstack/header/candidate/CandidateDecisionResult.java (1)
15-17: ⚡ Quick win防御性编程:避免 setter 接收 null 导致潜在空指针异常。
setCandidates方法允许传入null,但字段candidates在初始化时已指向空列表。如果外部代码将其设为null,后续调用getCandidates()的客户端可能假定返回值非空而触发 NPE。建议在 setter 中添加空值保护或在文档中明确说明该字段可能为
null。🛡️ 建议的防护性修复
public void setCandidates(List<CandidateRecord<?>> candidates) { - this.candidates = candidates; + this.candidates = candidates != null ? candidates : new ArrayList<>(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/candidate/CandidateDecisionResult.java` around lines 15 - 17, setCandidates currently accepts null which can overwrite the initialized non-null candidates list and cause NPEs for callers of getCandidates(); update setCandidates(List<CandidateRecord<?>> candidates) to defensively handle null by assigning an empty immutable or new ArrayList instead of null (e.g., when candidates == null set this.candidates = Collections.emptyList() or new ArrayList<>()) and keep references to the field name candidates and the accessor getCandidates() to locate the changes.header/src/main/java/org/zstack/header/candidate/CandidateRejectReason.java (1)
12-18: ⚡ Quick win在工厂方法中添加参数校验。
静态工厂方法
of(String code, String category, String message)未对参数进行空值校验。如果传入null,虽然不会立即失败,但会导致后续使用该对象时出现不明确的 NPE,增加调试难度。建议在方法入口添加参数校验,确保关键字段非空。
🛡️ 建议的参数校验
public static CandidateRejectReason of(String code, String category, String message) { + if (code == null || category == null || message == null) { + throw new IllegalArgumentException("code, category and message cannot be null"); + } CandidateRejectReason reason = new CandidateRejectReason(); reason.setCode(code); reason.setCategory(category); reason.setMessage(message); return reason; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/candidate/CandidateRejectReason.java` around lines 12 - 18, Add null checks in the static factory CandidateRejectReason.of(String code, String category, String message): validate that code, category and message are not null (or not blank if desired) at the method start and throw a clear unchecked exception (e.g., IllegalArgumentException or NullPointerException) with a descriptive message when any parameter is invalid; keep using reason.setCode(...), reason.setCategory(...), reason.setMessage(...) after validation so callers get immediate, explicit failures rather than later NPEs.header/src/main/java/org/zstack/header/candidate/CandidateDecisionSummary.java (1)
16-42: ⚡ Quick win添加空值校验以提高健壮性。
静态工厂方法
from(List<CandidateRecord<?>> records)未对输入参数records进行空值校验。如果传入null或列表中包含null元素,将触发NullPointerException。建议在方法入口添加防御性检查。
🛡️ 建议的空值校验
public static CandidateDecisionSummary from(List<CandidateRecord<?>> records) { + if (records == null) { + return new CandidateDecisionSummary(); + } CandidateDecisionSummary summary = new CandidateDecisionSummary(); for (CandidateRecord<?> record : records) { + if (record == null) { + continue; + } summary.total++;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/candidate/CandidateDecisionSummary.java` around lines 16 - 42, The static factory CandidateDecisionSummary.from(List<CandidateRecord<?>> records) needs defensive null checks: if records is null, return a new empty CandidateDecisionSummary (or throw IllegalArgumentException per project convention); additionally, inside the loop skip any null CandidateRecord<?> entries before using record.getDecision(), record.getCandidateType(), or record.getReason(); keep existing logic for non-null records and continue counting. Ensure references to CandidateDecisionSummary.from, CandidateRecord, and the existing byType/byCode/byCategory counting code are updated accordingly.header/src/main/java/org/zstack/header/candidate/TypeBreakdown.java (1)
41-50: ⚡ Quick win建议处理未知的 decision 值
count(String decision)方法当前仅处理SELECTED、REJECTED、SKIPPED三种已知值。若传入未知的 decision 值,total会被增加但不会更新任何分类计数,这可能导致统计不一致。建议添加 else 分支记录日志或抛出异常,以便及时发现非预期的 decision 值。
♻️ 建议添加未知值处理
public void count(String decision) { total++; if (CandidateTypes.SELECTED.equals(decision)) { selected++; } else if (CandidateTypes.REJECTED.equals(decision)) { rejected++; } else if (CandidateTypes.SKIPPED.equals(decision)) { skipped++; + } else { + logger.warn(String.format("unknown decision value: %s", decision)); } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@header/src/main/java/org/zstack/header/candidate/TypeBreakdown.java` around lines 41 - 50, The count(String decision) method in TypeBreakdown increments total but ignores unexpected decision values, causing inconsistent statistics; update TypeBreakdown.count to handle unknown decisions by adding an else branch that either logs a warning (using the class logger) including the unexpected decision and current state, or throws an IllegalArgumentException (choose consistent project behavior), so every increment of total is accompanied by a deterministic classification; reference the count method and CandidateTypes.SELECTED/REJECTED/SKIPPED to locate where to add the else branch.compute/src/main/java/org/zstack/compute/allocator/HostCapacityAllocatorFlow.java (1)
87-99: 💤 Low value建议使用 Stream 简化 Set 构建
当前使用显式循环构建
passed集合,可以使用 Stream API 简化。♻️ 建议的简化
- Set<String> passed = new HashSet<>(); - for (HostVO host : afterReservedCapacity) { - passed.add(host.getUuid()); - } + Set<String> passed = afterReservedCapacity.stream() + .map(HostVO::getUuid) + .collect(Collectors.toSet());🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/allocator/HostCapacityAllocatorFlow.java` around lines 87 - 99, In HostCapacityAllocatorFlow replace the explicit loop that builds the Set<String> passed from afterReservedCapacity with a Stream-based collector to simplify and clarify intent: use afterReservedCapacity.stream().map(HostVO::getUuid).collect(Collectors.toSet()) (or similar) to produce passed, then keep the subsequent loop over ret and calls to reject/CandidateRejectReason/CandidateReasonCodes unchanged; ensure you import java.util.stream.Collectors if needed and preserve variable names (passed, afterReservedCapacity, ret) and the reject(...) call semantics.test/src/test/groovy/org/zstack/test/integration/kvm/hostallocator/CandidateDecisionApiCase.groovy (1)
292-301: 💤 Low value建议为断言添加失败消息以提高可读性。
Line 295 的
assert false在测试失败时不会提供上下文信息。建议添加描述性消息,例如assert false, "Expected ApiSenderException to be thrown",以便在测试失败时快速定位问题。♻️ 建议的改进
def expectCandidateDecisionFailure(Closure action) { try { action.call() - assert false + assert false, "Expected ApiSenderException to be thrown but no exception was thrown" } catch (ApiSenderException e) { def result = e.error.getFromOpaque("candidateDecisionResult") assert result != null return result } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/groovy/org/zstack/test/integration/kvm/hostallocator/CandidateDecisionApiCase.groovy` around lines 292 - 301, The test helper expectCandidateDecisionFailure currently uses a bare assert false after action.call which gives no context on failure; update the assertion to include a descriptive message (e.g., in expectCandidateDecisionFailure replace "assert false" with something like "assert false, 'Expected ApiSenderException to be thrown by action.call in expectCandidateDecisionFailure'") so that if action.call does not throw an ApiSenderException the test output clearly indicates the expectation; keep the rest of the catch block (ApiSenderException e and e.error.getFromOpaque("candidateDecisionResult")) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`:
- Around line 450-468: The validation in
VmInstanceApiInterceptor.validate(APIGetVmCreationCandidatesMsg) throws
OperationFailureException for the ISO/rootDiskSize check (using argerr with
ORG_ZSTACK_COMPUTE_VM_10112); change this to throw
ApiMessageInterceptionException instead so exception types are consistent in the
interceptor path—replace the OperationFailureException(...) with
ApiMessageInterceptionException(argerr(ORG_ZSTACK_COMPUTE_VM_10112, ...)) while
keeping the existing error message and parameters (image.getName(),
image.getUuid()).
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 2156-2160: In the VmInstanceBase migration dry-run branch where
re.isSuccess() is false and HostAllocatorError.NO_AVAILABLE_HOST is detected,
don't return a bare empty AllocateHostDryRunReply; extract the original
ErrorCode opaque (which contains candidateDecisionResult) from re.getError() and
attach it to the reply before calling completion.success(reply) so downstream
APIGetVmMigrationCandidatesMsg can read the candidateDecisionResult and avoid
candidateDecisionMissingError(); update the NO_AVAILABLE_HOST branch handling
around re.getError(), AllocateHostDryRunReply and completion.success to copy the
opaque/candidateDecisionResult through.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java`:
- Around line 918-922: The error code constant ORG_ZSTACK_COMPUTE_VM_10263 is
already used for a different message; update the candidateDecisionMissingError
method to use a new unique error code constant (e.g.,
ORG_ZSTACK_COMPUTE_VM_XXXXX) instead of ORG_ZSTACK_COMPUTE_VM_10263: add the new
constant to the same error-code registry/location, replace the constant
reference in candidateDecisionMissingError(APIMessage msg), and ensure the new
code is documented/registered where other ORG_ZSTACK_COMPUTE_VM_* codes are
declared so translations and error mappings remain consistent.
In
`@header/src/main/java/org/zstack/header/candidate/CandidateDecisionRecorder.java`:
- Around line 72-94: The assert logic is wrong because rejectIfNotRejected and
skip assume decision is null but only return for REJECTED; update both
rejectIfNotRejected(...) and skip(...) so they no-op whenever the
CandidateRecord is null or already has any non-null decision (i.e., change the
early-return condition to check record.getDecision() != null), and remove the
DebugUtils.Assert call; only call record.setDecision(...) and
record.setReason(...) when the existing decision is null to avoid assertion
failures after select(...).
In `@header/src/main/java/org/zstack/header/candidate/CandidateRejectReason.java`:
- Around line 20-27: The detail(String key, Object value) method currently
relies on CandidateReasonDetails.checkAllowed(key, value) which may be a
DebugUtils.Assert (no-op in production); change the call path so validation
always executes and fails fast: either modify
CandidateReasonDetails.checkAllowed to perform real runtime validation and throw
an unchecked exception (e.g. IllegalArgumentException) on invalid key/value, or
add an explicit boolean validator (e.g. CandidateReasonDetails.isAllowed(key,
value)) and in CandidateRejectReason.detail(...) check its result and throw an
IllegalArgumentException before mutating the details map; ensure the
implementation references CandidateRejectReason.detail,
CandidateReasonDetails.checkAllowed/isAllowed, and the details map so the
invalid entry is never written in production.
---
Outside diff comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 936-944: In VmInstanceBase.fail(ErrorCode) (the fail method in
class VmInstanceBase) don't convert HostAllocatorError.NO_AVAILABLE_HOST into a
plain failure; instead treat it as a successful reply with empty host/cluster
lists but preserve and forward the allocator diagnostic (the
candidateDecisionResult) from the allocator error opaque into the
GetVmStartingCandidateClustersHostsReply so APIGetVmStartingCandidatesMsg
handler can see candidates/summary/truncated; specifically, in the
NO_AVAILABLE_HOST branch set reply.setHostInventories(new ArrayList<>()) and
reply.setClusterInventories(new ArrayList<>()), set reply.setSuccess(true) (or
avoid reply.setError), extract the candidateDecisionResult from
errorCode.getOpaque(...) (or errorCode.getDetails/opaque map used by allocators)
and populate the corresponding diagnostic field on
GetVmStartingCandidateClustersHostsReply before bus.reply(msg, reply).
---
Nitpick comments:
In
`@compute/src/main/java/org/zstack/compute/allocator/HostCapacityAllocatorFlow.java`:
- Around line 87-99: In HostCapacityAllocatorFlow replace the explicit loop that
builds the Set<String> passed from afterReservedCapacity with a Stream-based
collector to simplify and clarify intent: use
afterReservedCapacity.stream().map(HostVO::getUuid).collect(Collectors.toSet())
(or similar) to produce passed, then keep the subsequent loop over ret and calls
to reject/CandidateRejectReason/CandidateReasonCodes unchanged; ensure you
import java.util.stream.Collectors if needed and preserve variable names
(passed, afterReservedCapacity, ret) and the reject(...) call semantics.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`:
- Around line 430-448: Two validate(...) methods in VmInstanceApiInterceptor
duplicate the same VM creation checks; extract the shared logic into a single
private helper and call it from both validators. Create a private method (e.g.,
validateVmCreationCandidateFields) that accepts the message type (or a common
supertype) and moves the instanceOfferingUuid vs cpu/memory checks and the
ImageVO ISO -> rootDiskSize/rootDiskOfferingUuid checks (using dbf.findByUuid
and throwing the same ApiMessageInterceptionException/OperationFailureException
with the same messages). Replace the duplicated blocks in the existing
validate(APIGetCandidateZonesClustersHostsForCreatingVmMsg msg) and the other
validate(...) method (lines ~450-468) with a call to this new helper so behavior
and exception texts remain identical.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java`:
- Around line 895-916: Both
newHostCandidateDecisionContext(APICreateVmInstanceMsg) and
newHostCandidateDecisionContext(APIGetVmCreationCandidatesMsg) duplicate the
same request-scope population logic; extract the common population into a single
private helper (e.g., private CandidateDecisionContext
populateHostCandidateContext(CandidateDecisionContext ctx, <common message base
type> msg) or private CandidateDecisionContext
newHostCandidateDecisionContextBase(ApiMessage msg)), call it from both
newHostCandidateDecisionContext(APICreateVmInstanceMsg) and
newHostCandidateDecisionContext(APIGetVmCreationCandidatesMsg), and keep the
extra ctx.getRequestScope().put("hostUuid", msg.getHostUuid()) only in the
APICreateVmInstanceMsg overload; update references to
CandidateDecisionContext.fromApiMessage(...) to create the ctx before passing to
the helper.
- Around line 782-865: The two handlers handle(APIGetVmCreationCandidatesMsg)
and handle(APIGetCandidateZonesClustersHostsForCreatingVmMsg) duplicate ~80
lines (image/InstanceOffering/DiskOffering processing and
DesignatedAllocateHostMsg construction); extract that shared logic into a
private helper like buildDesignatedAllocateHostMsgForCreation(...) that accepts
imageUuid, zoneUuid, clusterUuid, instanceOfferingUuid, cpuNum, memorySize,
rootDiskOfferingUuid, rootDiskSize, dataDiskOfferingUuids, l3NetworkUuids,
systemTags and returns a fully populated DesignatedAllocateHostMsg (including
ImageInventory, cpu/memory, disk size aggregation, L3s, vm stub, system tags,
allocator strategy, dryRun/listAllHosts flags and required backup storage
selection); update both handle(...) methods to call this helper, then set their
specific pieces (CandidateDecisionContext, accountUuid/session, VmOperation or
any response-specific fields) before sending the message.
In
`@header/src/main/java/org/zstack/header/candidate/CandidateDecisionResult.java`:
- Around line 15-17: setCandidates currently accepts null which can overwrite
the initialized non-null candidates list and cause NPEs for callers of
getCandidates(); update setCandidates(List<CandidateRecord<?>> candidates) to
defensively handle null by assigning an empty immutable or new ArrayList instead
of null (e.g., when candidates == null set this.candidates =
Collections.emptyList() or new ArrayList<>()) and keep references to the field
name candidates and the accessor getCandidates() to locate the changes.
In
`@header/src/main/java/org/zstack/header/candidate/CandidateDecisionSummary.java`:
- Around line 16-42: The static factory
CandidateDecisionSummary.from(List<CandidateRecord<?>> records) needs defensive
null checks: if records is null, return a new empty CandidateDecisionSummary (or
throw IllegalArgumentException per project convention); additionally, inside the
loop skip any null CandidateRecord<?> entries before using record.getDecision(),
record.getCandidateType(), or record.getReason(); keep existing logic for
non-null records and continue counting. Ensure references to
CandidateDecisionSummary.from, CandidateRecord, and the existing
byType/byCode/byCategory counting code are updated accordingly.
In
`@header/src/main/java/org/zstack/header/candidate/CandidateReasonDetails.java`:
- Around line 31-33: checkAllowed currently calls
DebugUtils.Assert(ALLOWED_KEYS.contains(key), ...) without validating key; add
an explicit null check at the start of
CandidateReasonDetails.checkAllowed(String key, Object value) to assert key !=
null (using DebugUtils.Assert or equivalent) with a clear message like
"candidate reason detail key is null", then keep the existing
ALLOWED_KEYS.contains(key) assertion to validate membership; reference the
method name checkAllowed, the ALLOWED_KEYS collection and DebugUtils.Assert when
making the change.
In `@header/src/main/java/org/zstack/header/candidate/CandidateRejectReason.java`:
- Around line 12-18: Add null checks in the static factory
CandidateRejectReason.of(String code, String category, String message): validate
that code, category and message are not null (or not blank if desired) at the
method start and throw a clear unchecked exception (e.g.,
IllegalArgumentException or NullPointerException) with a descriptive message
when any parameter is invalid; keep using reason.setCode(...),
reason.setCategory(...), reason.setMessage(...) after validation so callers get
immediate, explicit failures rather than later NPEs.
In `@header/src/main/java/org/zstack/header/candidate/TypeBreakdown.java`:
- Around line 41-50: The count(String decision) method in TypeBreakdown
increments total but ignores unexpected decision values, causing inconsistent
statistics; update TypeBreakdown.count to handle unknown decisions by adding an
else branch that either logs a warning (using the class logger) including the
unexpected decision and current state, or throws an IllegalArgumentException
(choose consistent project behavior), so every increment of total is accompanied
by a deterministic classification; reference the count method and
CandidateTypes.SELECTED/REJECTED/SKIPPED to locate where to add the else branch.
In
`@test/src/test/groovy/org/zstack/test/integration/kvm/hostallocator/CandidateDecisionApiCase.groovy`:
- Around line 292-301: The test helper expectCandidateDecisionFailure currently
uses a bare assert false after action.call which gives no context on failure;
update the assertion to include a descriptive message (e.g., in
expectCandidateDecisionFailure replace "assert false" with something like
"assert false, 'Expected ApiSenderException to be thrown by action.call in
expectCandidateDecisionFailure'") so that if action.call does not throw an
ApiSenderException the test output clearly indicates the expectation; keep the
rest of the catch block (ApiSenderException e and
e.error.getFromOpaque("candidateDecisionResult")) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0325a06b-f860-45cf-afc7-a49c0e61cedb
⛔ Files ignored due to path filters (2)
conf/serviceConfig/vmInstance.xmlis excluded by!**/*.xmlconf/springConfigXml/HostAllocatorManager.xmlis excluded by!**/*.xml
📒 Files selected for processing (55)
compute/src/main/java/org/zstack/compute/allocator/AvoidHostAllocatorFlow.javacompute/src/main/java/org/zstack/compute/allocator/DesignatedHostAllocatorFlow.javacompute/src/main/java/org/zstack/compute/allocator/FilterFlow.javacompute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.javacompute/src/main/java/org/zstack/compute/allocator/HostAllocatorManagerImpl.javacompute/src/main/java/org/zstack/compute/allocator/HostCandidateUniverseBuilder.javacompute/src/main/java/org/zstack/compute/allocator/HostCapacityAllocatorFlow.javacompute/src/main/java/org/zstack/compute/allocator/HostStateAndHypervisorAllocatorFlow.javacompute/src/main/java/org/zstack/compute/vm/AbstractVmInstance.javacompute/src/main/java/org/zstack/compute/vm/InstantiateVmFromNewCreatedStruct.javacompute/src/main/java/org/zstack/compute/vm/VmAllocateHostFlow.javacompute/src/main/java/org/zstack/compute/vm/VmAllocateHostForMigrateVmFlow.javacompute/src/main/java/org/zstack/compute/vm/VmAllocateHostForStoppedVmFlow.javacompute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.javacompute/src/main/java/org/zstack/compute/vm/VmInstanceBase.javacompute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.javaheader/src/main/java/org/zstack/header/allocator/AbstractHostAllocatorFlow.javaheader/src/main/java/org/zstack/header/allocator/AllocateHostDryRunReply.javaheader/src/main/java/org/zstack/header/allocator/AllocateHostMsg.javaheader/src/main/java/org/zstack/header/allocator/HostAllocatorSpec.javaheader/src/main/java/org/zstack/header/allocator/HostAllocatorTrigger.javaheader/src/main/java/org/zstack/header/candidate/CandidateCategories.javaheader/src/main/java/org/zstack/header/candidate/CandidateDecisionContext.javaheader/src/main/java/org/zstack/header/candidate/CandidateDecisionRecorder.javaheader/src/main/java/org/zstack/header/candidate/CandidateDecisionRequest.javaheader/src/main/java/org/zstack/header/candidate/CandidateDecisionResult.javaheader/src/main/java/org/zstack/header/candidate/CandidateDecisionSummary.javaheader/src/main/java/org/zstack/header/candidate/CandidateReasonCodes.javaheader/src/main/java/org/zstack/header/candidate/CandidateReasonDetails.javaheader/src/main/java/org/zstack/header/candidate/CandidateRecord.javaheader/src/main/java/org/zstack/header/candidate/CandidateRef.javaheader/src/main/java/org/zstack/header/candidate/CandidateRejectReason.javaheader/src/main/java/org/zstack/header/candidate/CandidateTypes.javaheader/src/main/java/org/zstack/header/candidate/TypeBreakdown.javaheader/src/main/java/org/zstack/header/vm/APIGetCandidateZonesClustersHostsForCreatingVmMsg.javaheader/src/main/java/org/zstack/header/vm/APIGetVmCreationCandidatesMsg.javaheader/src/main/java/org/zstack/header/vm/APIGetVmCreationCandidatesReply.javaheader/src/main/java/org/zstack/header/vm/APIGetVmMigrationCandidateHostsMsg.javaheader/src/main/java/org/zstack/header/vm/APIGetVmMigrationCandidatesMsg.javaheader/src/main/java/org/zstack/header/vm/APIGetVmMigrationCandidatesReply.javaheader/src/main/java/org/zstack/header/vm/APIGetVmStartingCandidateClustersHostsMsg.javaheader/src/main/java/org/zstack/header/vm/APIGetVmStartingCandidatesMsg.javaheader/src/main/java/org/zstack/header/vm/APIGetVmStartingCandidatesReply.javaheader/src/main/java/org/zstack/header/vm/CreateVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/GetVmMigrationTargetHostMsg.javaheader/src/main/java/org/zstack/header/vm/GetVmMigrationTargetHostReply.javaheader/src/main/java/org/zstack/header/vm/GetVmStartingCandidateClustersHostsMsg.javaheader/src/main/java/org/zstack/header/vm/GetVmStartingCandidateClustersHostsReply.javaheader/src/main/java/org/zstack/header/vm/InstantiateNewCreatedVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/MigrateVmInnerMsg.javaheader/src/main/java/org/zstack/header/vm/MigrateVmMsg.javaheader/src/main/java/org/zstack/header/vm/StartVmInstanceMsg.javaheader/src/main/java/org/zstack/header/vm/VmInstanceSpec.javatest/src/test/groovy/org/zstack/test/integration/kvm/hostallocator/CandidateDecisionApiCase.groovytest/src/test/java/org/zstack/test/candidate/TestCandidateDecisionRecorder.java
| private void validate(APIGetVmCreationCandidatesMsg msg) { | ||
| final String instanceOfferingUuid = msg.getInstanceOfferingUuid(); | ||
|
|
||
| if (instanceOfferingUuid == null) { | ||
| if (msg.getCpuNum() == null || msg.getMemorySize() == null) { | ||
| throw new ApiMessageInterceptionException(operr(ORG_ZSTACK_COMPUTE_VM_10111, "Missing CPU/memory settings")); | ||
| } | ||
| } | ||
|
|
||
| ImageVO image = dbf.findByUuid(msg.getImageUuid(), ImageVO.class); | ||
| if (image != null && image.getMediaType() == ImageMediaType.ISO) { | ||
| if (msg.getRootDiskOfferingUuid() == null) { | ||
| if (msg.getRootDiskSize() == null || msg.getRootDiskSize() <= 0) { | ||
| throw new OperationFailureException(argerr(ORG_ZSTACK_COMPUTE_VM_10112, "the image[name:%s, uuid:%s] is an ISO, rootDiskSize must be set", | ||
| image.getName(), image.getUuid())); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
拦截器校验异常类型不一致,建议改为 ApiMessageInterceptionException。
Line 463 当前抛出 OperationFailureException,与拦截器参数校验路径的异常语义不一致,可能影响统一错误处理与前端返回行为。
Based on learnings,在 VmInstanceApiInterceptor 的参数校验流程中应统一抛出 ApiMessageInterceptionException。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`
around lines 450 - 468, The validation in
VmInstanceApiInterceptor.validate(APIGetVmCreationCandidatesMsg) throws
OperationFailureException for the ISO/rootDiskSize check (using argerr with
ORG_ZSTACK_COMPUTE_VM_10112); change this to throw
ApiMessageInterceptionException instead so exception types are consistent in the
interceptor path—replace the OperationFailureException(...) with
ApiMessageInterceptionException(argerr(ORG_ZSTACK_COMPUTE_VM_10112, ...)) while
keeping the existing error message and parameters (image.getName(),
image.getUuid()).
Source: Learnings
| if (!re.isSuccess()) { | ||
| if (HostAllocatorError.NO_AVAILABLE_HOST.toString().equals(re.getError().getCode())) { | ||
| completion.success(new ArrayList<HostInventory>()); | ||
| AllocateHostDryRunReply reply = new AllocateHostDryRunReply(); | ||
| reply.setHosts(new ArrayList<>()); | ||
| completion.success(reply); |
There was a problem hiding this comment.
迁移 dry-run 的空结果分支丢失了候选决策结果。
当 allocator 以 NO_AVAILABLE_HOST 失败时,这里新建了一个空的 AllocateHostDryRunReply,但没有把原始 ErrorCode opaque 中的 candidateDecisionResult 带过去。下游 handle(APIGetVmMigrationCandidatesMsg) 会因此命中 candidateDecisionMissingError(),导致迁移候选诊断 API 在“没有可迁移主机”时返回内部错误,而不是完整诊断结果。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
2156 - 2160, In the VmInstanceBase migration dry-run branch where re.isSuccess()
is false and HostAllocatorError.NO_AVAILABLE_HOST is detected, don't return a
bare empty AllocateHostDryRunReply; extract the original ErrorCode opaque (which
contains candidateDecisionResult) from re.getError() and attach it to the reply
before calling completion.success(reply) so downstream
APIGetVmMigrationCandidatesMsg can read the candidateDecisionResult and avoid
candidateDecisionMissingError(); update the NO_AVAILABLE_HOST branch handling
around re.getError(), AllocateHostDryRunReply and completion.success to copy the
opaque/candidateDecisionResult through.
| private ErrorCode candidateDecisionMissingError(APIMessage msg) { | ||
| return inerr(ORG_ZSTACK_COMPUTE_VM_10263, | ||
| "candidate decision result is missing for diagnostic API[%s], requestUuid[%s]", | ||
| msg.getClass().getSimpleName(), msg.getId()); | ||
| } |
There was a problem hiding this comment.
错误码冲突:ORG_ZSTACK_COMPUTE_VM_10263 已被其他错误使用。
此处使用 ORG_ZSTACK_COMPUTE_VM_10263 表示 "candidate decision result is missing",但同一错误码在 line 2824 已被用于完全不同的错误消息 "the resource is a ROOT volume, you cannot change its owner"。
错误码应该唯一对应一种错误类型,以便:
- 正确的国际化 (i18n) 翻译
- 错误日志分析和告警
- API 错误处理
请为此错误分配一个新的唯一错误码。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java`
around lines 918 - 922, The error code constant ORG_ZSTACK_COMPUTE_VM_10263 is
already used for a different message; update the candidateDecisionMissingError
method to use a new unique error code constant (e.g.,
ORG_ZSTACK_COMPUTE_VM_XXXXX) instead of ORG_ZSTACK_COMPUTE_VM_10263: add the new
constant to the same error-code registry/location, replace the constant
reference in candidateDecisionMissingError(APIMessage msg), and ensure the new
code is documented/registered where other ORG_ZSTACK_COMPUTE_VM_* codes are
declared so translations and error mappings remain consistent.
| public void rejectIfNotRejected(String candidateUuid, CandidateRejectReason reason) { | ||
| CandidateRecord<T> record = records.get(candidateUuid); | ||
| if (record == null || CandidateTypes.REJECTED.equals(record.getDecision())) { | ||
| return; | ||
| } | ||
|
|
||
| DebugUtils.Assert(record.getDecision() == null, | ||
| String.format("candidate[%s] decision is already %s", candidateUuid, record.getDecision())); | ||
| record.setDecision(CandidateTypes.REJECTED); | ||
| record.setReason(reason); | ||
| } | ||
|
|
||
| public void skip(String candidateUuid, CandidateRejectReason reason) { | ||
| CandidateRecord<T> record = records.get(candidateUuid); | ||
| if (record == null || CandidateTypes.REJECTED.equals(record.getDecision())) { | ||
| return; | ||
| } | ||
|
|
||
| DebugUtils.Assert(record.getDecision() == null, | ||
| String.format("candidate[%s] decision is already %s", candidateUuid, record.getDecision())); | ||
| record.setDecision(CandidateTypes.SKIPPED); | ||
| record.setReason(reason); | ||
| } |
There was a problem hiding this comment.
决策状态断言逻辑存在缺陷。
rejectIfNotRejected() 和 skip() 方法的断言逻辑不完整:
- 第 74-75 行和第 86-87 行:仅在决策状态为
REJECTED时提前返回 - 第 78-79 行和第 90-91 行:断言决策状态必须为
null
这导致以下场景会触发断言失败:
- 先调用
select(uuid)将决策设为SELECTED - 再调用
skip(uuid, reason),此时决策不是REJECTED,继续执行 - 断言
decision == null失败,因为决策是SELECTED
同样的问题也存在于 rejectIfNotRejected() 方法。
🔧 建议的修复方案
方案 1:提前返回任何非空决策
public void rejectIfNotRejected(String candidateUuid, CandidateRejectReason reason) {
CandidateRecord<T> record = records.get(candidateUuid);
- if (record == null || CandidateTypes.REJECTED.equals(record.getDecision())) {
+ if (record == null || record.getDecision() != null) {
return;
}
- DebugUtils.Assert(record.getDecision() == null,
- String.format("candidate[%s] decision is already %s", candidateUuid, record.getDecision()));
record.setDecision(CandidateTypes.REJECTED);
record.setReason(reason);
}
public void skip(String candidateUuid, CandidateRejectReason reason) {
CandidateRecord<T> record = records.get(candidateUuid);
- if (record == null || CandidateTypes.REJECTED.equals(record.getDecision())) {
+ if (record == null || record.getDecision() != null) {
return;
}
- DebugUtils.Assert(record.getDecision() == null,
- String.format("candidate[%s] decision is already %s", candidateUuid, record.getDecision()));
record.setDecision(CandidateTypes.SKIPPED);
record.setReason(reason);
}方案 2:仅在决策为 REJECTED 时返回,移除断言,并记录日志
public void rejectIfNotRejected(String candidateUuid, CandidateRejectReason reason) {
CandidateRecord<T> record = records.get(candidateUuid);
if (record == null || CandidateTypes.REJECTED.equals(record.getDecision())) {
return;
}
- DebugUtils.Assert(record.getDecision() == null,
- String.format("candidate[%s] decision is already %s", candidateUuid, record.getDecision()));
+ if (record.getDecision() != null) {
+ logger.warn(String.format("candidate[%s] decision is already %s, skipping reject",
+ candidateUuid, record.getDecision()));
+ return;
+ }
record.setDecision(CandidateTypes.REJECTED);
record.setReason(reason);
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@header/src/main/java/org/zstack/header/candidate/CandidateDecisionRecorder.java`
around lines 72 - 94, The assert logic is wrong because rejectIfNotRejected and
skip assume decision is null but only return for REJECTED; update both
rejectIfNotRejected(...) and skip(...) so they no-op whenever the
CandidateRecord is null or already has any non-null decision (i.e., change the
early-return condition to check record.getDecision() != null), and remove the
DebugUtils.Assert call; only call record.setDecision(...) and
record.setReason(...) when the existing decision is null to avoid assertion
failures after select(...).
| public CandidateRejectReason detail(String key, Object value) { | ||
| CandidateReasonDetails.checkAllowed(key, value); | ||
| if (details == null) { | ||
| details = new LinkedHashMap<>(); | ||
| } | ||
| details.put(key, value); | ||
| return this; | ||
| } |
There was a problem hiding this comment.
注意 detail() 方法依赖的校验可能在生产环境失效。
detail() 方法调用 CandidateReasonDetails.checkAllowed(key, value) 进行键名白名单校验。如该校验使用 DebugUtils.Assert 实现且在生产环境被禁用,则非法键名可能绕过检查被写入 details map。
请参考对 CandidateReasonDetails.checkAllowed 的审查意见,确保校验在所有环境中生效。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@header/src/main/java/org/zstack/header/candidate/CandidateRejectReason.java`
around lines 20 - 27, The detail(String key, Object value) method currently
relies on CandidateReasonDetails.checkAllowed(key, value) which may be a
DebugUtils.Assert (no-op in production); change the call path so validation
always executes and fails fast: either modify
CandidateReasonDetails.checkAllowed to perform real runtime validation and throw
an unchecked exception (e.g. IllegalArgumentException) on invalid key/value, or
add an explicit boolean validator (e.g. CandidateReasonDetails.isAllowed(key,
value)) and in CandidateRejectReason.detail(...) check its result and throw an
IllegalArgumentException before mutating the details map; ensure the
implementation references CandidateRejectReason.detail,
CandidateReasonDetails.checkAllowed/isAllowed, and the details map so the
invalid entry is never written in production.
Summary
Branches
Verification
mvn -Dmaven.repo.local=/root/repo/.m2/repository -DskipTests -P premium clean installPASSmvn -o -Dmaven.repo.local=/root/repo/.m2/repository test -Dtest=PremiumTestCaseStabilityTest -Dcases=org.zstack.test.integration.premium.pciDevice.TestGpuDeviceCandidateDecisionCase -Dtimes=1 -DmsgTimeoutMins=2 -DThreadFacade.maxThadNum=4 -Dskip.Jacoco=truePASSCompanion MR
sync from gitlab !10139