SPLAT-2790: Added new VAP for vSphere infra validation against current cpms and machines #1510
vr4manta wants to merge 3 commits into
|
@vr4manta: This pull request references SPLAT-2790, which is a valid Jira issue.
Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
|
Skipping CI for Draft Pull Request. |
|
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds three
Changes
vSphere Failure Domain ValidatingAdmissionPolicy Support
Sequence Diagram(s)
sequenceDiagram
participant Operator as machine-api-operator
participant k8s as Kubernetes API
participant MachineVAP as Machine VAP
participant CPMSVAP as CPMS VAP
participant MachineSetVAP as MachineSet VAP
rect rgba(100, 149, 237, 0.5)
Note over Operator,k8s: syncVSphereFailureDomainVAPs (vSphere only, feature-gated)
Operator->>k8s: apply NewVSphereFailureDomainMachineVAP
k8s-->>Operator: ok
Operator->>k8s: apply NewVSphereFailureDomainMachineVAPBinding
k8s-->>Operator: ok
Operator->>k8s: apply NewVSphereFailureDomainCPMSVAP
k8s-->>Operator: ok
Operator->>k8s: apply NewVSphereFailureDomainCPMSVAPBinding
k8s-->>Operator: ok
Operator->>k8s: apply NewVSphereFailureDomainMachineSetVAP
k8s-->>Operator: ok
Operator->>k8s: apply NewVSphereFailureDomainMachineSetVAPBinding
k8s-->>Operator: ok
end
rect rgba(255, 140, 0, 0.5)
Note over k8s,MachineSetVAP: Infrastructure UPDATE admission
k8s->>MachineVAP: evaluate per Machine param (region/zone labels)
MachineVAP-->>k8s: Deny if referenced FD removed
k8s->>CPMSVAP: evaluate per ControlPlaneMachineSet param (template FD names)
CPMSVAP-->>k8s: Deny if referenced FD removed
k8s->>MachineSetVAP: evaluate per MachineSet param (template region/zone labels)
MachineSetVAP-->>k8s: Deny if referenced FD removed
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: ✅ 14 passed | ❌ 1 failed (1 warning)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.2): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
|
/test all |
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@install/0000_30_machine-api-operator_09_rbac.yaml`:
- Around line 484-495: The ClusterRole rule at lines 487–495 grants unnecessary
cluster-wide read access to machines and controlplanemachinesets when the VAP
policies scope parameter resolution to the openshift-machine-api namespace only.
Remove this cluster-wide rule from the ClusterRole, and instead add the get verb
to the existing namespaced Role rule at line 363 that already grants list and
watch on all resources in the openshift-machine-api namespace. This narrows
permissions to only the namespace where the operator actually needs access while
maintaining the required authorization for VAP binding creation.
In `@pkg/operator/sync.go`:
- Around line 285-304: The ApplyValidatingAdmissionPolicyV1 and
ApplyValidatingAdmissionPolicyBindingV1 calls are using context.TODO() which
does not provide cancellation or timeout bounds. Replace context.TODO() with a
bounded context that includes proper cancellation and timeout behavior suitable
for the reconcile path. Create or use an existing bounded context with
appropriate timeout values (such as deriving it from the reconciliation context
or creating one with a reasonable timeout) and pass it to all four resourceapply
function calls instead of context.TODO().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fa16aa88-d84f-4db6-80ab-a0191fea2bf4
📒 Files selected for processing (5)
install/0000_30_machine-api-operator_09_rbac.yaml
pkg/operator/sync.go
pkg/operator/sync_test.go
pkg/webhooks/vap.go
pkg/webhooks/vap_test.go
|
/test ? |
|
/test e2e-vsphere-ovn-serial e2e-vsphere-ovn-techpreview-serial |
|
/retest |
|
@CodeRabbit help |
|
|
@coderabbitai summary |
✅ Action performed: Summary regeneration triggered. |
866177d to cf0bb7c
|
/test ? |
|
/test e2e-vsphere-ovn-techpreview e2e-vsphere-ovn-techpreview-serial e2e-vsphere-ovn e2e-vsphere-ovn-serial |
|
/assign @JoelSpeed |
|
/test e2e-vsphere-ovn-techpreview e2e-vsphere-ovn-techpreview-serial e2e-vsphere-ovn e2e-vsphere-ovn-serial |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
|
@vr4manta: The following tests failed, say
|
JoelSpeed left a comment
I've left a couple of examples to make the CEL more concise. They apply to pretty much all of the CEL, so PTAL at how you can use the optional orValue pattern across the rules you've got.
    # The machine-api-operator service account must be able to read Machines,
    # ControlPlaneMachineSets, and MachineSets so the kube-apiserver can resolve
    # them as VAP params when validating Infrastructure/cluster updates.
Really? Is that documented somewhere in the upstream docs you can reference?
I can see why it would make sense, but I've never thought about it like that before
    featureGates, err := optr.featureGateAccessor.CurrentFeatureGates()
    if err != nil {
        return fmt.Errorf("failed to get current feature gates: %w", err)
    }
Do this earlier. The gates extracted here don't change, or, if they do, the whole binary exits. These can be extracted and held as a global in the optr struct earlier in the construction.
    if _, _, err := resourceapply.ApplyValidatingAdmissionPolicyV1(context.TODO(),
        optr.kubeClient.AdmissionregistrationV1(), recorder,
        mapiwebhooks.NewVSphereFailureDomainMachineVAP(), optr.cache); err != nil {
        return err
    }
    if _, _, err := resourceapply.ApplyValidatingAdmissionPolicyBindingV1(context.TODO(),
        optr.kubeClient.AdmissionregistrationV1(), recorder,
        mapiwebhooks.NewVSphereFailureDomainMachineVAPBinding(), optr.cache); err != nil {
        return err
    }
    if _, _, err := resourceapply.ApplyValidatingAdmissionPolicyV1(context.TODO(),
        optr.kubeClient.AdmissionregistrationV1(), recorder,
        mapiwebhooks.NewVSphereFailureDomainCPMSVAP(), optr.cache); err != nil {
        return err
    }
    if _, _, err := resourceapply.ApplyValidatingAdmissionPolicyBindingV1(context.TODO(),
        optr.kubeClient.AdmissionregistrationV1(), recorder,
        mapiwebhooks.NewVSphereFailureDomainCPMSVAPBinding(), optr.cache); err != nil {
        return err
    }
    if _, _, err := resourceapply.ApplyValidatingAdmissionPolicyV1(context.TODO(),
        optr.kubeClient.AdmissionregistrationV1(), recorder,
        mapiwebhooks.NewVSphereFailureDomainMachineSetVAP(), optr.cache); err != nil {
        return err
    }
    if _, _, err := resourceapply.ApplyValidatingAdmissionPolicyBindingV1(context.TODO(),
        optr.kubeClient.AdmissionregistrationV1(), recorder,
        mapiwebhooks.NewVSphereFailureDomainMachineSetVAPBinding(), optr.cache); err != nil {
        return err
    }
Please wrap each err to give us a unique error message per return
    // VAPMachineFailureDomainName is the name of the ValidatingAdmissionPolicy that guards
    // against removing a vSphere failure domain that is still referenced by a Machine.
    VAPMachineFailureDomainName = "vsphere-failure-domain-in-use-by-machine"

    // VAPCPMSFailureDomainName is the name of the ValidatingAdmissionPolicy that guards
    // against removing a vSphere failure domain that is still referenced by a ControlPlaneMachineSet.
    VAPCPMSFailureDomainName = "vsphere-failure-domain-in-use-by-cpms"

    // VAPMachineSetFailureDomainName is the name of the ValidatingAdmissionPolicy that guards
    // against removing a vSphere failure domain that is still referenced by a MachineSet (including
    // MachineSets with zero replicas that would have no running Machines to catch the check).
    VAPMachineSetFailureDomainName = "vsphere-failure-domain-in-use-by-machineset"

    // vapMachineBindingName is the name of the binding for the Machine VAP.
    vapMachineBindingName = "vsphere-failure-domain-in-use-by-machine"

    // vapCPMSBindingName is the name of the binding for the CPMS VAP.
    vapCPMSBindingName = "vsphere-failure-domain-in-use-by-cpms"

    // vapMachineSetBindingName is the name of the binding for the MachineSet VAP.
    vapMachineSetBindingName = "vsphere-failure-domain-in-use-by-machineset"
Inconsistent casing on VAP here and duplication. Can we consolidate on one set of vars for the names?
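Since each binding name is identical to its policy name in the quoted constants, one possible consolidation is a single constant per policy that the binding reuses. The sketch below is an assumption about how that might look: `policyRef` and `newPolicyRef` are hypothetical, and whether the constants should stay exported depends on who references them outside the package.

```go
package main

import "fmt"

// One constant per policy; the binding reuses the same name, so there is no
// second set of binding-name constants to keep in sync.
const (
	VAPMachineFailureDomainName    = "vsphere-failure-domain-in-use-by-machine"
	VAPCPMSFailureDomainName       = "vsphere-failure-domain-in-use-by-cpms"
	VAPMachineSetFailureDomainName = "vsphere-failure-domain-in-use-by-machineset"
)

// policyRef is a hypothetical, trimmed-down view of a policy/binding pair that
// records only the names involved.
type policyRef struct {
	PolicyName  string
	BindingName string
}

// newPolicyRef names the binding after its policy.
func newPolicyRef(name string) policyRef {
	return policyRef{PolicyName: name, BindingName: name}
}

func main() {
	names := []string{
		VAPMachineFailureDomainName,
		VAPCPMSFailureDomainName,
		VAPMachineSetFailureDomainName,
	}
	for _, name := range names {
		ref := newPolicyRef(name)
		fmt.Printf("policy=%s binding=%s\n", ref.PolicyName, ref.BindingName)
	}
}
```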
    {
        Name: "is-vsphere-platform",
        Expression: `has(object.spec.platformSpec) &&
            object.spec.platformSpec.type == "` + vspherePlatformType + `"`,
    },
    },
Using status is more reliable for a platform check normally for the infrastructure object
    Expression: `has(object.spec.platformSpec.vsphere) &&
        has(object.spec.platformSpec.vsphere.failureDomains)
        ? object.spec.platformSpec.vsphere.failureDomains
        : []`,
This is equivalent
    - Expression: `has(object.spec.platformSpec.vsphere) &&
    -     has(object.spec.platformSpec.vsphere.failureDomains)
    -     ? object.spec.platformSpec.vsphere.failureDomains
    -     : []`,
    + Expression: `object.?spec.platformSpec.vsphere.failureDomains.orValue('[]')`,
    Expression: `has(object.spec.platformSpec) &&
        object.spec.platformSpec.type == "` + vspherePlatformType + `"`,
This is equivalent
    - Expression: `has(object.spec.platformSpec) &&
    -     object.spec.platformSpec.type == "` + vspherePlatformType + `"`,
    + Expression: `object.?spec.platformSpec.type.orValue("") == "` + vspherePlatformType + `"`,
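The two suggestions above can be collected into small Go helpers that build the CEL strings, which also makes it easy to point the platform check at status instead of spec, as the earlier comment recommends. This is a sketch under assumptions: `platformTypeIsExpr` and `failureDomainsExpr` are hypothetical names, and `"VSphere"` is the assumed value of `vspherePlatformType`. The list fallback here is the list literal `[]` rather than the quoted string `'[]'` from the suggestion, on the assumption that later rules iterate over the variable as a list; that is worth checking against the actual validations.

```go
package main

import "fmt"

// vspherePlatformType is assumed to hold this value in the PR.
const vspherePlatformType = "VSphere"

// platformTypeIsExpr builds a CEL expression that compares the platform type
// under the given root (for example "spec.platformSpec" or
// "status.platformStatus") with platformType, falling back to "" when any
// field along the optional chain is absent.
func platformTypeIsExpr(root, platformType string) string {
	return "object.?" + root + `.type.orValue("") == "` + platformType + `"`
}

// failureDomainsExpr yields the configured failure domains, or an empty list
// when the vsphere platform spec or its failureDomains field is missing.
const failureDomainsExpr = `object.?spec.platformSpec.vsphere.failureDomains.orValue([])`

func main() {
	fmt.Println(platformTypeIsExpr("spec.platformSpec", vspherePlatformType))
	fmt.Println(platformTypeIsExpr("status.platformStatus", vspherePlatformType))
	fmt.Println(failureDomainsExpr)
}
```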
    {
        // machineRegion: the region label of the Machine param (empty string if absent).
        Name: "machineRegion",
        Expression: `has(params.metadata.labels) && "` + machineRegionLabel + `" in params.metadata.labels ? params.metadata.labels["` + machineRegionLabel + `"] : ""`,
Use the optional pattern as I've shown in the previous examples to tidy this up too
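Applied to the label lookup, the optional pattern might look like the following Go sketch. `labelOrEmptyExpr` is a hypothetical helper, the label value shown is an assumption about what `machineRegionLabel` contains, and the `[?"key"]` optional-index form should be verified against the CEL version the target API server supports.

```go
package main

import "fmt"

// machineRegionLabel is assumed to hold this value in the PR.
const machineRegionLabel = "machine.openshift.io/region"

// labelOrEmptyExpr builds a CEL expression that reads a label from the param
// object and falls back to "" when metadata.labels or the key is absent.
// It replaces the has(...) && "key" in ... ? ... : "" form with one chain.
func labelOrEmptyExpr(label string) string {
	return `params.?metadata.labels[?"` + label + `"].orValue("")`
}

func main() {
	fmt.Println(labelOrEmptyExpr(machineRegionLabel))
}
```

The same helper would cover the zone variable by passing the zone label instead.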
        fd.region == variables.machineRegion && fd.zone == variables.machineZone
    )`,
    MessageExpression: `"Infrastructure update would remove vSphere failure domain (region=" + variables.machineRegion + ", zone=" + variables.machineZone + ") that is still in use by Machine '" + params.metadata.name + "'"`,
    Reason: ptr.To(metav1.StatusReasonForbidden),
Forbidden implies that another user could action this (it's a permissions issue), so use Invalid instead.
SPLAT-2790
Changes
Summary by CodeRabbit