OTA-1956: testing install: Split multi-document YAML and add missing annotations#1398
OTA-1956: testing install: Split multi-document YAML and add missing annotations#1398jhadvig wants to merge 1 commit into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new cluster-update-console-plugin, threads ImageStream refs into manifest rendering, creates namespace/network policy, Deployment and Service, adds Prometheus ServiceMonitor and alerting rules, and registers the plugin with the OpenShift console. ChangesCluster Update Console Plugin
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jhadvig The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml (1)
44-51: 💤 Low valueConsider adding a
for:duration to prevent transient alerts.The
UpdateAvailablealert fires immediately whencluster_version_available_updates > 0. While this may be intentional for prompt notification, a shortfor:duration (e.g., 5m) would prevent transient fluctuations from generating alert noise.♻️ Suggested addition
expr: | sum by (channel, namespace, upstream) (cluster_version_available_updates) > 0 + for: 5m labels: severity: info🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml` around lines 44 - 51, The UpdateAvailable PrometheusRule alert currently fires immediately; open the alert definition for alert: UpdateAvailable and add a for: duration (e.g., for: 5m) directly under the alert metadata so the rule only transitions to firing after the condition (sum by (channel, namespace, upstream) (cluster_version_available_updates) > 0) holds for that sustained period; update the alert block (alert: UpdateAvailable, expr: ...) to include the new for: field.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@install/0000_50_cluster-update-console-plugin_20_networkpolicy.yaml`:
- Around line 13-16: The NetworkPolicy currently uses podSelector: {} with
policyTypes: - Ingress and - Egress which creates a default-deny for all pods;
update the policy to either target specific pods instead of podSelector: {}
(e.g., use a labelSelector matching the console-plugin pods) or add explicit
allow rules for required traffic (ingress rules to permit backend/monitoring
sources and egress rules to permit upstream destinations) so the console plugin
service and monitoring can communicate; reference the podSelector and
policyTypes (Ingress, Egress) in the NetworkPolicy being modified and ensure the
new rules explicitly allow the necessary CIDRs/namespaceSelectors/port/protocols
for plugin traffic.
In `@install/0000_50_cluster-update-console-plugin_50_deployment.yaml`:
- Around line 37-46: The plugin container currently defines resources.requests
but no resources.limits and its securityContext lacks readOnlyRootFilesystem;
update the container spec for the container named "plugin" to add a
resources.limits block (set cpu and memory limits equal to or higher than the
existing requests, e.g., cpu and memory) and set
securityContext.readOnlyRootFilesystem: true; ensure you modify the container
entry that contains resources.requests and securityContext so both limits and
readOnlyRootFilesystem are present and compliant.
- Around line 30-37: The pod spec for the container named "plugin" is missing
liveness and readiness probes; add both probes under the container "plugin" that
use the existing port/name (https / containerPort 9001) — for example configure
an httpGet (or tcpSocket) probe hitting a health endpoint (e.g. /health or
/ready) on port 9001, set sensible initialDelaySeconds, periodSeconds and
failureThreshold values, and include identical or appropriately different probe
settings for liveness (to restart unhealthy containers) and readiness (to gate
service traffic) so the deployment meets the Liveness + readiness probes
requirement.
In `@install/0000_50_cluster-update-console-plugin_60_service.yaml`:
- Around line 6-11: The Service manifest for the cluster-update console plugin
is missing the serving-cert request annotation; add the annotation key
service.beta.openshift.io/serving-cert-secret-name with value
"cluster-update-console-plugin-cert" to the Service's metadata.annotations so
the cluster will provision the same secret that the Deployment mounts
(secretName: cluster-update-console-plugin-cert), ensuring the TLS secret is
created for the pods waiting on /var/cert.
In `@install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml`:
- Around line 1-27: The ServiceMonitor in this manifest uses the same
metadata.name and namespace as the cluster-version-operator ServiceMonitor,
causing the later 0000_90 manifest to overwrite annotations; fix by giving this
console plugin ServiceMonitor a unique metadata.name (e.g., append "-console" or
similar) and ensure any related references (labels/selectors like
metadata.labels k8s-app and selector.matchLabels) remain correct, or
alternatively merge the annotations (capability.openshift.io/name and
release.openshift.io/feature-set) into the later ServiceMonitor so they are not
lost when 0000_90 is applied.
In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml`:
- Around line 1-13: This PrometheusRule duplicates an existing PrometheusRule
with metadata.name "cluster-version-operator" in namespace
"openshift-cluster-version" and lacks runbookUrl fields and proper
last_over_time wrapping; remove or rename this duplicate resource (or merge its
alerts into the existing PrometheusRule) so only one PrometheusRule with kind
PrometheusRule and name "cluster-version-operator" exists for that namespace,
and update each alert expression to wrap range queries with
last_over_time(...[5m]) and add appropriate runbookUrl annotations for each
alert to preserve Console/feature-set gating annotations (ensure the resource
that remains includes the Console gating annotations shown in
metadata.annotations).
In `@install/0000_50_cluster-update-console-plugin_90_consoleplugin.yaml`:
- Line 20: The manifest sets backend.service.port to the string "https" but
ConsolePluginService.port is an int32 and the actual Service listens on 9001;
change the backend.service.port value from "https" to the numeric 9001 so it
matches ConsolePluginService.port and the Service target port. Update the field
named backend.service.port in the ConsolePlugin YAML to the integer 9001 to
align with the Service configuration.
- Around line 18-19: The ConsolePlugin manifest sets spec.backend.service.name
to cluster-update-console-plugin which doesn't match the actual Service resource
name openshift-cluster-update-console-plugin; update the ConsolePlugin's
spec.backend.service.name to "openshift-cluster-update-console-plugin" (or
alternatively rename the Service to "cluster-update-console-plugin") so the
ConsolePlugin backend service name and the Service resource name match; check
the ConsolePlugin field spec.backend.service.port.targetPort and the Service
port too to ensure ports align after renaming.
---
Nitpick comments:
In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml`:
- Around line 44-51: The UpdateAvailable PrometheusRule alert currently fires
immediately; open the alert definition for alert: UpdateAvailable and add a for:
duration (e.g., for: 5m) directly under the alert metadata so the rule only
transitions to firing after the condition (sum by (channel, namespace, upstream)
(cluster_version_available_updates) > 0) holds for that sustained period; update
the alert block (alert: UpdateAvailable, expr: ...) to include the new for:
field.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a03fc91b-d22b-41d8-8ed5-8cf949246c0e
📒 Files selected for processing (9)
install/0000_50_cluster-update-console-plugin_10_namespace.yamlinstall/0000_50_cluster-update-console-plugin_20_networkpolicy.yamlinstall/0000_50_cluster-update-console-plugin_50_deployment.yamlinstall/0000_50_cluster-update-console-plugin_60_service.yamlinstall/0000_50_cluster-update-console-plugin_80_servicemonitor.yamlinstall/0000_50_cluster-update-console-plugin_81_prometheusrule.yamlinstall/0000_50_cluster-update-console-plugin_90_consoleplugin.yamlinstall/image-referencespkg/payload/render_test.go
| podSelector: {} | ||
| policyTypes: | ||
| - Ingress | ||
| - Egress |
There was a problem hiding this comment.
Default-deny policy blocks plugin traffic without companion allow policies.
podSelector: {} + Ingress/Egress with no rules denies all traffic. As-is, backend access to the plugin service (and monitoring ingress) will be blocked unless additional allow policies are added in this namespace.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@install/0000_50_cluster-update-console-plugin_20_networkpolicy.yaml` around
lines 13 - 16, The NetworkPolicy currently uses podSelector: {} with
policyTypes: - Ingress and - Egress which creates a default-deny for all pods;
update the policy to either target specific pods instead of podSelector: {}
(e.g., use a labelSelector matching the console-plugin pods) or add explicit
allow rules for required traffic (ingress rules to permit backend/monitoring
sources and egress rules to permit upstream destinations) so the console plugin
service and monitoring can communicate; reference the podSelector and
policyTypes (Ingress, Egress) in the NetworkPolicy being modified and ensure the
new rules explicitly allow the necessary CIDRs/namespaceSelectors/port/protocols
for plugin traffic.
| resources: | ||
| requests: | ||
| cpu: 20m | ||
| memory: 50Mi | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: | ||
| - ALL | ||
| terminationMessagePolicy: FallbackToLogsOnError |
There was a problem hiding this comment.
Add container limits and read-only root filesystem.
Container plugin has requests but no limits, and readOnlyRootFilesystem is not set.
Suggested patch
resources:
requests:
cpu: 20m
memory: 50Mi
+ limits:
+ cpu: 100m
+ memory: 200Mi
securityContext:
allowPrivilegeEscalation: false
+ readOnlyRootFilesystem: true
capabilities:
drop:
- ALLAs per coding guidelines, **/*.{yaml,yml} requires readOnlyRootFilesystem and Resource limits (cpu, memory) on every container.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resources: | |
| requests: | |
| cpu: 20m | |
| memory: 50Mi | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| capabilities: | |
| drop: | |
| - ALL | |
| terminationMessagePolicy: FallbackToLogsOnError | |
| resources: | |
| requests: | |
| cpu: 20m | |
| memory: 50Mi | |
| limits: | |
| cpu: 100m | |
| memory: 200Mi | |
| securityContext: | |
| allowPrivilegeEscalation: false | |
| readOnlyRootFilesystem: true | |
| capabilities: | |
| drop: | |
| - ALL | |
| terminationMessagePolicy: FallbackToLogsOnError |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@install/0000_50_cluster-update-console-plugin_50_deployment.yaml` around
lines 37 - 46, The plugin container currently defines resources.requests but no
resources.limits and its securityContext lacks readOnlyRootFilesystem; update
the container spec for the container named "plugin" to add a resources.limits
block (set cpu and memory limits equal to or higher than the existing requests,
e.g., cpu and memory) and set securityContext.readOnlyRootFilesystem: true;
ensure you modify the container entry that contains resources.requests and
securityContext so both limits and readOnlyRootFilesystem are present and
compliant.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/payload/render.go (1)
195-207: ⚡ Quick winConsider validating that image URIs are non-empty.
The function does not check whether
tag.From.Nameis non-empty before inserting into the map. If an ImageStream tag has an emptyDockerImagereference, templates using{{index .Images "tag-name"}}would render with an empty string, causing invalid manifest application.🛡️ Proposed validation
func imagesFromImageRef(imageRef *imagev1.ImageStream) map[string]string { images := make(map[string]string) if imageRef == nil { return images } for _, tag := range imageRef.Spec.Tags { - if tag.From != nil && tag.From.Kind == "DockerImage" { + if tag.From != nil && tag.From.Kind == "DockerImage" && tag.From.Name != "" { images[tag.Name] = tag.From.Name } } return images }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/payload/render.go` around lines 195 - 207, imagesFromImageRef currently inserts tag.From.Name into the images map without checking for emptiness; update imagesFromImageRef to skip tags where tag.From == nil or tag.From.Kind != "DockerImage" or tag.From.Name is empty (do not add entries with an empty URI), and optionally record/debug-log skipped tags if desired; reference the function imagesFromImageRef and the fields tag.From, tag.From.Kind and tag.From.Name when making the change.pkg/payload/render_test.go (1)
316-318: ⚡ Quick winConsider adding unit tests for
imagesFromImageRef.While
Test_cvoManifestsexercises the Images map indirectly through rendering, there's no direct unit test for theimagesFromImageRefhelper function. A table-driven test would document expected behavior and improve confidence for edge cases such as nil imageRef, empty tags, non-DockerImage kinds, and duplicate tag names.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/payload/render_test.go` around lines 316 - 318, Add a new table-driven unit test for the helper function imagesFromImageRef (e.g., Test_imagesFromImageRef) in pkg/payload/render_test.go to cover nil imageRef, empty tags, non-DockerImage kinds, and duplicate tag names; for each case construct an imageRef input, call imagesFromImageRef, and assert the returned map matches expected keys/values (including handling duplicates and empty tags), keeping Test_cvoManifests as-is but ensuring the new test directly verifies imagesFromImageRef behavior and edge cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/payload/render.go`:
- Around line 42-47: The code currently silences errors from loadImageReferences
causing renderConfig.Images to be empty and templates that reference .Images
(e.g., install/0000_50_cluster-update-console-plugin_50_deployment.yaml) to
render invalid empty image fields; update the logic around loadImageReferences
in pkg/payload/render.go so that if loadImageReferences(releaseManifestsDir)
returns an error you propagate/return that error from the rendering path (or
explicitly gate and return an error when templates require .Images), rather than
only logging a warning — ensure renderConfig.Images is only left unset when that
is acceptable and otherwise call imagesFromImageRef(imageRef) and fail fast when
image refs cannot be loaded.
---
Nitpick comments:
In `@pkg/payload/render_test.go`:
- Around line 316-318: Add a new table-driven unit test for the helper function
imagesFromImageRef (e.g., Test_imagesFromImageRef) in pkg/payload/render_test.go
to cover nil imageRef, empty tags, non-DockerImage kinds, and duplicate tag
names; for each case construct an imageRef input, call imagesFromImageRef, and
assert the returned map matches expected keys/values (including handling
duplicates and empty tags), keeping Test_cvoManifests as-is but ensuring the new
test directly verifies imagesFromImageRef behavior and edge cases.
In `@pkg/payload/render.go`:
- Around line 195-207: imagesFromImageRef currently inserts tag.From.Name into
the images map without checking for emptiness; update imagesFromImageRef to skip
tags where tag.From == nil or tag.From.Kind != "DockerImage" or tag.From.Name is
empty (do not add entries with an empty URI), and optionally record/debug-log
skipped tags if desired; reference the function imagesFromImageRef and the
fields tag.From, tag.From.Kind and tag.From.Name when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b86d3e40-c869-4f98-bc00-cff997f23625
📒 Files selected for processing (4)
install/0000_50_cluster-update-console-plugin_50_deployment.yamlpkg/payload/payload.gopkg/payload/render.gopkg/payload/render_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- install/0000_50_cluster-update-console-plugin_50_deployment.yaml
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
@jhadvig: This pull request references OTA-1956 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
b2f6731 to
b4c19ca
Compare
92d27f2 to
8d5bce1
Compare
4128296 to
e2437ef
Compare
Add manifests to deploy the cluster-update-console-plugin, a web console
interface for managing ClusterVersion updates. The plugin is gated behind
the TechPreviewNoUpgrade feature set and the Console capability.
Manifests include: namespace, serviceaccount, networkpolicy, deployment,
service, and consoleplugin resources. The deployment uses readiness and
liveness probes, a read-only root filesystem with emptyDir volumes for
nginx runtime directories, and a serving-cert annotation for TLS.
Extend manifestRenderConfig with an Images map populated from the
release payload's image-references, so CVO manifests can resolve
component images by short name at deploy time using Go template
syntax: {{index .Images "cluster-update-console-plugin"}}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e2437ef to
da3faf1
Compare
Summary
Fixes the CI failures in #1388 caused by
0000_50_cluster-update-console-plugin_80_servicemonitor.yamlcontaining two YAML documents (ServiceMonitor + PrometheusRule) in a single file separated by---. The kube-apiserver manifest renderer cannot decode multi-document YAML, causing bootstrap to fail with:Changes:
_80_servicemonitor.yamland_81_prometheusrule.yaml)capability.openshift.io/name: Consoleandrelease.openshift.io/feature-set: TechPreviewNoUpgradeannotations to both resources, matching all other console-plugin manifests in this setTest plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores