Skip to content

OCPBUGS-88732: Treat missing cloud config outages as transient#472

Open
RadekManak wants to merge 1 commit into
openshift:mainfrom
RadekManak:regression-ci-failure-investigation
Open

OCPBUGS-88732: Treat missing cloud config outages as transient#472
RadekManak wants to merge 1 commit into
openshift:mainfrom
RadekManak:regression-ci-failure-investigation

Conversation

@RadekManak

@RadekManak RadekManak commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • keep missing source cloud configmaps on the transient retry path instead of immediately degrading CCCMO
  • preserve terminal behavior for real bad-key misconfigurations in infra.spec.cloudConfig
  • add a controller regression test that covers source config disappearance and recovery

Test plan

  • KUBEBUILDER_ASSETS=\"$(pwd)/$(go run ./vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest use 1.33.2 -p path --bin-dir ./bin --index https://raw.githubusercontent.com/openshift/api/master/envtest-releases.yaml)\" go test ./pkg/controllers

Summary by CodeRabbit

  • Bug Fixes

    • Cloud config sync controller now cleanly handles missing source ConfigMap by either initializing with platform-specific defaults (for managed platforms) or returning appropriate error status via ClusterOperator.
  • Tests

    • Added test case verifying controller properly handles missing infrastructure cloud ConfigMap scenario with correct ClusterOperator status reporting.

Keep brief source ConfigMap outages on the retry path so CCCMO does not immediately degrade during recovery, while preserving terminal behavior for real bad-key misconfigurations.
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 500f907f-f6e4-4135-9412-efda1f17fa72

📥 Commits

Reviewing files that changed from the base of the PR and between d3f8b6a and b53992f.

📒 Files selected for processing (2)
  • pkg/controllers/cloud_config_sync_controller.go
  • pkg/controllers/cloud_config_sync_controller_test.go

Walkthrough

In CloudConfigReconciler.Reconcile, the not-found path for the source cloud-config ConfigMap is split: if the platform is managed by CCCMO, sourceCM.Data is initialized with a minimal platform config; otherwise, a terminal error is returned. A new test covers the AWS case for this missing-ConfigMap path.

Changes

Missing source ConfigMap handling

Layer / File(s) Summary
Reconcile bifurcation and test coverage
pkg/controllers/cloud_config_sync_controller.go, pkg/controllers/cloud_config_sync_controller_test.go
The not-found branch in Reconcile now initializes sourceCM.Data when shouldManageManagedConfigMap is true, or returns a terminal error otherwise. A new test deletes the AWS infra cloud ConfigMap, expects reconcile failure and absent ClusterOperator, then recreates it and expects success with cloudConfigControllerDegraded=false and cloudConfigControllerAvailable=true.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Container-Privileges ❌ Error The PR adds manifests/0000_26_cloud-controller-manager-operator_11_deployment.yaml with hostNetwork: true, a privileged setting flagged by the check. Review and document the justification for hostNetwork: true in the deployment manifest if needed for cloud controller manager operation.
Test Structure And Quality ⚠️ Warning The new test lacks assertion messages on critical assertions (lines 561, 569, 572-573), violating requirement #4. Similar tests in the codebase include messages like "transient error should be retu... Add meaningful failure messages to Expect assertions at lines 561, 569, 572-573, following the pattern used in other tests like those at lines 902 and 907.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: treating missing cloud config as transient rather than terminal, which is the core modification to CCCMO's retry behavior.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test names in the PR are stable and deterministic. The new test "should treat a missing source configmap as transient until it reappears" and all existing tests use descriptive static st...
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests were added. The PR only adds controller unit/integration tests in pkg/controllers/ using envtest, not e2e tests. The check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The PR adds tests to pkg/controllers/cloud_config_sync_controller_test.go, which is a controller unit/integration test file, not an e2e test. The custom check applies to e2e tests only.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only controller reconciliation logic in cloud_config_sync_controller.go (+2) and adds tests (+25). No deployment manifests, scheduling constraints, affinity rules, PodDisruptionBudgets,...
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations found. Changes are limited to a Reconcile() method (which uses fmt.Errorf, not stdout) and a test case within an It() block (stdout intercepted by Ginkgo).
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The test added is a Ginkgo unit/integration test using envtest, not an e2e test. The check applies only to e2e tests. Even if applicable, it contains no IPv4 hardcoded addresses, IPv4-only parsing,...
No-Weak-Crypto ✅ Passed PR introduces no weak cryptography, custom crypto implementations, or non-constant-time secret comparisons. Changes initialize cloud config ConfigMaps with platform-specific defaults using standard...
No-Sensitive-Data-In-Logs ✅ Passed All logging statements (20 total) log only non-sensitive metadata: platform types, generic messages, and ConfigMap names/namespaces. No passwords, tokens, API keys, config content, or PII are expos...
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

@RadekManak: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test fmt
/test images
/test lint
/test okd-scos-images
/test unit
/test vendor
/test verify-deps
/test vet

The following commands are available to trigger optional jobs:

/test e2e-aws-ovn-techpreview
/test e2e-azure-manual-oidc
/test e2e-azure-ovn
/test e2e-azure-ovn-upgrade
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-ibmcloud-ovn
/test e2e-nutanix-ovn
/test e2e-openstack-ovn
/test e2e-vsphere-ovn
/test level0-clusterinfra-azure-ipi-proxy-tests
/test okd-scos-e2e-aws-ovn
/test regression-clusterinfra-vsphere-ipi-ccm

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-aws-ovn
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-e2e-aws-ovn-upgrade
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-fmt
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-images
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-level0-clusterinfra-azure-ipi-proxy-tests
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-lint
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-okd-scos-images
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-unit
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-vendor
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-verify-deps
pull-ci-openshift-cluster-cloud-controller-manager-operator-main-vet
Details

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci Bot requested review from nrb and racheljpg June 16, 2026 13:10
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mdbooth for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RadekManak RadekManak changed the title Treat missing cloud config outages as transient OCPBUGS-88732: Treat missing cloud config outages as transient Jun 16, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jun 16, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@RadekManak: This pull request references Jira Issue OCPBUGS-88732, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

  • keep missing source cloud configmaps on the transient retry path instead of immediately degrading CCCMO
  • preserve terminal behavior for real bad-key misconfigurations in infra.spec.cloudConfig
  • add a controller regression test that covers source config disappearance and recovery

Test plan

  • KUBEBUILDER_ASSETS=\"$(pwd)/$(go run ./vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest use 1.33.2 -p path --bin-dir ./bin --index https://raw.githubusercontent.com/openshift/api/master/envtest-releases.yaml)\" go test ./pkg/controllers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

Copy link
Copy Markdown

@RadekManak: This pull request references Jira Issue OCPBUGS-88732, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Summary

  • keep missing source cloud configmaps on the transient retry path instead of immediately degrading CCCMO
  • preserve terminal behavior for real bad-key misconfigurations in infra.spec.cloudConfig
  • add a controller regression test that covers source config disappearance and recovery

Test plan

  • KUBEBUILDER_ASSETS=\"$(pwd)/$(go run ./vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest use 1.33.2 -p path --bin-dir ./bin --index https://raw.githubusercontent.com/openshift/api/master/envtest-releases.yaml)\" go test ./pkg/controllers

Summary by CodeRabbit

  • Bug Fixes

  • Cloud config sync controller now cleanly handles missing source ConfigMap by either initializing with platform-specific defaults (for managed platforms) or returning appropriate error status via ClusterOperator.

  • Tests

  • Added test case verifying controller properly handles missing infrastructure cloud ConfigMap scenario with correct ClusterOperator status reporting.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

@RadekManak: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit b53992f link true /test unit
ci/prow/e2e-aws-ovn b53992f link true /test e2e-aws-ovn
ci/prow/level0-clusterinfra-azure-ipi-proxy-tests b53992f link false /test level0-clusterinfra-azure-ipi-proxy-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants