Skip to content

add component-specific pod anti-affinity to aggregator and server deployments#224

Open
pinkneyj wants to merge 1 commit into
Cloudzero:developfrom
pinkneyj:develop
Open

add component-specific pod anti-affinity to aggregator and server deployments#224
pinkneyj wants to merge 1 commit into
Cloudzero:developfrom
pinkneyj:develop

Conversation

@pinkneyj

@pinkneyj pinkneyj commented Jul 1, 2026

Copy link
Copy Markdown

By submitting a PR to this repository, you agree to the terms within the CloudZero Code of Conduct. Please see the contributing guidelines for how to create and submit a high-quality PR for this repo.

Please note that changes to the cloudzero-agent Helm chart should be made instead in the helm directory in the cloudzero-agent repository, and will automatically be mirrored to this repository as soon as they are merged.

Description

Describe the purpose of this PR along with any background information and the impacts of the proposed change. For the benefit of the community, please do not assume prior context.

Provide details that support your chosen implementation, including: breaking changes, alternatives considered, changes to the API, etc.

If the UI is being changed, please provide screenshots.

The aggregator-deploy.yaml template passed only defaults.affinity to the generateAffinity helper, with no component-specific anti-affinity logic. This caused any label set in defaults.affinity to be applied to all components indiscriminately — for example, setting an aggregator-targeted matchLabels in defaults would incorrectly propagate that label selector to the server deployment, making the server avoid nodes running aggregator pods rather than other server pods.

This change brings aggregator-deploy.yaml and agent-deploy.yaml (server) in line with the existing pattern used in webhook-deploy.yaml: each template now constructs a $podAntiAffinity variable with its own component-specific app.kubernetes.io/name label, merges it with any user-supplied affinity from values, and passes the result to generateAffinity. This ensures each component's pod anti-affinity rule targets the correct peer pods, and that defaults.affinity is no longer required to carry component-specific label selectors.

No breaking changes. The aggregator.affinity and server.affinity values fields continue to work as user overrides.

References

Include any links supporting this change such as a:

  • GitHub Issue/PR number addressed or fixed
  • StackOverflow post
  • Support forum thread
  • Related pull requests/issues from other repos

If there are no references, simply delete this section.

Pattern sourced from existing webhook-deploy.yaml implementation (lines 106–108)

Testing

Describe how this can be tested by reviewers. Be specific about anything not tested and reasons why. If this library has unit and/or integration testing, tests should be added for new functionality and existing tests should complete without errors.

Please include any manual steps for testing end-to-end or functionality not covered by unit/integration tests.

Also include details of the environment this PR was developed in (language/platform/browser version).

Tested by deploying the Helm chart to a Kubernetes cluster and verifying:

  • Aggregator pods have matchLabels: app.kubernetes.io/name: aggregator in their anti-affinity spec

  • Server pods have matchLabels: app.kubernetes.io/name: server in their anti-affinity spec

  • Neither component inherits the other's label selector

  • defaults.affinity can be set to {} without affecting component-specific anti-affinity behaviour

  • labels can be manually set for one component without others being changed eg app.kubernetes.io/part-of: cloudzero-agent set for aggregator while others remain unchanged

  • This change adds test coverage for new/changed/fixed functionality

Checklist

  • I have added documentation for new/changed functionality in this PR
  • All active GitHub checks for tests, formatting, and security are passing
  • The correct base branch is being used, if not main

@pinkneyj pinkneyj requested a review from a team as a code owner July 1, 2026 08:58
@greptile-apps

greptile-apps Bot commented Jul 1, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes component-specific pod anti-affinity for the aggregator and server deployments by introducing a local $podAntiAffinity variable with the correct app.kubernetes.io/name label selector for each component, merging it with any user-supplied affinity before passing the result to generateAffinity. This brings both templates in line with the existing webhook-deploy.yaml pattern.

  • aggregator-deploy.yaml: now sets matchLabels: app.kubernetes.io/name: aggregator, which correctly matches what generateLabels emits for aggregator pods (verified in _helpers.tpl).
  • agent-deploy.yaml: now sets matchLabels: app.kubernetes.io/name: server, matching the server pod template labels.
  • Both files use deepCopy + | default (dict) on user-supplied affinity — slightly more robust than the webhook template, which skips the default guard. No unit tests were added to cover the new default podAntiAffinity behaviour.

Confidence Score: 4/5

Safe to merge; the affinity logic is correct and the label values match what the pods actually get at runtime.

The core change is sound — app.kubernetes.io/name values in the new matchLabels are confirmed to match what generateLabels emits for each component, merge order correctly lets user affinity win, and deepCopy prevents mutation of chart values. The only note is a pre-existing copy-paste in aggregator-deploy.yaml where topologySpreadConstraints and terminationGracePeriodSeconds still reference .Values.server.* instead of .Values.aggregator.*; this file is being touched, making it a natural time to clean up, but it does not affect the new affinity logic.

aggregator-deploy.yaml — pre-existing wrong values references at lines 183–187 are worth fixing while the file is open

Important Files Changed

Filename Overview
charts/cloudzero-agent/templates/aggregator-deploy.yaml Adds component-specific podAntiAffinity targeting app.kubernetes.io/name=aggregator; contains a pre-existing reference to .Values.server.topologySpreadConstraints and .Values.server.terminationGracePeriodSeconds that should be .Values.aggregator.*
charts/cloudzero-agent/templates/agent-deploy.yaml Adds component-specific podAntiAffinity targeting app.kubernetes.io/name=server; labels match what generateLabels emits for server pods; merge order and deepCopy usage are correct

Comments Outside Diff (1)

  1. charts/cloudzero-agent/templates/aggregator-deploy.yaml, line 183-187 (link)

    P2 Pre-existing wrong values keys in aggregator deployment

    Lines 183 and 187 reference .Values.server.topologySpreadConstraints and .Values.server.terminationGracePeriodSeconds inside the aggregator Deployment's pod spec. This means the aggregator picks up the server's topology constraints and termination grace period instead of its own. This is pre-existing, but since this file is being modified now, it's a good opportunity to correct the references to .Values.aggregator.topologySpreadConstraints and .Values.aggregator.terminationGracePeriodSeconds (or at least verify that aggregator-scoped values fields exist in values.yaml).

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "add component-specific pod anti-affinity..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant