Skip to content

feat(rbac): complete namespaced mode to reduce cluster-wide permissions#1999

Open
joshuabaird wants to merge 7 commits into
fluent:masterfrom
joshuabaird:chore/watch-namespaces
Open

feat(rbac): complete namespaced mode to reduce cluster-wide permissions#1999
joshuabaird wants to merge 7 commits into
fluent:masterfrom
joshuabaird:chore/watch-namespaces

Conversation

@joshuabaird

@joshuabaird joshuabaird commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Summary

Addresses #1281 by completing the operator's namespaced mode so it can run without cluster-wide RBAC on the resources it manages.

The operator already had a --watch-namespaces flag that scopes the controller-runtime cache (eliminating the ... is forbidden ... at the cluster scope list/watch errors) and a Namespaced mode for the FluentBit controller that creates namespaced Role/RoleBinding instead of ClusterRole/ClusterRoleBinding. This PR extends that to the rest of the operator and makes it easy to enable.

This is opt-in; cluster scope remains the default, so existing installs are unaffected.

Changes

  • Generalize the scoped RBAC builder (pkg/operator/rbac.go): MakeScopedRBACObjects/MakeScopedRBACNames now take a component argument and apply additionalRules, so any agent can get namespaced RBAC. Scoped names follow the same fluent-operator-<component> convention as the cluster-scoped objects. A small MakeRBACObjectsForScope helper centralizes the namespaced-vs-cluster selection used by all three controllers.
  • Collector & Fluentd controllers: add a Namespaced flag (wired to --watch-namespaces) so they create Role/RoleBinding instead of ClusterRole/ClusterRoleBinding. Previously only FluentBit honored namespaced mode.
  • Deletion cleanup: in namespaced mode the per-instance RoleBinding is now removed when an agent CR is deleted (via the shared DeletePerInstanceBinding helper). In cluster mode the per-instance ClusterRoleBinding is left in place (unchanged from prior behavior): the operator is intentionally not granted delete on cluster-scoped RBAC, keeping its footprint minimal. The shared (Cluster)Role is always preserved, since deleting it would break sibling instances.
  • Tighten the operator's own RBAC (config/rbac/role.yaml, manifests/setup/setup.yaml): since the operator only deletes the per-instance RoleBinding and never the shared Role, the kubebuilder markers now grant roles only create;get;list;watch;patch (no delete), keeping delete on rolebindings.
  • Helm chart: expose operator.watchNamespaces as a first-class value that renders --watch-namespaces=... (defaults to empty / cluster scope).

Notes / follow-ups (not in this PR)

  • The operator's own static ClusterRole in the Helm chart is unchanged; a namespaced operator Role option is a sensible follow-up. (The chart already grants neither roles:delete nor rolebindings:delete; note that without rolebindings:delete a Helm-deployed operator can't clean up per-instance RoleBindings in namespaced mode.)
  • Cleaning up the per-instance ClusterRoleBinding in cluster mode would require granting the operator delete on clusterrolebindings; deferred here to keep this PR focused on reducing (not expanding) cluster-scoped permissions.
  • Trade-offs to document separately: in namespaced mode the fluent-bit kubernetes metadata filter can only enrich pods in watched namespaces, and cluster-scoped CRDs (ClusterInput/ClusterOutput/etc.) still require cluster-scoped watches.

Test plan

  • go build ./...
  • go vet ./... on changed packages
  • go test ./pkg/operator/... ./controllers/...
  • helm template --set operator.watchNamespaces=logging renders --watch-namespaces=logging; default omits it
  • Regenerated config/rbac/role.yaml and manifests/setup/setup.yaml from the updated kubebuilder markers (make manifests); the only change is roles dropping the delete verb
  • CI e2e tests and helm e2e tests pass (cluster-mode lifecycle, incl. finalizer cleanup)
  • Manual e2e: deploy with --watch-namespaces and confirm namespaced Role/RoleBinding are created for fluent-bit/collector/fluentd and no cluster-scope list/watch errors appear

Made with Cursor

Copilot AI review requested due to automatic review settings June 18, 2026 17:08
Extends the existing --watch-namespaces support so the operator can run
without cluster-wide RBAC on the resources it manages:

- Generalize MakeScopedRBACObjects/Names to take a component so the
  collector and fluentd controllers can create namespaced Role/RoleBinding
  (previously only fluent-bit honored namespaced mode)
- Add a Namespaced flag to the Collector and Fluentd reconcilers, wired to
  --watch-namespaces, mirroring the FluentBit controller
- Clean up the per-instance (Cluster)RoleBinding on deletion (fills the
  existing TODOs); the shared (Cluster)Role is intentionally preserved
- Expose operator.watchNamespaces as a first-class Helm value

Refs fluent#1281

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>
@joshuabaird joshuabaird force-pushed the chore/watch-namespaces branch from 2423905 to 00918b6 Compare June 18, 2026 17:10
@joshuabaird joshuabaird marked this pull request as draft June 18, 2026 17:11

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the operator’s opt-in “namespaced mode” so that, when --watch-namespaces is used, the operator both (1) scopes its controller-runtime cache and (2) creates namespaced RBAC (Role/RoleBinding) for the managed agents (Fluent Bit / Collector / Fluentd), reducing the need for cluster-wide permissions on namespaced resources.

Changes:

  • Generalizes the scoped RBAC helper to be component-aware and to apply per-CR additional RBAC rules.
  • Extends namespaced RBAC behavior to Collector and Fluentd controllers (previously only Fluent Bit honored it) and adds delete-time cleanup for per-instance bindings.
  • Exposes operator.watchNamespaces in the Helm chart to render --watch-namespaces=....

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/operator/rbac.go Generalizes scoped RBAC object/name builders to include component and additionalRules.
controllers/fluentd_controller.go Adds Namespaced mode handling for Fluentd RBAC creation (Role/RoleBinding vs ClusterRole/ClusterRoleBinding).
controllers/fluentbit_controller.go Updates FluentBit to use component-aware scoped RBAC names and adds binding cleanup on delete.
controllers/fluent_controller_finalizer.go Adds delete-time cleanup of per-instance (Cluster)RoleBinding for Fluentd and mutators for Role/RoleBinding.
controllers/collector_controller.go Adds Namespaced mode handling and binding cleanup on delete for Collector RBAC.
cmd/fluent-manager/main.go Wires namespacedController into Collector and Fluentd reconcilers based on --watch-namespaces.
charts/fluent-operator/values.yaml Adds operator.watchNamespaces value and documents behavior.
charts/fluent-operator/templates/fluent-operator-deployment.yaml Renders --watch-namespaces={{ .Values.operator.watchNamespaces }} when set.
Comments suppressed due to low confidence (1)

controllers/fluentbit_controller.go:209

  • In namespaced mode, MakeScopedRBACNames now makes the Role name fluent-operator-fluent-bit (shared across all FluentBit instances in the namespace). Setting a controller owner reference on this shared Role means deleting one FluentBit CR can garbage-collect the Role and break other instances. This also contradicts the delete logic which intentionally preserves the shared Role.
			o.Rules = expected.Rules
			if err := ctrl.SetControllerReference(fb, o, r.Scheme); err != nil {
				return err
			}
			return nil

joshuabaird and others added 3 commits June 18, 2026 13:25
Collapse the duplicated namespaced-vs-cluster RBAC branching and the
per-instance binding deletion across the FluentBit, Collector, and Fluentd
controllers into operator.MakeRBACObjectsForScope and
operator.DeletePerInstanceBinding.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>
Wrap the MakeScopedRBACObjects and DeletePerInstanceBinding call sites
that exceeded the 120-character line limit.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>
The cluster-scoped ClusterRoleBinding references a shared ClusterRole and the
operator is not granted delete on cluster-scoped RBAC, so attempting to delete
it during finalization failed (forbidden), blocking the finalizer and leaving
owned resources (e.g. the Fluentd StatefulSet) uncollected. Restore the prior
cluster-mode behavior of leaving the ClusterRoleBinding in place and only
delete the per-instance RoleBinding in namespaced mode.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

controllers/fluentbit_controller.go:206

  • In namespaced mode the RBAC Role name is shared per namespace (MakeRBACNames uses only the component), so setting a controller reference on it will either (1) fail reconciliation when a second FluentBit instance tries to take ownership, or (2) cause the shared Role to be garbage-collected when the owning FluentBit is deleted, breaking siblings. The Role should not have a per-instance controller reference (only the RoleBinding/ServiceAccount should).
	case *rbacv1.Role:
		expected, _, _ := operator.MakeScopedRBACObjects(
			fb.Name,
			fb.Namespace,
			"fluent-bit",
			fb.Spec.RBACRules,
			fb.Spec.ServiceAccountAnnotations,
		)

		return func() error {
			o.Rules = expected.Rules
			if err := ctrl.SetControllerReference(fb, o, r.Scheme); err != nil {
				return err
			}
			return nil
		}

Comment thread charts/fluent-operator/templates/fluent-operator-deployment.yaml
In namespaced mode the FluentBit Role (fluent-operator-fluent-bit) is shared
across all FluentBit instances in a namespace. Setting a per-instance
controller reference on it would fail reconciliation for a second instance or
garbage-collect the shared Role when one instance is deleted, breaking
siblings. Drop the owner reference to match the Collector and Fluentd Role
handling and the delete logic that preserves the shared Role.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread controllers/collector_controller.go Outdated
Comment thread controllers/fluentd_controller.go Outdated
The controllers only delete the per-instance RoleBinding (via
DeletePerInstanceBinding) in namespaced mode and never delete the shared
Role, so granting delete on roles is unnecessary. Split the kubebuilder
markers for fluent-bit, collector, and fluentd so roles keeps
create;get;list;watch;patch while rolebindings retains delete, and
regenerate config/rbac/role.yaml and manifests/setup/setup.yaml.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comment thread charts/fluent-operator/templates/fluent-operator-deployment.yaml
@joshuabaird joshuabaird marked this pull request as ready for review June 18, 2026 21:41
Render the flag via printf|quote so YAML quoting is applied to values
that need it (e.g. a space-separated "ns1, ns2" list), without embedding
literal quotes into the container argv.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Josh Baird <jbaird@galileo.io>
resources:
- clusterrolebindings
- clusterroles
- roles

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks odd, why do we have roles here and remove it below?

Shouldn't this be clusterroles and clusterrolebindings here and below stick with the roles and rolebindings?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controller-gen groups rules by their verb set rather than by resource scope. roles, clusterroles, and clusterrolebindings share create;get;list;watch;patch, so they collapse into one rule; rolebindings is broken out because it's the only resource that also needs delete.

The operator only deletes the per-instance RoleBinding in namespaced mode (DeletePerInstanceBinding) and never deletes Roles/ClusterRoles/ClusterRoleBindings see the doc comment on that function — which is why delete is scoped to rolebindings only (from the earlier least-privilege review in 56585f0).

Grouping by scope instead would mean either granting delete on roles or dropping it from rolebindings.

@cw-Guo

cw-Guo commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

@joshuabaird it seems like the rbac role is only for filter kubernetes to work, maybe we can take a step back - we don't need this in the operator, the rbac related stuff should be took care of by the users, not the operator. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants