feat(rbac): complete namespaced mode to reduce cluster-wide permissions#1999
feat(rbac): complete namespaced mode to reduce cluster-wide permissions#1999joshuabaird wants to merge 7 commits into
Conversation
Extends the existing --watch-namespaces support so the operator can run without cluster-wide RBAC on the resources it manages: - Generalize MakeScopedRBACObjects/Names to take a component so the collector and fluentd controllers can create namespaced Role/RoleBinding (previously only fluent-bit honored namespaced mode) - Add a Namespaced flag to the Collector and Fluentd reconcilers, wired to --watch-namespaces, mirroring the FluentBit controller - Clean up the per-instance (Cluster)RoleBinding on deletion (fills the existing TODOs); the shared (Cluster)Role is intentionally preserved - Expose operator.watchNamespaces as a first-class Helm value Refs fluent#1281 Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
2423905 to
00918b6
Compare
There was a problem hiding this comment.
Pull request overview
This PR completes the operator’s opt-in “namespaced mode” so that, when --watch-namespaces is used, the operator both (1) scopes its controller-runtime cache and (2) creates namespaced RBAC (Role/RoleBinding) for the managed agents (Fluent Bit / Collector / Fluentd), reducing the need for cluster-wide permissions on namespaced resources.
Changes:
- Generalizes the scoped RBAC helper to be component-aware and to apply per-CR additional RBAC rules.
- Extends namespaced RBAC behavior to Collector and Fluentd controllers (previously only Fluent Bit honored it) and adds delete-time cleanup for per-instance bindings.
- Exposes
operator.watchNamespacesin the Helm chart to render--watch-namespaces=....
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/operator/rbac.go | Generalizes scoped RBAC object/name builders to include component and additionalRules. |
| controllers/fluentd_controller.go | Adds Namespaced mode handling for Fluentd RBAC creation (Role/RoleBinding vs ClusterRole/ClusterRoleBinding). |
| controllers/fluentbit_controller.go | Updates FluentBit to use component-aware scoped RBAC names and adds binding cleanup on delete. |
| controllers/fluent_controller_finalizer.go | Adds delete-time cleanup of per-instance (Cluster)RoleBinding for Fluentd and mutators for Role/RoleBinding. |
| controllers/collector_controller.go | Adds Namespaced mode handling and binding cleanup on delete for Collector RBAC. |
| cmd/fluent-manager/main.go | Wires namespacedController into Collector and Fluentd reconcilers based on --watch-namespaces. |
| charts/fluent-operator/values.yaml | Adds operator.watchNamespaces value and documents behavior. |
| charts/fluent-operator/templates/fluent-operator-deployment.yaml | Renders --watch-namespaces={{ .Values.operator.watchNamespaces }} when set. |
Comments suppressed due to low confidence (1)
controllers/fluentbit_controller.go:209
- In namespaced mode,
MakeScopedRBACNamesnow makes the Role namefluent-operator-fluent-bit(shared across all FluentBit instances in the namespace). Setting a controller owner reference on this shared Role means deleting one FluentBit CR can garbage-collect the Role and break other instances. This also contradicts the delete logic which intentionally preserves the shared Role.
o.Rules = expected.Rules
if err := ctrl.SetControllerReference(fb, o, r.Scheme); err != nil {
return err
}
return nil
Collapse the duplicated namespaced-vs-cluster RBAC branching and the per-instance binding deletion across the FluentBit, Collector, and Fluentd controllers into operator.MakeRBACObjectsForScope and operator.DeletePerInstanceBinding. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
Wrap the MakeScopedRBACObjects and DeletePerInstanceBinding call sites that exceeded the 120-character line limit. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
The cluster-scoped ClusterRoleBinding references a shared ClusterRole and the operator is not granted delete on cluster-scoped RBAC, so attempting to delete it during finalization failed (forbidden), blocking the finalizer and leaving owned resources (e.g. the Fluentd StatefulSet) uncollected. Restore the prior cluster-mode behavior of leaving the ClusterRoleBinding in place and only delete the per-instance RoleBinding in namespaced mode. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
controllers/fluentbit_controller.go:206
- In namespaced mode the RBAC Role name is shared per namespace (MakeRBACNames uses only the component), so setting a controller reference on it will either (1) fail reconciliation when a second FluentBit instance tries to take ownership, or (2) cause the shared Role to be garbage-collected when the owning FluentBit is deleted, breaking siblings. The Role should not have a per-instance controller reference (only the RoleBinding/ServiceAccount should).
case *rbacv1.Role:
expected, _, _ := operator.MakeScopedRBACObjects(
fb.Name,
fb.Namespace,
"fluent-bit",
fb.Spec.RBACRules,
fb.Spec.ServiceAccountAnnotations,
)
return func() error {
o.Rules = expected.Rules
if err := ctrl.SetControllerReference(fb, o, r.Scheme); err != nil {
return err
}
return nil
}
In namespaced mode the FluentBit Role (fluent-operator-fluent-bit) is shared across all FluentBit instances in a namespace. Setting a per-instance controller reference on it would fail reconciliation for a second instance or garbage-collect the shared Role when one instance is deleted, breaking siblings. Drop the owner reference to match the Collector and Fluentd Role handling and the delete logic that preserves the shared Role. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
The controllers only delete the per-instance RoleBinding (via DeletePerInstanceBinding) in namespaced mode and never delete the shared Role, so granting delete on roles is unnecessary. Split the kubebuilder markers for fluent-bit, collector, and fluentd so roles keeps create;get;list;watch;patch while rolebindings retains delete, and regenerate config/rbac/role.yaml and manifests/setup/setup.yaml. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
Render the flag via printf|quote so YAML quoting is applied to values that need it (e.g. a space-separated "ns1, ns2" list), without embedding literal quotes into the container argv. Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Josh Baird <jbaird@galileo.io>
| resources: | ||
| - clusterrolebindings | ||
| - clusterroles | ||
| - roles |
There was a problem hiding this comment.
Looks odd, why do we have roles here and remove it below?
Shouldn't this be clusterroles and clusterrolebindings here and below stick with the roles and rolebindings?
There was a problem hiding this comment.
controller-gen groups rules by their verb set rather than by resource scope. roles, clusterroles, and clusterrolebindings share create;get;list;watch;patch, so they collapse into one rule; rolebindings is broken out because it's the only resource that also needs delete.
The operator only deletes the per-instance RoleBinding in namespaced mode (DeletePerInstanceBinding) and never deletes Roles/ClusterRoles/ClusterRoleBindings see the doc comment on that function — which is why delete is scoped to rolebindings only (from the earlier least-privilege review in 56585f0).
Grouping by scope instead would mean either granting delete on roles or dropping it from rolebindings.
|
@joshuabaird it seems like the rbac role is only for filter kubernetes to work, maybe we can take a step back - we don't need this in the operator, the rbac related stuff should be took care of by the users, not the operator. What do you think? |
Summary
Addresses #1281 by completing the operator's namespaced mode so it can run without cluster-wide RBAC on the resources it manages.
The operator already had a
--watch-namespacesflag that scopes the controller-runtime cache (eliminating the... is forbidden ... at the cluster scopelist/watch errors) and aNamespacedmode for the FluentBit controller that creates namespacedRole/RoleBindinginstead ofClusterRole/ClusterRoleBinding. This PR extends that to the rest of the operator and makes it easy to enable.This is opt-in; cluster scope remains the default, so existing installs are unaffected.
Changes
pkg/operator/rbac.go):MakeScopedRBACObjects/MakeScopedRBACNamesnow take acomponentargument and applyadditionalRules, so any agent can get namespaced RBAC. Scoped names follow the samefluent-operator-<component>convention as the cluster-scoped objects. A smallMakeRBACObjectsForScopehelper centralizes the namespaced-vs-cluster selection used by all three controllers.Namespacedflag (wired to--watch-namespaces) so they createRole/RoleBindinginstead ofClusterRole/ClusterRoleBinding. Previously only FluentBit honored namespaced mode.RoleBindingis now removed when an agent CR is deleted (via the sharedDeletePerInstanceBindinghelper). In cluster mode the per-instanceClusterRoleBindingis left in place (unchanged from prior behavior): the operator is intentionally not granteddeleteon cluster-scoped RBAC, keeping its footprint minimal. The shared(Cluster)Roleis always preserved, since deleting it would break sibling instances.config/rbac/role.yaml,manifests/setup/setup.yaml): since the operator only deletes the per-instanceRoleBindingand never the sharedRole, the kubebuilder markers now grantrolesonlycreate;get;list;watch;patch(nodelete), keepingdeleteonrolebindings.operator.watchNamespacesas a first-class value that renders--watch-namespaces=...(defaults to empty / cluster scope).Notes / follow-ups (not in this PR)
ClusterRolein the Helm chart is unchanged; a namespaced operatorRoleoption is a sensible follow-up. (The chart already grants neitherroles:deletenorrolebindings:delete; note that withoutrolebindings:deletea Helm-deployed operator can't clean up per-instanceRoleBindings in namespaced mode.)ClusterRoleBindingin cluster mode would require granting the operatordeleteonclusterrolebindings; deferred here to keep this PR focused on reducing (not expanding) cluster-scoped permissions.ClusterInput/ClusterOutput/etc.) still require cluster-scoped watches.Test plan
go build ./...go vet ./...on changed packagesgo test ./pkg/operator/... ./controllers/...helm template --set operator.watchNamespaces=loggingrenders--watch-namespaces=logging; default omits itconfig/rbac/role.yamlandmanifests/setup/setup.yamlfrom the updated kubebuilder markers (make manifests); the only change isrolesdropping thedeleteverbe2e testsandhelm e2e testspass (cluster-mode lifecycle, incl. finalizer cleanup)--watch-namespacesand confirm namespacedRole/RoleBindingare created for fluent-bit/collector/fluentd and no cluster-scope list/watch errors appearMade with Cursor