Restrict SONiC SNMP and gNMI access to the OOB management network#2338
Restrict SONiC SNMP and gNMI access to the OOB management network#2338berendt wants to merge 2 commits into
Conversation
The generated ConfigDB binds the SNMP agent to the OOB IP in the mgmt VRF but never restricts which source networks can reach it, and the gNMI/telemetry server from the static TELEMETRY|gnmi base entry listens on all interfaces with no restriction at all. Emit per-device control-plane ACLs handled by caclmgrd: ACL_TABLE entries of type CTRLPLANE bound to the SNMP and EXTERNAL_CLIENT (gNMI) services, each with an ACL_RULE accepting only the device's network-normalised OOB management subnet. Binding a service in a CTRLPLANE table makes caclmgrd install an implicit default-drop for it, so all other sources lose access. caclmgrd's EXTERNAL_CLIENT service carries no built-in destination port (verified against the ACL_SERVICES map in sonic-host-services 202211 through master): the rule must provide it via L4_DST_PORT, or caclmgrd skips the whole table with only a syslog warning. The GNMI_ONLY rule therefore sets L4_DST_PORT from TELEMETRY|gnmi|port, falling back to the telemetry container default 8080. ACL_TABLE and ACL_RULE become generator-owned on-demand tables: they are dropped from the base config up front and rebuilt only when the device has an IPv4 OOB IP, so stale entries cannot survive a regen and devices without an OOB IP get no ACL tables at all. The new tables are not yet covered by the generated YANG schemas, so the validator reports them as warnings (validation skipped), not errors. Closes #2330 AI-assisted: Claude Code Signed-off-by: Christian Berendt <berendt@osism.tech>
Patch _add_ctrlplane_acls in the orchestrator helper fixture like the other section helpers, so the orchestrator glue stays exercised in isolation, and pin the wiring for #2330: the helper runs exactly once with the OOB IP and prefix when one exists, and is not invoked when the device has no OOB IP — in which case the owned ACL_TABLE and ACL_RULE tables stay absent even if the base config carried stale entries. AI-assisted: Claude Code Signed-off-by: Christian Berendt <berendt@osism.tech>
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="osism/tasks/conductor/sonic/config_generator.py" line_range="2289-2295" />
<code_context>
+ TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
+ follows the same lookup against the (pass-through) TELEMETRY table.
+ """
+ gnmi_config = config.get("TELEMETRY", {}).get("gnmi", {})
+ return str(gnmi_config.get("port", DEFAULT_GNMI_PORT))
+
+
</code_context>
<issue_to_address>
**suggestion:** Consider validating that the gNMI port is numeric before using it as L4_DST_PORT.
If `TELEMETRY|gnmi|port` is set to a non-numeric value, `L4_DST_PORT` will be invalid and caclmgrd may ignore or mis-handle the rule. Please validate that the port is numeric (int or numeric string) and either fall back to `DEFAULT_GNMI_PORT` or log a warning when it isn’t, so ACL behavior remains predictable under bad TELEMETRY config.
```suggestion
def _get_gnmi_port(config):
"""Return the gNMI server port for the GNMI_ONLY control-plane ACL rule.
The telemetry/gNMI container reads its listen port from
TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
follows the same lookup against the (pass-through) TELEMETRY table.
If TELEMETRY|gnmi|port is set to a non-numeric value, fall back to
DEFAULT_GNMI_PORT and log a warning so ACL behavior remains predictable.
"""
gnmi_config = config.get("TELEMETRY", {}).get("gnmi", {})
port_value = gnmi_config.get("port")
# No port configured: use the default.
if port_value is None:
return DEFAULT_GNMI_PORT
# Accept both integer and numeric string configuration.
try:
port_int = int(port_value)
except (TypeError, ValueError):
logger.warning(
"Invalid TELEMETRY|gnmi|port=%r in config; falling back to default gNMI port %s",
port_value,
DEFAULT_GNMI_PORT,
)
return DEFAULT_GNMI_PORT
# Optionally guard against clearly invalid port ranges while keeping behavior predictable.
if not 0 < port_int < 65536:
logger.warning(
"Out-of-range TELEMETRY|gnmi|port=%d in config; falling back to default gNMI port %s",
port_int,
DEFAULT_GNMI_PORT,
)
return DEFAULT_GNMI_PORT
return str(port_int)
```
</issue_to_address>
### Comment 2
<location path="osism/tasks/conductor/sonic/config_generator.py" line_range="2322-2327" />
<code_context>
+ logs a warning and emits nothing rather than failing the whole config
+ generation.
+ """
+ network = ipaddress.ip_network(f"{oob_ip}/{prefix_len}", strict=False)
+ if network.version != 4:
+ logger.warning(
+ f"OOB IP {oob_ip}/{prefix_len} is not IPv4; skipping control-plane ACLs"
+ )
+ return
+
+ accept_from_oob = {
</code_context>
<issue_to_address>
**issue:** Guard against invalid OOB IPs raising ValueError during ip_network() construction.
If `oob_ip` or `prefix_len` are malformed, `ipaddress.ip_network` will raise `ValueError` and abort config generation. Consider wrapping this call in a `try/except ValueError`, logging a warning (similar to the non-IPv4 case), and returning early so bad input only skips ACL generation rather than breaking the whole config.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def _get_gnmi_port(config): | ||
| """Return the gNMI server port for the GNMI_ONLY control-plane ACL rule. | ||
|
|
||
| The telemetry/gNMI container reads its listen port from | ||
| TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule | ||
| follows the same lookup against the (pass-through) TELEMETRY table. | ||
| """ |
There was a problem hiding this comment.
suggestion: Consider validating that the gNMI port is numeric before using it as L4_DST_PORT.
If TELEMETRY|gnmi|port is set to a non-numeric value, L4_DST_PORT will be invalid and caclmgrd may ignore or mis-handle the rule. Please validate that the port is numeric (int or numeric string) and either fall back to DEFAULT_GNMI_PORT or log a warning when it isn’t, so ACL behavior remains predictable under bad TELEMETRY config.
| def _get_gnmi_port(config): | |
| """Return the gNMI server port for the GNMI_ONLY control-plane ACL rule. | |
| The telemetry/gNMI container reads its listen port from | |
| TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule | |
| follows the same lookup against the (pass-through) TELEMETRY table. | |
| """ | |
| def _get_gnmi_port(config): | |
| """Return the gNMI server port for the GNMI_ONLY control-plane ACL rule. | |
| The telemetry/gNMI container reads its listen port from | |
| TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule | |
| follows the same lookup against the (pass-through) TELEMETRY table. | |
| If TELEMETRY|gnmi|port is set to a non-numeric value, fall back to | |
| DEFAULT_GNMI_PORT and log a warning so ACL behavior remains predictable. | |
| """ | |
| gnmi_config = config.get("TELEMETRY", {}).get("gnmi", {}) | |
| port_value = gnmi_config.get("port") | |
| # No port configured: use the default. | |
| if port_value is None: | |
| return DEFAULT_GNMI_PORT | |
| # Accept both integer and numeric string configuration. | |
| try: | |
| port_int = int(port_value) | |
| except (TypeError, ValueError): | |
| logger.warning( | |
| "Invalid TELEMETRY|gnmi|port=%r in config; falling back to default gNMI port %s", | |
| port_value, | |
| DEFAULT_GNMI_PORT, | |
| ) | |
| return DEFAULT_GNMI_PORT | |
| # Optionally guard against clearly invalid port ranges while keeping behavior predictable. | |
| if not 0 < port_int < 65536: | |
| logger.warning( | |
| "Out-of-range TELEMETRY|gnmi|port=%d in config; falling back to default gNMI port %s", | |
| port_int, | |
| DEFAULT_GNMI_PORT, | |
| ) | |
| return DEFAULT_GNMI_PORT | |
| return str(port_int) |
| network = ipaddress.ip_network(f"{oob_ip}/{prefix_len}", strict=False) | ||
| if network.version != 4: | ||
| logger.warning( | ||
| f"OOB IP {oob_ip}/{prefix_len} is not IPv4; skipping control-plane ACLs" | ||
| ) | ||
| return |
There was a problem hiding this comment.
issue: Guard against invalid OOB IPs raising ValueError during ip_network() construction.
If oob_ip or prefix_len are malformed, ipaddress.ip_network will raise ValueError and abort config generation. Consider wrapping this call in a try/except ValueError, logging a warning (similar to the non-IPv4 case), and returning early so bad input only skips ACL generation rather than breaking the whole config.
Closes #2330
What this does
The generated SONiC ConfigDB bound the SNMP agent to the OOB IP in the mgmt VRF but never restricted which source networks can reach it, and the gNMI/telemetry server (static
TELEMETRY|gnmibase entry) listened on all interfaces with no restriction at all.This PR emits, per device with an OOB IP, control-plane ACLs handled by
caclmgrd:ACL_TABLEentries of typeCTRLPLANEbound to theSNMPandEXTERNAL_CLIENT(gNMI) services, each with anACL_RULEthat accepts only the device's network-normalised OOB management subnet. Binding a service in a CTRLPLANE table makes caclmgrd install an implicit default-drop for that service, so all other sources lose access.Commit walkthrough
_add_ctrlplane_aclshelper (the home for Ensure that the SONiC SSH service is accessible only on the management network #2329'sSSH_ONLYas well), registersACL_TABLE/ACL_RULEas generator-owned on-demand tables (dropped from the base config up front, rebuilt on every regen, absent without an OOB IP), wires the helper intogenerate_sonic_configinside the management-interface block, and covers the helper with unit tests (tables/rules emitted, network-normalisedSRC_IP,TELEMETRY|gnmi|porthandling, wholesale replacement of pre-existing entries, non-IPv4 skip, ownership registration)._add_ctrlplane_aclsin the orchestrator helper fixture like the other section helpers and pins the wiring: called exactly once with(config, oob_ip, prefix_len)when an OOB IP exists; not invoked (and the owned tables stay absent, including stale base-config entries) when there is none.Deviations from the issue
L4_DST_PORTadded to theGNMI_ONLYrule. The issue's example rule carries onlyPRIORITY/PACKET_ACTION/SRC_IP/IP_TYPE, and its note saysEXTERNAL_CLIENT"maps to the configuredTELEMETRY|gnmi|port". The issue asked to verify this against caclmgrd before relying on it — verification againstsonic-host-services(202211, 202305, 202311, 202405, master) shows noTELEMETRY/GNMIlookup exists in any of these versions:EXTERNAL_CLIENThas no built-in destination port, the rule must provide it viaL4_DST_PORT, and a CTRLPLANE table whose rules yield no destination port is skipped entirely (syslog warning only) — the issue's exact example would have been a silent no-op for gNMI. The generator therefore performs theTELEMETRY|gnmi|portmapping itself and writes it into the rule, falling back to the telemetry container default8080when the base config sets no port. The SNMP rule needs no port (fixed161in caclmgrd'sACL_SERVICES).IPv4Network(...)unconditionally;get_device_oob_ipcan legitimately return an IPv6 address (NetBoxoob_ipor a mgmt interface with only a v6 address), and an unguardedIPv4Networkwould abort the device's entire config generation. The skip path is logged and unit-tested.SSH_ONLYis not emitted here — that is Ensure that the SONiC SSH service is accessible only on the management network #2329's scope (still open). The helper is written and documented as the shared home for it; adding SSH is a one-entry change.sonic-aclYANG model added. The issue's "Consider adding..." relates to Cover Enterprise SONiC tables in the YANG model set #2258; the validator continues to reportACL_TABLE/ACL_RULEas warnings ("No YANG schema for table..."), not errors, exactly as the issue describes.Verification
pipenv run pytest— 1150 passed (full unit suite; includes the existing exhaustive owned-table stale-drop test, which automatically covers the two new owned tables)pipenv run flake8on the changed files — cleansonic-net/sonic-host-servicesbranches 202211 → master (ACL_SERVICES,EXTERNAL_CLIENTport handling,is_rule_ipv4, implicit drop)🤖 Generated with Claude Code