Skip to content

Restrict SONiC SNMP and gNMI access to the OOB management network#2338

Open
berendt wants to merge 2 commits into
mainfrom
implement/issue-2330-snmp-gnmi-ctrlplane-acls
Open

Restrict SONiC SNMP and gNMI access to the OOB management network#2338
berendt wants to merge 2 commits into
mainfrom
implement/issue-2330-snmp-gnmi-ctrlplane-acls

Conversation

@berendt

@berendt berendt commented Jun 10, 2026

Copy link
Copy Markdown
Member

Closes #2330

What this does

The generated SONiC ConfigDB bound the SNMP agent to the OOB IP in the mgmt VRF but never restricted which source networks can reach it, and the gNMI/telemetry server (static TELEMETRY|gnmi base entry) listened on all interfaces with no restriction at all.

This PR emits, per device with an OOB IP, control-plane ACLs handled by caclmgrd: ACL_TABLE entries of type CTRLPLANE bound to the SNMP and EXTERNAL_CLIENT (gNMI) services, each with an ACL_RULE that accepts only the device's network-normalised OOB management subnet. Binding a service in a CTRLPLANE table makes caclmgrd install an implicit default-drop for that service, so all other sources lose access.

Commit walkthrough

  1. Restrict SONiC SNMP and gNMI access to the OOB network — adds the shared _add_ctrlplane_acls helper (the home for Ensure that the SONiC SSH service is accessible only on the management network #2329's SSH_ONLY as well), registers ACL_TABLE/ACL_RULE as generator-owned on-demand tables (dropped from the base config up front, rebuilt on every regen, absent without an OOB IP), wires the helper into generate_sonic_config inside the management-interface block, and covers the helper with unit tests (tables/rules emitted, network-normalised SRC_IP, TELEMETRY|gnmi|port handling, wholesale replacement of pre-existing entries, non-IPv4 skip, ownership registration).
  2. Add orchestrator wiring tests for control-plane ACLs — patches _add_ctrlplane_acls in the orchestrator helper fixture like the other section helpers and pins the wiring: called exactly once with (config, oob_ip, prefix_len) when an OOB IP exists; not invoked (and the owned tables stay absent, including stale base-config entries) when there is none.

Deviations from the issue

  • L4_DST_PORT added to the GNMI_ONLY rule. The issue's example rule carries only PRIORITY/PACKET_ACTION/SRC_IP/IP_TYPE, and its note says EXTERNAL_CLIENT "maps to the configured TELEMETRY|gnmi|port". The issue asked to verify this against caclmgrd before relying on it — verification against sonic-host-services (202211, 202305, 202311, 202405, master) shows no TELEMETRY/GNMI lookup exists in any of these versions: EXTERNAL_CLIENT has no built-in destination port, the rule must provide it via L4_DST_PORT, and a CTRLPLANE table whose rules yield no destination port is skipped entirely (syslog warning only) — the issue's exact example would have been a silent no-op for gNMI. The generator therefore performs the TELEMETRY|gnmi|port mapping itself and writes it into the rule, falling back to the telemetry container default 8080 when the base config sets no port. The SNMP rule needs no port (fixed 161 in caclmgrd's ACL_SERVICES).
  • Non-IPv4 OOB IPs skip ACL generation with a warning instead of crashing. The issue assumed IPv4Network(...) unconditionally; get_device_oob_ip can legitimately return an IPv6 address (NetBox oob_ip or a mgmt interface with only a v6 address), and an unguarded IPv4Network would abort the device's entire config generation. The skip path is logged and unit-tested.
  • SSH_ONLY is not emitted here — that is Ensure that the SONiC SSH service is accessible only on the management network #2329's scope (still open). The helper is written and documented as the shared home for it; adding SSH is a one-entry change.
  • No sonic-acl YANG model added. The issue's "Consider adding..." relates to Cover Enterprise SONiC tables in the YANG model set #2258; the validator continues to report ACL_TABLE/ACL_RULE as warnings ("No YANG schema for table..."), not errors, exactly as the issue describes.

Verification

  • pipenv run pytest — 1150 passed (full unit suite; includes the existing exhaustive owned-table stale-drop test, which automatically covers the two new owned tables)
  • pipenv run flake8 on the changed files — clean
  • caclmgrd semantics verified against sonic-net/sonic-host-services branches 202211 → master (ACL_SERVICES, EXTERNAL_CLIENT port handling, is_rule_ipv4, implicit drop)

🤖 Generated with Claude Code

berendt added 2 commits June 10, 2026 09:26
The generated ConfigDB binds the SNMP agent to the OOB IP in the mgmt
VRF but never restricts which source networks can reach it, and the
gNMI/telemetry server from the static TELEMETRY|gnmi base entry
listens on all interfaces with no restriction at all.

Emit per-device control-plane ACLs handled by caclmgrd: ACL_TABLE
entries of type CTRLPLANE bound to the SNMP and EXTERNAL_CLIENT
(gNMI) services, each with an ACL_RULE accepting only the device's
network-normalised OOB management subnet. Binding a service in a
CTRLPLANE table makes caclmgrd install an implicit default-drop for
it, so all other sources lose access.

caclmgrd's EXTERNAL_CLIENT service carries no built-in destination
port (verified against the ACL_SERVICES map in sonic-host-services
202211 through master): the rule must provide it via L4_DST_PORT, or
caclmgrd skips the whole table with only a syslog warning. The
GNMI_ONLY rule therefore sets L4_DST_PORT from TELEMETRY|gnmi|port,
falling back to the telemetry container default 8080.

ACL_TABLE and ACL_RULE become generator-owned on-demand tables: they
are dropped from the base config up front and rebuilt only when the
device has an IPv4 OOB IP, so stale entries cannot survive a regen
and devices without an OOB IP get no ACL tables at all.

The new tables are not yet covered by the generated YANG schemas, so
the validator reports them as warnings (validation skipped), not
errors.

Closes #2330

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
Patch _add_ctrlplane_acls in the orchestrator helper fixture like the
other section helpers, so the orchestrator glue stays exercised in
isolation, and pin the wiring for #2330: the helper runs exactly once
with the OOB IP and prefix when one exists, and is not invoked when
the device has no OOB IP — in which case the owned ACL_TABLE and
ACL_RULE tables stay absent even if the base config carried stale
entries.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
@berendt berendt requested review from ideaship and osfrickler June 10, 2026 07:39
@berendt berendt marked this pull request as ready for review June 10, 2026 07:39

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="osism/tasks/conductor/sonic/config_generator.py" line_range="2289-2295" />
<code_context>
+    TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
+    follows the same lookup against the (pass-through) TELEMETRY table.
+    """
+    gnmi_config = config.get("TELEMETRY", {}).get("gnmi", {})
+    return str(gnmi_config.get("port", DEFAULT_GNMI_PORT))
+
+
</code_context>
<issue_to_address>
**suggestion:** Consider validating that the gNMI port is numeric before using it as L4_DST_PORT.

If `TELEMETRY|gnmi|port` is set to a non-numeric value, `L4_DST_PORT` will be invalid and caclmgrd may ignore or mis-handle the rule. Please validate that the port is numeric (int or numeric string) and either fall back to `DEFAULT_GNMI_PORT` or log a warning when it isn’t, so ACL behavior remains predictable under bad TELEMETRY config.

```suggestion
def _get_gnmi_port(config):
    """Return the gNMI server port for the GNMI_ONLY control-plane ACL rule.

    The telemetry/gNMI container reads its listen port from
    TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
    follows the same lookup against the (pass-through) TELEMETRY table.

    If TELEMETRY|gnmi|port is set to a non-numeric value, fall back to
    DEFAULT_GNMI_PORT and log a warning so ACL behavior remains predictable.
    """
    gnmi_config = config.get("TELEMETRY", {}).get("gnmi", {})
    port_value = gnmi_config.get("port")

    # No port configured: use the default.
    if port_value is None:
        return DEFAULT_GNMI_PORT

    # Accept both integer and numeric string configuration.
    try:
        port_int = int(port_value)
    except (TypeError, ValueError):
        logger.warning(
            "Invalid TELEMETRY|gnmi|port=%r in config; falling back to default gNMI port %s",
            port_value,
            DEFAULT_GNMI_PORT,
        )
        return DEFAULT_GNMI_PORT

    # Optionally guard against clearly invalid port ranges while keeping behavior predictable.
    if not 0 < port_int < 65536:
        logger.warning(
            "Out-of-range TELEMETRY|gnmi|port=%d in config; falling back to default gNMI port %s",
            port_int,
            DEFAULT_GNMI_PORT,
        )
        return DEFAULT_GNMI_PORT

    return str(port_int)
```
</issue_to_address>

### Comment 2
<location path="osism/tasks/conductor/sonic/config_generator.py" line_range="2322-2327" />
<code_context>
+    logs a warning and emits nothing rather than failing the whole config
+    generation.
+    """
+    network = ipaddress.ip_network(f"{oob_ip}/{prefix_len}", strict=False)
+    if network.version != 4:
+        logger.warning(
+            f"OOB IP {oob_ip}/{prefix_len} is not IPv4; skipping control-plane ACLs"
+        )
+        return
+
+    accept_from_oob = {
</code_context>
<issue_to_address>
**issue:** Guard against invalid OOB IPs raising ValueError during ip_network() construction.

If `oob_ip` or `prefix_len` are malformed, `ipaddress.ip_network` will raise `ValueError` and abort config generation. Consider wrapping this call in a `try/except ValueError`, logging a warning (similar to the non-IPv4 case), and returning early so bad input only skips ACL generation rather than breaking the whole config.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +2289 to +2295
def _get_gnmi_port(config):
"""Return the gNMI server port for the GNMI_ONLY control-plane ACL rule.

The telemetry/gNMI container reads its listen port from
TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
follows the same lookup against the (pass-through) TELEMETRY table.
"""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider validating that the gNMI port is numeric before using it as L4_DST_PORT.

If TELEMETRY|gnmi|port is set to a non-numeric value, L4_DST_PORT will be invalid and caclmgrd may ignore or mis-handle the rule. Please validate that the port is numeric (int or numeric string) and either fall back to DEFAULT_GNMI_PORT or log a warning when it isn’t, so ACL behavior remains predictable under bad TELEMETRY config.

Suggested change
def _get_gnmi_port(config):
"""Return the gNMI server port for the GNMI_ONLY control-plane ACL rule.
The telemetry/gNMI container reads its listen port from
TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
follows the same lookup against the (pass-through) TELEMETRY table.
"""
def _get_gnmi_port(config):
"""Return the gNMI server port for the GNMI_ONLY control-plane ACL rule.
The telemetry/gNMI container reads its listen port from
TELEMETRY|gnmi|port and falls back to 8080 when unset, so the ACL rule
follows the same lookup against the (pass-through) TELEMETRY table.
If TELEMETRY|gnmi|port is set to a non-numeric value, fall back to
DEFAULT_GNMI_PORT and log a warning so ACL behavior remains predictable.
"""
gnmi_config = config.get("TELEMETRY", {}).get("gnmi", {})
port_value = gnmi_config.get("port")
# No port configured: use the default.
if port_value is None:
return DEFAULT_GNMI_PORT
# Accept both integer and numeric string configuration.
try:
port_int = int(port_value)
except (TypeError, ValueError):
logger.warning(
"Invalid TELEMETRY|gnmi|port=%r in config; falling back to default gNMI port %s",
port_value,
DEFAULT_GNMI_PORT,
)
return DEFAULT_GNMI_PORT
# Optionally guard against clearly invalid port ranges while keeping behavior predictable.
if not 0 < port_int < 65536:
logger.warning(
"Out-of-range TELEMETRY|gnmi|port=%d in config; falling back to default gNMI port %s",
port_int,
DEFAULT_GNMI_PORT,
)
return DEFAULT_GNMI_PORT
return str(port_int)

Comment on lines +2322 to +2327
network = ipaddress.ip_network(f"{oob_ip}/{prefix_len}", strict=False)
if network.version != 4:
logger.warning(
f"OOB IP {oob_ip}/{prefix_len} is not IPv4; skipping control-plane ACLs"
)
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Guard against invalid OOB IPs raising ValueError during ip_network() construction.

If oob_ip or prefix_len are malformed, ipaddress.ip_network will raise ValueError and abort config generation. Consider wrapping this call in a try/except ValueError, logging a warning (similar to the non-IPv4 case), and returning early so bad input only skips ACL generation rather than breaking the whole config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready

Development

Successfully merging this pull request may close these issues.

Ensure that the SONiC SNMP/GNMI services are accessible only on the management network

2 participants