Skip to content

Add unit tests for SONiC sync and exporter modules and fix issues found in review#2333

Merged
ideaship merged 8 commits into
mainfrom
implement/issue-2224-sonic-sync-exporter-tests
Jun 10, 2026
Merged

Add unit tests for SONiC sync and exporter modules and fix issues found in review#2333
ideaship merged 8 commits into
mainfrom
implement/issue-2224-sonic-sync-exporter-tests

Conversation

@berendt

@berendt berendt commented Jun 9, 2026

Copy link
Copy Markdown
Member

Closes #2224. Part of the Tier 3 SONiC unit-test effort (#2199), follow-up
to #2192 / #2193.

Adds two test modules covering the SONiC sync pipeline. Both mirror the
existing tests/unit/tasks/conductor/sonic/ conventions: the shared
mock_nb / loguru_logs fixtures, SimpleNamespace device stubs, and a
per-module fixture that stubs collaborators at their import site (the
patch_orchestrator_helpers pattern).

The review surfaced seven real defects in exporter.py / sync.py that the
initial tests would otherwise have pinned as intended behavior. Since the
requested regression tests only pass against fixed sources, the fixes are
included here as separate commits (see below), so the PR is no longer
tests-only.

Change set (commit order)

  1. Add unit tests for SONiC exporter moduletest_exporter.py

    • save_config_to_netbox: first-time create, no-change (DeepDiff empty,
      no save), changed config (unified diff + journal entry + save), journal
      failure still saves. Both the tuple (return_diff=True) and bool return
      forms are asserted.
    • export_config_to_file: filename selection (hostname vs. serial-number,
      empty/missing serial falls back to hostname with a warning), diff
      handling (missing / identical / changed / unreadable file), symlink
      handling (absent target, existing regular file, dangling symlink,
      os.symlink failure, hostname mode / falsy serial skip), and the outer
      os.makedirs failure path.
    • File-export tests use a real tmp_path filesystem where that reads more
      clearly (the symlink end-state proves remove-then-symlink ordering), and
      patch os.* only to inject the symlink / makedirs failures.
  2. Add unit tests for SONiC sync orchestratortest_sync.py

    • Cache lifecycle: exact start/end clear ordering via attach_mock, and
      the metalbox / VIP loads called once per invocation.
    • Single-device path: allowed role processed; disallowed role, role=None,
      not-found, and lookup-raises each return {}; single spine triggers the
      full spine fetch (nb.dcim.devices.filter), single leaf does not.
    • Multi-device path: only allowed roles kept; role-less devices skipped.
    • Per-device: missing / unsupported / valid HWSKU; config_version read
      from custom fields.
    • AS mapping: computed per spine group, propagated into
      generate_sonic_config, and a min_as=None group is not propagated.
    • Diff handling: diff streamed to task output, first-time message,
      no-change still exports, show_diff=False calls save without
      return_diff.
    • Task output gated on task_id; cache-stats logged only when non-empty.

    generate_sonic_config and find_interconnected_devices are stubbed (they
    have their own issues under Meta: Unit test coverage for osism/ #2199), per the issue's instructions.

Review feedback fixes (each with regression tests)

  1. Own only the sonic_config key in NetBox local context (review finding 1)
    save_config_to_netbox no longer replaces the device's entire
    local_context_data dict; sibling keys such as frr_parameters /
    netplan_parameters survive, and diffing covers only the owned
    sonic_config key (mirrors the sonic: enforce full config_db.json ownership #2297 partitioned-ownership model).

  2. Create SONiC journal entry only after successful save (finding 7)
    — a failed device.save() can no longer leave an orphaned journal entry
    claiming the update succeeded.

  3. Write SONiC config exports atomically (finding 3)
    — temp file + os.replace() so a mid-write failure (ENOSPC, killed
    process) cannot truncate the previous export.

  4. Reconcile hostname symlink independently of config changes
    (findings 2 + 4) — a missing/stale link is repaired even when the config
    content is unchanged, the reconciliation is idempotent, and a
    hostname == serial device no longer destroys its own config via a
    self-referential symlink.

  5. Always clean up caches and finish task output in sync_sonic (finding 5)
    try/finally around the sync body; early returns and per-device
    failures no longer leak the module-level caches into the next run, finish
    the task with rc=1, and a failing device no longer aborts the remaining
    devices. The summary log no longer raises on PORT-less configs.

  6. Raise on SONiC persistence failures instead of reporting no change
    (finding 6) — failed NetBox saves and file exports raise instead of
    returning the same value as "no changes"; sync_sonic reports them via
    rc=1 instead of finishing failed syncs with rc=0.

Local verification

  • black --check — pass
  • flake8 — pass
  • pytest — not run locally; the unit-test suite runs in the
    python-osism-unit-tests Zuul job. The coverage gate
    (--cov=...sync/...exporter >= 90 %) is verified there.

@berendt berendt force-pushed the implement/issue-2224-sonic-sync-exporter-tests branch from 07abc30 to abf328d Compare June 9, 2026 11:09
@berendt berendt marked this pull request as ready for review June 9, 2026 11:15
@berendt berendt requested a review from ideaship June 9, 2026 11:15

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Both test_sync.py and test_exporter.py define identical helpers like _has_log; consider moving these small utilities into a shared test helper/fixture module under tests/unit/tasks/conductor/sonic/ to avoid duplication and keep future changes in one place.
  • Several tests assert on exact log message substrings and precise call sequences (e.g. full cache lifecycle order); if the behavior doesn’t require strict ordering or exact wording, consider loosening these expectations slightly to reduce brittleness against harmless refactors while still checking the important semantics.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Both `test_sync.py` and `test_exporter.py` define identical helpers like `_has_log`; consider moving these small utilities into a shared test helper/fixture module under `tests/unit/tasks/conductor/sonic/` to avoid duplication and keep future changes in one place.
- Several tests assert on exact log message substrings and precise call sequences (e.g. full cache lifecycle order); if the behavior doesn’t require strict ordering or exact wording, consider loosening these expectations slightly to reduce brittleness against harmless refactors while still checking the important semantics.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ideaship ideaship moved this from Ready to In review in Human Board Jun 9, 2026

@ideaship ideaship left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing major, just some minor issues.

assert result == (False, None)


def test_save_config_change_writes_journal_and_saves(mock_nb):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 1 — local_context_data wholesale-replace clobbers sibling keys (data loss)

These changed-context save tests only build local_context_data={"sonic_config": ...}, which hides that save_config_to_netbox replaces the device's entire dict (exporter.py:88/:94) and diffs the full existing context against {"sonic_config": ...} (exporter.py:42). Please add a case with a sibling key, e.g. local_context_data={"frr_parameters": {...}, "sonic_config": old}, and assert the sibling survives.

It won't today — verified by running the real exporter against a leaf-role device:

BEFORE: ['frr_parameters', 'sonic_config']
AFTER : ['sonic_config']

The exporter deletes frr_parameters, reports it as a removal in its diff log, and writes a journal entry attributing the deletion to the SONiC update. frr_parameters is not hypothetical — the inventory-reconciler reads it (and netplan_parameters) out of local_context_data via config_context.

Tests-only PR, so the source fix is a separate change: mirror the #2297 partitioned-ownership model — own and replace only the sonic_config key (device.local_context_data = {**(device.local_context_data or {}), "sonic_config": config}) and diff only that key. This test should pin sibling-key survival once that lands.

assert _has_log(loguru_logs, "ERROR", "Failed to save diff to journal")


def test_save_config_save_failure_returns_false(mock_nb, loguru_logs):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 6 — failed writes reported to the task layer as rc=0

assert result == (False, None) pins an ambiguity: save_config_to_netbox returns the same (False, None) for a failed save (exporter.py:103) as for no change (:48), and export_config_to_file does the same (:226 vs :222). In sync.py (not in this diff), :219 then treats a failed write as "no changes", logs accordingly, and still finishes rc=0 (:244) — failed syncs report success. Source fix: failures should raise or return a distinct sentinel so task status reflects them; this test should assert the distinction once that lands.

assert _has_log(loguru_logs, "ERROR", "Failed to save local context")


def test_save_config_save_failure_bool_form(mock_nb):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 7 — journal entry created before device.save()

Both save-failure tests pass local_context_data=None, hitting only the first-time branch. Please add a changed-path save failure (existing sonic_config + raising device.save()). In exporter.py (not in this diff) the journal entry is created at :71 before save() at :89, so a failed save returns (False, None) while an orphaned journal entry already claims the update succeeded (and the in-memory device stays mutated). Source fix: move journal creation after a successful save.

# --- diff handling ----------------------------------------------------------


def test_export_no_change_returns_false(tmp_path, export_settings, patch_hostname):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 4 — symlink repair gated on config_changed

Symlink reconciliation should be tested independently of a content change. In exporter.py (not in this diff) the whole symlink block is gated by if config_changed: (:177), so an unchanged serial-mode config with a missing/stale hostname link is never repaired — and after a symlink failure, a later unchanged run never retries. Please add a case: existing identical config + missing hostname link, and assert the link gets created. It currently won't.

# --- symlink handling -------------------------------------------------------


def test_export_creates_symlink_when_hostname_absent(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 2 — hostname == serial → self-referential symlink (low priority)

Low-priority defensive case. Every symlink test uses distinct hostname (sw-1) and serial (ABC123); add one where they're equal. In exporter.py (not in this diff), when serial == hostname the just-written config file is the hostname path, so :207 os.removes it and :209 symlinks it to itself — config destroyed, yet the function returns True. One-line source fix: skip the symlink when filepath == hostname_filepath.

Unlikely in practice — needs the serial set equal to the device name/inventory_hostname, which doesn't happen with real manufacturer serials or on the seeded testbed (serials unset → safe fallback). But the guard is trivial against silent data loss that reports success.

# --- error handling ---------------------------------------------------------


def test_export_makedirs_failure_returns_false(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 3 — non-atomic write truncates the previous export (cheap hygiene)

Cheap-hygiene note, not a blocker. Error coverage only injects a makedirs failure; a write/serialize-failure case (patch json.dump to raise, assert the previous export is still intact) would be a nice addition. In exporter.py (not in this diff) the file is opened "w" at :179, truncating before serialization, so a mid-write crash (ENOSPC / killed process) leaves the prior config truncated and returns False. Source fix: temp file + os.replace().

Low priority here — the export self-heals on the next sync and the in-repo reader rejects bad JSON — but it would rank higher if we expected attackers or external consumers of the export dir.

assert result == {"sw-1": {"PORT": {"Ethernet0": {}}}}


def test_single_device_disallowed_role_returns_empty(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding 5 — cleanup / finish_task_output skipped on early returns and mid-loop exceptions (would block)

These early-return tests check the return value and log message but not that caches were cleared or finish_task_output was called. In sync.py (not in this diff), the early returns (:77/:80/:83) and any exception in the per-device loop — e.g. the unguarded len(sonic_config['PORT']) at :225 on a PORT-less config — return after caches load (:42–52) but before cleanup (:237–239) and finish_task_output (:244). Caches are module-level, so the leak corrupts the next run. Please assert cleanup + finalization on these paths and add a mid-loop raising-collaborator case; the source fix is a try/finally.

@berendt berendt changed the title Add unit tests for SONiC sync and exporter modules Add unit tests for SONiC sync and exporter modules and fix issues found in review Jun 10, 2026
berendt added 8 commits June 10, 2026 08:05
Cover save_config_to_netbox (NetBox local-context persistence with diff
checking and journal logging) and export_config_to_file (on-disk export,
filename selection, and the serial-number to hostname symlink).

The file-export tests drive a real filesystem under tmp_path where that
reads more clearly than asserting on mocked os calls, and patch os.* only
to inject the symlink and makedirs failure paths.

Part of #2199.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
Cover the sync_sonic entry point: cache lifecycle ordering, single-device
and NetBox-filtered device resolution, role/HWSKU filtering, spine and
superspine AS-mapping propagation, diff streaming to task output, and
cache-stats logging.

Every heavy collaborator (generate_sonic_config, find_interconnected_devices,
NetBox access) is stubbed at its sync import site so only the orchestration
glue is exercised; those modules are covered separately under #2199.

Part of #2199.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
save_config_to_netbox previously replaced the device's entire
local_context_data dict and diffed the full existing context against
{"sonic_config": ...}. Sibling keys such as frr_parameters or
netplan_parameters (consumed by the inventory reconciler via
config_context) were silently deleted on every SONiC config update,
with the removal attributed to SONiC in the journal diff.

Mirror the partitioned-ownership model from #2297: merge the new
sonic_config key into the existing context instead of replacing it,
and diff only the owned key so unrelated sibling changes neither
trigger nor pollute SONiC updates.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
The journal entry documenting a configuration update was created
before device.save(). When the save subsequently failed, the function
reported no change while an orphaned journal entry already claimed the
update had happened. Move journal creation after the successful save.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
export_config_to_file opened the target with "w", truncating the
previous export before serialization. A mid-write failure (ENOSPC,
killed process) left the prior config truncated on disk. Write to a
temporary file and os.replace() it into place so the previous export
survives a failed write.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
The whole symlink block was gated on config_changed, so an unchanged
config with a missing or stale hostname link was never repaired — and
after a symlink failure, later unchanged runs never retried. Move the
reconciliation out of the gate and make it idempotent (a link already
pointing at the config file is left untouched).

Also guard against hostname == serial: the just-written config file is
the hostname path in that case, and the old code removed it and
symlinked it to itself, destroying the config while reporting success.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
The single-device early returns and any exception in the per-device
loop returned after the module-level caches were loaded but before the
cleanup and finish_task_output calls, leaking cache state into the
next run and leaving the task unfinished. Wrap the sync body in
try/finally so cleanup and task finalization run on every exit path.

Early returns and per-device failures now finish the task with rc=1
instead of silently reporting success; a failing device no longer
aborts the sync of the remaining devices. The per-device summary log
no longer raises on configs without a PORT section.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
save_config_to_netbox returned the same (False, None) for a failed
save as for "no changes", and export_config_to_file returned the same
False for a failed write as for an unchanged file. sync_sonic then
treated failed writes as "no changes" and finished the task with
rc=0, so failed syncs reported success.

Log and re-raise in both exporter functions; the per-device handler
in sync_sonic catches the failure, continues with the remaining
devices, and finishes the task with rc=1.

AI-assisted: Claude Code
Signed-off-by: Christian Berendt <berendt@osism.tech>
@berendt berendt force-pushed the implement/issue-2224-sonic-sync-exporter-tests branch from 89ba47d to cbf7708 Compare June 10, 2026 06:05
@berendt berendt requested a review from ideaship June 10, 2026 06:37
@ideaship ideaship merged commit 10b8a34 into main Jun 10, 2026
3 checks passed
@ideaship ideaship deleted the implement/issue-2224-sonic-sync-exporter-tests branch June 10, 2026 06:46
@github-project-automation github-project-automation Bot moved this from In review to Done in Human Board Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Unit tests for osism/tasks/conductor/sonic/{sync,exporter}.py

3 participants