Skip to content

Unit tests for osism/commands/ — hardware (baremetal, redfish, server, stress) #2364

@berendt

Description

@berendt

Background

Follow-up to #2192 (foundation) and PR #2193 (pytest + Zuul infrastructure). Part of Tier 8 (#2199). This issue covers the hardware-related CLI command modules: osism/commands/baremetal.py (1729 LOC, 14 cliff commands around Ironic node lifecycle), osism/commands/redfish.py (224 LOC, Redfish resource listing via a Celery task), osism/commands/server.py (454 LOC, Nova server migrate/list/clean), and osism/commands/stress.py (242 LOC, wrapper around the openstack-simple-stress tool). Together ~2650 LOC — the largest CLI group in the meta issue.

Scope

Extend tests/unit/commands/test_baremetal.py and tests/unit/commands/test_server.py; create tests/unit/commands/test_redfish.py and tests/unit/commands/test_stress.py.

Already covered — do not duplicate:

  • tests/unit/commands/test_baremetal.py: node-not-found → exit 1 for all ten node commands (Deploy, Undeploy, BurnIn, Clean, Provide, MaintenanceSet/Unset, PowerOn/Off, Delete); BaremetalDump failure paths (--ironic node not found, NetBox unavailable, device not found); BaremetalPing failure paths (NetBox unavailable, device not found, generic exception); missing-name argument validation for eight commands; burn-in with all stressors disabled; --all without --yes-i-really-really-mean-it for Deploy(--rebuild)/Undeploy/Clean/Delete.
  • tests/unit/commands/test_server.py: ServerList lookup failures (user-domain, user, domain, project-domain, project not found → exit 1); ServerMigrate refuses non-ACTIVE/PAUSED server.
  • No tests exist for redfish.py or stress.py.

The remaining gaps are almost exclusively happy paths and state-machine branches. Given the size of baremetal.py, prioritize the pure-logic targets (_apply_metalbox_vars, BaremetalPing._ping_host, clean-step construction, provision-state decision matrices) first; the simple wrapper commands (Provide/Maintenance/Power) are one parametrized block. This issue may be split further during implementation (e.g. baremetal.py into its own sub-issue, analogous to the Tier 3 splits).

Test targets

osism/commands/baremetal.py — gaps only

_apply_metalbox_vars()baremetal.py:19

Patch osism.commands.baremetal._get_metalbox_primary_ip4 (imported at module top).

  • IP returned ("192.168.30.1") → play_vars["hosts_additional_entries"] == {"metalbox.osism.xyz": "192.168.30.1"} and play_vars["docker_insecure_registries"] == ["metalbox:5001"]
  • Returns None/falsy → play_vars left untouched

BaremetalList.take_action()baremetal.py:56

Patch osism.tasks.openstack.get_cloud_helpers (pattern from existing _run_not_found); osism.utils.nb via patch.dict("osism.utils.__dict__", ...); osism.tasks.openstack.get_baremetal_node_netbox_info for --netbox.

  • setup_cloud_environment returns success=False → returns 1, get_openstack_connection never called (this guard is identical in every command of the module — one parametrized test over the command classes is enough)
  • Two nodes returned out of order → table rows sorted by name (capsys); power_state=None rendered as "n/a"
  • --provision-state active --maintenanceconn.baremetal.nodes called with provision_state="active", maintenance=True; no flags → called with no query kwargs
  • --netbox with utils.nb set → get_baremetal_node_netbox_info called per row, "Device Role" header inserted at index 1, device_role=None"N/A"
  • --netbox with utils.nb = None → no extra column
  • cleanup_cloud_environment called even when conn.baremetal.nodes raises (finally block)

BaremetalDeploy.take_action()baremetal.py:157

Patch: osism.tasks.openstack.get_cloud_helpers, osism.tasks.conductor.utils.get_vault / osism.tasks.conductor.utils.deep_decrypt (imported inside take_action — patch at source), openstack.baremetal.configdrive.pack (imported inside the loop — patch at source to avoid building a real config drive), osism.utils.nb, osism.commands.baremetal._get_metalbox_primary_ip4.

  • Provision-state decision matrix: available / deploy failed (not maintenance) → target "active"; error"rebuild"; active + --rebuild"rebuild"; active without --rebuild → skipped with warning; maintenance=True → skipped
  • Empty instance_info with extra["instance_info"] JSON string → conn.baremetal.update_node called with the parsed dict (Ironic drops instance_info on undeploy)
  • validate_node raises openstack.exceptions.ValidationException → node skipped, set_node_provision_state not called
  • NetBox: device found by name; found via cf_inventory_hostname filter when name lookup returns None; local_context_data becomes the play vars with frr_parameters/netplan_parameters popped; NetBox lookup raising → warning, deploy continues with empty defaults
  • node.extra["netplan_parameters"] (JSON string) → vars merged, network_allow_service_restart=True, role osism.commons.network appended
  • node.extra["frr_parameters"] (JSON string) → deep_decrypt(frr_params, vault) called, frr_dummy_interface set from settings.FRR_DUMMY_INTERFACE, role osism.services.frr appended
  • Vendor " Supermicro " (whitespace/case-normalized) → set_node_boot_device(node.id, "cdrom", persistent=False); the call raising → warning only, deploy proceeds
  • target_raid_config set → deploy_steps with erase_devices_metadata (priority 95) and raid apply_configuration (priority 90, delete_existing=True) passed to set_node_provision_state; unset → deploy_steps=None
  • configdrive.pack raising → node skipped with warning, no provision call
  • set_node_provision_state raising → warning, loop continues with next node
  • --allconn.baremetal.nodes(details=True) used instead of find_node

BaremetalDump.take_action()baremetal.py:428 (gaps)

Patch as for Deploy (vault/decrypt, utils.nb, _get_metalbox_primary_ip4).

  • --ironic happy path: playbook YAML printed to stdout — parse with yaml.safe_load (capsys) and assert hostname_name == node.name, base roles osism.commons.hostname/hosts/operator, rsyslog restart task
  • --ironic with node.extra netplan/frr JSON strings → roles and vars extended as in Deploy
  • --ironic setup failure → returns 1
  • NetBox default path happy: device by name and via cf_inventory_hostname; local_context_data defaults with frr_parameters/netplan_parameters popped; custom_fields["netplan_parameters"] (already a dict, no json.loads) merged + network role; custom_fields["frr_parameters"]deep_decrypt + frr role
  • Exception during playbook generation → error logged, no raise

BaremetalUndeploy.take_action()baremetal.py:709 (gaps)

Patch osism.commands.baremetal.cleanup_ssh_known_hosts_for_node (module-top import) plus the cloud helpers.

  • Provision state in ["active", "wait call-back", "deploy failed", "error"] (parametrize) → set_node_provision_state(node.id, "undeploy") and cleanup_ssh_known_hosts_for_node(node.name) called; cleanup returns True → info log, False → warning
  • Unsupported state (e.g. available) → warning, no undeploy call
  • set_node_provision_state raising → warning, continue
  • --all --yes-i-really-really-mean-it → iterates conn.baremetal.nodes()

BaremetalPing._ping_host()baremetal.py:811

Patch osism.commands.baremetal.subprocess.run. Pure output-parsing logic — highest-value target of the module.

  • returncode=0, stdout containing "0% packet loss" and a round-trip min/avg/max = 1.0/2.0/3.0 ms line → status == "SUCCESS", time_info is the part after "="
  • returncode=0 with "3 packets transmitted, 2 packets received, 33% packet loss" → status starts with "PARTIAL (" (third comma field)
  • returncode=0, no packet-loss line → SUCCESS, time_info == "N/A"
  • Linux rtt min/avg/max/mdev = ... line variant also parsed
  • returncode != 0FAILED / "N/A"
  • subprocess.TimeoutExpired side effect → ERROR, time_info truncated to 50 chars
  • Result stored under results[host_name] with keys host/status/time_info

BaremetalPing.take_action()baremetal.py:860 (gaps)

Patch osism.utils.nb, osism.tasks.conductor.netbox.get_nb_device_query_list_ironic, osism.tasks.netbox.get_devices (both imported inside take_action — patch at source), and BaremetalPing._ping_host to avoid real pings/threads.

  • No name → devices collected from all query-list entries, filtered to custom_fields power_state == "power on" and provision_state == "active"; non-matching devices dropped
  • No devices after filtering → info log, returns None (not 1)
  • Device without primary_ip4 → warning, excluded; none left → info log, return
  • primary_ip4.address == "10.0.0.1/24" → ping candidate IP "10.0.0.1"
  • Happy path: results table plus "Summary: X successful, Y failed/partial out of N total" (capsys); PARTIAL counts as failed

BaremetalBurnIn.take_action()baremetal.py:1012 (gaps)

  • Default flags → clean_steps == [{"step": "burnin_cpu", ...}, {"step": "burnin_memory", ...}, {"step": "burnin_disk", "interface": "deploy"}]; --no-disk removes only burnin_disk
  • available node → set_node_provision_state(node.id, "manage") then wait_for_nodes_provision_state([node.id], "manageable"); manage step raising → warning, node skipped
  • manageable node: instance_info refresh from extra (as in Deploy), Supermicro cdrom boot device, then set_node_provision_state(node.id, "clean", clean_steps=...); the clean call raising → warning, continue
  • active node without --yes-i-really-really-mean-it → error logged, no service call
  • active node with confirmation → node.set_provision_state(conn.baremetal, "service", service_steps=...) with burnin_disk filtered out and a data-loss warning logged
  • Unsupported state (e.g. enroll) → warning

BaremetalClean.take_action()baremetal.py:1177 (gaps)

  • Node with raid_interface != "no-raid"{"interface": "raid", "step": "delete_configuration"} prepended to erase_devices; "no-raid" → only erase_devices
  • available → manage → manageable flow incl. failure (as BurnIn)
  • manageableset_node_provision_state(node.id, "clean", clean_steps=...) + success info; raising → warning, continue
  • Unsupported state → warning

BaremetalProvide / BaremetalMaintenanceSet / BaremetalMaintenanceUnset / BaremetalPowerOn / BaremetalPowerOffbaremetal.py:1310, :1388, :1437, :1485, :1543 (happy paths)

One small parametrized block; the not-found paths are already covered.

  • Provide: manageable + not maintenance → set_node_provision_state(node.id, "provide"); maintenance=True → warning, no call; the call raising → warning
  • MaintenanceSet with --reason fooset_node_maintenance(node, reason="foo"); raising → error logged, no raise
  • MaintenanceUnset → unset_node_maintenance(node); raising → error logged
  • PowerOn → set_node_power_state(node.id, "power on"); PowerOff default → "power off", --soft"soft power off"; raising → error logged

BaremetalDelete.take_action()baremetal.py:1611 (gaps)

Patch osism.utils.nb and osism.utils.secondary_nb_list via patch.dict("osism.utils.__dict__", ...).

  • Ports from conn.baremetal.ports(node_uuid=node.id) deleted before the node (delete_port(port.id, ignore_missing=True)); a single port deletion raising → warning, remaining ports and node still deleted
  • delete_node(node.id, ignore_missing=True) called
  • Primary NetBox: device found → custom_fields updated with {"provision_state": None, "power_state": None} and device.save() called; device not found → warning; lookup raising → warning, processing continues
  • secondary_nb_list with one entry → same clearing on the secondary; failure → warning only
  • utils.nb = None → no NetBox interaction, no crash
  • delete_node raising → error logged, loop continues with next node (--all path)

osism/commands/redfish.py — no existing tests

List._normalize_column_name()redfish.py:12

  • "MAC Address""mac_address"; already-normalized input unchanged; empty string / None returned as-is

List._get_column_mappings()redfish.py:18

  • "EthernetInterfaces" / "NetworkAdapters" / "NetworkDeviceFunctions" return the expected display-name → key dicts (spot-check a few entries)
  • Unknown resource type → None

List._get_filtered_columns()redfish.py:57

  • selected_columns=None/empty → all headers and data keys returned
  • Selection matches case-insensitively with spaces (["mac address", "ID"] → headers ["ID", "MAC Address"] in mapping order)
  • Unknown requested column → warning logged listing available columns, valid ones still returned
  • Nothing matches → ([], [])

List._filter_json_data()redfish.py:86

  • Items reduced to the given keys; missing key → None value
  • Empty data or empty data_keys → input returned unchanged

List._filter_and_display_table()redfish.py:98

  • Empty data → returns without printing
  • All requested columns invalid → prints "No valid columns specified"
  • Happy path → grid table plus "Total items: N" (capsys)

List.take_action()redfish.py:147

Patch osism.utils.check_task_lock_and_exit and osism.tasks.conductor.get_redfish_resources (imported inside take_action). The Celery task is only used via .delay(...).get() — patch the task attribute with a MagicMock whose delay.return_value.get.return_value is fixture data; no broker needed.

  • check_task_lock_and_exit called before anything else
  • --format json, result present, no columns → full json.dumps(result, indent=2) (capsys)
  • --format json with --column and known resource type → only the selected keys in output
  • --format json with --column but unknown resource type (no mappings) → full dump
  • --format json, empty result → prints "[]"
  • Table format dispatch: each of the three known resource types reaches its _display_* helper (assert via output)
  • Unknown resource type with result → info log "Retrieved resources: ..."
  • No result, table format → "No {resourcetype} resources found for {hostname}"

osism/commands/server.py — gaps only

ServerMigrate.take_action()server.py:54 (gaps)

Patch osism.tasks.openstack.get_cloud_helpers, osism.commands.server.prompt, osism.commands.server.time.sleep.

  • ACTIVE server + --yeslive_migrate_server(id, host=None, block_migration="auto", force=False)
  • --target host1 --force → passed through as host="host1", force=True
  • Prompt answers: "no" → no migration call; "y" → accepted
  • Wait loop: get_server side effect [MIGRATING, MIGRATING, ACTIVE] → polls until non-MIGRATING (patch time.sleep); --no-waitget_server called exactly once (initial lookup)
  • PAUSED status also allowed

ServerList.take_action()server.py:151 (gaps)

  • --domain happy path: projects of the domain iterated, conn.compute.servers(all_projects=True, project_id=...) per project, 7-column table printed (capsys)
  • --project --project-domain: find_project called with domain_id query; domain name resolved via get_domain; get_domain raising → falls back to the raw domain_id
  • --user happy path: servers queried with user_id=...; per-server domain resolved via project; get_project/get_domain raising → domain_name stays None
  • Default branch (no filters): only servers in build/error older than 7200 s appear in the table (craft created_at timestamps on both sides of the threshold); fresh servers excluded

ServerClean.take_action()server.py:403

Patch the cloud helpers and osism.commands.server.prompt.

  • BUILD server older than --build-timeout + --yesdelete_server(server.id, force=True)
  • BUILD server younger than the timeout → not deleted, no prompt
  • Custom --build-timeout 60 honored
  • Prompt answer "no" → no deletion
  • ERROR-status server prompts regardless of age; --yes deletes
  • cleanup_cloud_environment called in finally

osism/commands/stress.py — no existing tests

OpenStackStress.take_action()stress.py:167

Patch osism.tasks.openstack.get_cloud_helpers and osism.commands.stress.subprocess.run.

  • Defaults → command starts with ["python3", "/openstack-simple-stress/openstack_simple_stress/main.py"], contains all integer/string defaults (--interval 10, --number 1, --cloud simple-stress, --flavor SCS-1V-2, --affinity soft-anti-affinity, --mode rolling, ...), and none of the boolean flags
  • Boolean flags (--no-cleanup --debug --no-delete --no-volume --no-boot-volume --no-wait --clean) each appended when set
  • Custom values propagated (--number 5 --flavor X --volume-size 10)
  • subprocess.run returncode passed through as exit code (test 0 and a non-zero value)
  • FileNotFoundError → returns 1, error log mentions the tool path
  • Generic exception → returns 1
  • setup_cloud_environment failure → returns 1, subprocess.run never called
  • cleanup_cloud_environment called in all paths, including exceptions

Mocking hints

  • Reuse the established pattern from test_baremetal.py: get_cloud_helpers is imported inside every take_action, so patch osism.tasks.openstack.get_cloud_helpers returning a triple (setup, get_conn, cleanup) with setup = MagicMock(return_value=("pw", [], None, True)).
  • Ironic node objects mix attribute access (node.provision_state, node.extra), item access (node["maintenance"], node["target_raid_config"]), membership tests ("instance_info" in node) and node.get(...). A plain MagicMock does not support in/[] well — a small dict-backed fake class (attributes + __getitem__/__contains__/get) shared via a fixture pays off across all baremetal tests.
  • osism.utils.nb and osism.utils.secondary_nb_list are lazily initialized module attributes — patch via patch.dict("osism.utils.__dict__", {"nb": fake_nb, "secondary_nb_list": [fake_secondary]}) as the existing tests already do.
  • BaremetalDeploy/BaremetalDump (--ironic): patch openstack.baremetal.configdrive.pack (local import inside the function) and osism.tasks.conductor.utils.get_vault/deep_decrypt at their source module. Use the real openstack.exceptions.ValidationException as side_effect for the validation test.
  • redfish.List.take_action: get_redfish_resources is a Celery task, but the CLI only calls .delay(...).get(). Patching osism.tasks.conductor.get_redfish_resources with a MagicMock avoids any broker (consistent with the Tier-wide rule of never needing Celery infrastructure in unit tests).
  • Output assertions (tabulate tables, JSON, YAML playbooks, summaries) via pytest's capsys; for BaremetalDump, yaml.safe_load(capsys.readouterr().out) gives structural assertions instead of string matching.
  • BaremetalPing.take_action: patch BaremetalPing._ping_host directly when testing device filtering so no threads do real work; test the parsing logic separately via subprocess.run mocks.
  • ServerMigrate/ServerClean: prompt (prompt_toolkit) and time are module-top imports — patch osism.commands.server.prompt and osism.commands.server.time.sleep.

Definition of Done

  • tests/unit/commands/test_redfish.py and tests/unit/commands/test_stress.py created
  • tests/unit/commands/test_baremetal.py and tests/unit/commands/test_server.py extended (no duplication of existing cases)
  • All listed cases covered
  • Coverage: osism.commands.baremetal ≥ 80 %, osism.commands.server ≥ 90 %, osism.commands.redfish ≥ 90 %, osism.commands.stress ≥ 95 %
  • pipenv run pytest tests/unit/commands/test_baremetal.py tests/unit/commands/test_redfish.py tests/unit/commands/test_server.py tests/unit/commands/test_stress.py passes locally
  • flake8, mypy, python-black remain green
  • Zuul job python-osism-unit-tests passes

Dependencies

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Ready

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions