Unit tests for osism/commands/ — workflow (apply, check, validate, wait, compose, sync, get, log, console) #2363

@berendt

Background

Follow-up to #2192 (foundation) and PR #2193 (pytest + Zuul infrastructure). Part of Tier 8 (#2199). This issue covers the workflow-oriented CLI command modules under osism/commands/: apply.py (501 LOC), check.py (778 LOC), validate.py (217 LOC), wait.py (159 LOC), compose.py (45 LOC), sync.py (406 LOC), get.py (302 LOC), log.py (237 LOC) and console.py (296 LOC) — together ~2,940 LOC. They are cliff Command classes that schedule Celery tasks (apply, validate, sync), inspect Celery state (wait, get tasks), shell out via SSH/clush/docker (compose, log, console) or compare filesystem metadata (check).

Scope

Add tests/unit/commands/test_apply.py, test_check.py, test_compose.py, test_sync.py, test_log.py and test_console.py; extend the existing test_validate.py, test_wait.py and test_get.py.

Already covered (do not duplicate):

  • test_validate.py: Run._handle_task returns 1 on TimeoutError while waiting.
  • test_wait.py: exit-code contract of the --live path (timeout with one/two tasks, task-rc passthrough, success → 0).
  • test_get.py: Hosts.take_action (inventory load failure → 1, empty inventory → success) and Hostvars.take_action (inventory query failure → 1, missing variable → success).

This is a large group: prioritize the pure-logic / high-value functions listed below; interactive loops (log.Opensearch, console container_prompt) are lowest priority. Like the Tier 3 issues, this issue may be split further during implementation (suggested cut: apply+check / sync+get / compose+log+console + the validate/wait extensions).

Test targets

Run._prepare_task() — apply.py:269

Patch osism.tasks.ansible.run, osism.tasks.ceph.run, osism.tasks.kolla.run, osism.tasks.kubernetes.run (the .si attribute is what gets called; imports happen inside the method body, so patch the canonical task module paths). MAP_ROLE2ENVIRONMENT / MAP_ROLE2RUNTIME are lazy module attributes of osism.data.playbooks — see mocking hints.

  • role="ceph" → environment forced to "ceph", ceph.run.si("ceph", "ceph", arguments, auto_release_time=task_timeout)
  • role="ceph-osds", environment "ceph" → ceph- prefix stripped: ceph.run.si("ceph", "osds", ...)
  • sub="zone-a" with ceph/kubernetes/kolla environments → environment becomes "ceph.zone-a" etc.
  • environment "kubernetes" → kubernetes.run.si(...)
  • role="loadbalancer-ng" → returns chain of kolla.run.si(env, "loadbalancer-ng", ...) piped into a group over enums.LOADBALANCER_PLAYBOOKS
  • environment "kolla", role="kolla-keystone" → kolla- prefix stripped; kolla.run.si called with ["-e kolla_action=deploy"] + arguments
  • role="mariadb-ng" / "rabbitmq-ng" → argument is -e kolla_action_ng=<action> instead
  • kolla role listed in MAP_ROLE2RUNTIME["osism-ansible"] (and not "common") → routed to ansible.run.si with the original arguments (no kolla_action)
  • environment None and role not in MAP_ROLE2ENVIRONMENT → bare-except fallback to "custom", info log "Trying to run play ...", ansible.run.si("custom", ...)
  • overwrite="other" in the default branch → environment replaced by "other"
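The patching rule above (imports inside the method body, so patch the canonical module path) can be sketched without a broker. Everything named `fake_tasks` and `prepare_task` below is a hypothetical stand-in, not the real osism code; only the `ceph-` prefix case and the `.si(...)` call shape are taken from the list above.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for a task module such as osism.tasks.ceph.
# It only exposes a `run` placeholder that the test replaces.
fake_tasks = types.ModuleType("fake_tasks")
fake_tasks.run = object()
sys.modules["fake_tasks"] = fake_tasks


def prepare_task(role, arguments, task_timeout):
    """Stand-in for a method that imports its task module in the body."""
    import fake_tasks  # resolved at call time, so the patch is visible here

    if role.startswith("ceph-"):
        role = role[len("ceph-"):]
    return fake_tasks.run.si(
        "ceph", role, arguments, auto_release_time=task_timeout
    )


def test_ceph_prefix_is_stripped():
    # Patch the attribute on the canonical module, not on the caller.
    with mock.patch("fake_tasks.run") as run:
        result = prepare_task("ceph-osds", ["-v"], 3600)

    run.si.assert_called_once_with(
        "ceph", "osds", ["-v"], auto_release_time=3600
    )
    assert result is run.si.return_value
```

In the real tests the patch target would presumably be `osism.tasks.ceph.run` (via `mocker.patch`), with the assertion on `.si(...)` unchanged.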

Run._handle_collection() / Run.handle_collection() — apply.py:135 / apply.py:218

Patch the task modules as above plus osism.tasks.ansible.noop; patch celery.chain / celery.group or assert on the returned signature structure. Drive via small hand-built Role trees (osism.data.enums.Role) instead of the real MAP_ROLE2ROLE.

  • item that is not a Role → TypeError raised and error logged
  • flat list of Roles without dependencies → one prepared task per role, wrapped in a group
  • Role with nested dependencies → chain(parent_task, child_group) built recursively
  • dry_run=True → ansible.noop.si() used instead of _prepare_task
  • show_tree=True → no tasks created, returns None, tree logged ("A [0] ..." / indentation grows with counter)
  • handle_collection: apply_async() called on the result when not show_tree; not called for show_tree; distinct log messages for dry-run / show-tree / normal mode (patch osism.data.enums.MAP_ROLE2ROLE with dict-patch for the collection lookup)

Run.handle_role() / Run.handle_loadbalancer_task() — apply.py:371 / apply.py:106

Patch osism.tasks.handle_task (imported inside the method) and stub _prepare_task to return a MagicMock whose apply_async() yields either a plain result or a celery.result.GroupResult instance.

  • plain task → handle_task(task, wait, format, timeout) rc passed through
  • GroupResult → handle_loadbalancer_task path taken
  • handle_loadbalancer_task with wait=True: rc comes from handle_task(t.parent, ...), t.get() called
  • wait=False: t.parent.get() additionally called (garbage-collector workaround), children logged for format="log"

Run.take_action() — apply.py:417

Patch osism.commands.apply.utils.check_task_lock_and_exit, osism.commands.apply.utils.check_ansible_facts, and the handle_role / handle_collection methods on the command instance.

  • check_task_lock_and_exit always invoked first
  • no role → table of MAP_ROLE2ENVIRONMENT printed (capsys), returns 0
  • ansible-facts freshness check: performed when a role is given and >300 s since utils._last_ansible_facts_check; skipped for roles "gather-facts"/"facts", for --show-tree, and when the last check is recent (reset the _last_ansible_facts_check attribute on osism.utils between tests)
  • role="a//b" → handle_role called once per segment
  • --retry 2 with handle_role always returning 1 → called 3 times, rc 1; success on 2nd attempt → 2 calls, rc 0
  • collection branch: handle_collection returns None, so rc != 0 is truthy and the loop breaks with take_action returning None — pin this current behavior in a test (candidate for a follow-up fix)
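The --retry case above comes down to call counting on a mocked handle_role. A minimal sketch, assuming a hypothetical `run_with_retry` helper that models only the retry contract from the list (one initial attempt plus `retry` further attempts); it is not the real take_action loop:

```python
from unittest.mock import MagicMock


def run_with_retry(handle_role, retry):
    """Hypothetical stand-in: one initial attempt plus `retry` more."""
    rc = 1
    for _ in range(retry + 1):
        rc = handle_role()
        if rc == 0:
            break
    return rc


def test_retry_exhausted():
    # handle_role keeps failing, so all three attempts are used.
    handle_role = MagicMock(return_value=1)
    assert run_with_retry(handle_role, retry=2) == 1
    assert handle_role.call_count == 3


def test_retry_succeeds_on_second_attempt():
    # side_effect yields one return value per call: fail, then succeed.
    handle_role = MagicMock(side_effect=[1, 0])
    assert run_with_retry(handle_role, retry=2) == 0
    assert handle_role.call_count == 2
```

Against the real command, the same `return_value` / `side_effect` mocks would be attached to `handle_role` on the command instance and the assertions made on the rc returned by take_action.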

get_file_info() / collect_file_info() — check.py:31 / check.py:59

Pure filesystem helpers — use tmp_path, no mocking needed for the happy paths.

  • small regular file → dict with inode, mtime, size, mode, uid, gid, is_link=False and an md5 hash
  • file ≥ 1 MiB → hash is None
  • unreadable file (patch builtins.open to raise IOError) → hash is None, rest populated
  • nonexistent path → {"error": ...}
  • collect_file_info: directory tree with .git/venv/__pycache__ subdirs → skipped; symlinks skipped; both files and directories included with relative paths
  • max_files=2 on a larger tree → scan stops, warning logged

parse_stat_output() — check.py:88

Pure parser, no mocks.

  • well-formed FILE:/INODE:/SIZE:/MTIME:/HASH: blocks → typed dict (int, int, float)
  • HASH:NONE → None
  • ERROR: lines captured under "error"
  • empty lines and key/value lines before any FILE: line ignored
  • multiple files in one output parsed independently
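The cases above fit into one sample input. The parser below is a stand-in written from the bullet list, not the real parse_stat_output. The lower-case result keys and the exact line layout are assumptions to be adjusted once the real function is read.

```python
def parse_stat_output_sketch(output):
    """Stand-in parser: KEY:VALUE lines, each block opened by FILE:."""
    result = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # empty or malformed lines are ignored
        key, value = line.split(":", 1)
        if key == "FILE":
            current = result.setdefault(value, {})
        elif current is None:
            continue  # key/value lines before the first FILE: are ignored
        elif key in ("INODE", "SIZE"):
            current[key.lower()] = int(value)
        elif key == "MTIME":
            current["mtime"] = float(value)
        elif key == "HASH":
            current["hash"] = None if value == "NONE" else value
        elif key == "ERROR":
            current["error"] = value
    return result


SAMPLE = """\
INODE:999

FILE:a.yml
INODE:42
SIZE:10
MTIME:1700000000.5
HASH:NONE
FILE:b.yml
ERROR:Permission denied
"""


def test_sample_output():
    parsed = parse_stat_output_sketch(SAMPLE)
    assert parsed["a.yml"] == {
        "inode": 42, "size": 10, "mtime": 1700000000.5, "hash": None
    }
    assert parsed["b.yml"] == {"error": "Permission denied"}
```

In the real test module the same SAMPLE-style string would be passed to `osism.commands.check.parse_stat_output`, ideally split into one parametrized case per bullet.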

Mount._compare_file_info() — check.py:338

Pure comparison, no mocks.

  • differing inodes → entry in inode_mismatches with local_inode/fresh_inode
  • differing hashes with check_content=True → content_mismatches; same input with check_content=False → empty
  • file only in fresh info → missing_in_local; only in local info → missing_in_fresh
  • entries containing "error" skipped entirely
  • falsy inode (None/0) on either side → no mismatch recorded
  • results sorted by file path
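Because the comparison is pure, one pair of hand-built dicts can cover most of the list above. The function below is a stand-in derived from the bullets, not the real _compare_file_info. The result layout (a dict of four lists, with a "file" key per inode mismatch) is an assumption, and so is the way an errored entry present on one side only is handled.

```python
def compare_file_info_sketch(local, fresh, check_content=True):
    """Stand-in comparison of two {path: info} mappings."""
    result = {
        "inode_mismatches": [],
        "content_mismatches": [],
        "missing_in_local": [],
        "missing_in_fresh": [],
    }
    # Iterating over sorted paths keeps every result list sorted by path.
    for path in sorted(set(local) | set(fresh)):
        if path not in local:
            result["missing_in_local"].append(path)
            continue
        if path not in fresh:
            result["missing_in_fresh"].append(path)
            continue
        a, b = local[path], fresh[path]
        if "error" in a or "error" in b:
            continue  # entries with errors are not compared
        # A falsy inode (None or 0) on either side records no mismatch.
        if a.get("inode") and b.get("inode") and a["inode"] != b["inode"]:
            result["inode_mismatches"].append(
                {"file": path, "local_inode": a["inode"],
                 "fresh_inode": b["inode"]}
            )
        if (check_content and a.get("hash") and b.get("hash")
                and a["hash"] != b["hash"]):
            result["content_mismatches"].append(path)
    return result


LOCAL = {
    "b": {"inode": 1, "hash": "x"},
    "a": {"inode": 2, "hash": "h"},
    "c": {"inode": None, "hash": "h"},
    "e": {"error": "boom"},
    "only_local": {"inode": 9},
}
FRESH = {
    "b": {"inode": 5, "hash": "y"},
    "a": {"inode": 3, "hash": "h"},
    "c": {"inode": 7, "hash": "h"},
    "e": {"inode": 1},
    "only_fresh": {"inode": 8},
}
```

With these fixtures, "a" and "b" differ by inode, only "b" differs by content, "c" and "e" are skipped, and the two one-sided entries land in the missing lists.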

Mount._get_container_id() / Mount._get_mount_source() — check.py:180 / check.py:222

Patch builtins.open with mock_open payloads and os.uname.

  • cgroup line containing docker with a 64-char id → first 12 chars returned; 12-char id returned as-is
  • cgroup unreadable + 12-char hostname → hostname returned
  • cgroup + hostname fail, /proc/self/mountinfo containing /docker/containers/<id>/ → id truncated to 12
  • nothing matches → None
  • _get_mount_source: mountinfo line whose 5th field equals the mount path → source after the - separator returned (must start with /); no separator / non-absolute source / no matching line / IOError → None
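The mock_open approach for the /proc files can be sketched as follows. `container_id_from_cgroup` is a hypothetical stand-in that models only the cgroup step. The hostname and mountinfo fallbacks of the real method are left out, and the way the id is extracted from the line is an assumption.

```python
from unittest import mock


def container_id_from_cgroup(path="/proc/self/cgroup"):
    """Hypothetical stand-in: short container id from a cgroup file."""
    try:
        with open(path) as handle:
            lines = handle.read().splitlines()
    except OSError:
        return None
    for line in lines:
        if "docker" in line:
            candidate = line.strip().rsplit("/", 1)[-1]
            if len(candidate) >= 12:
                return candidate[:12]
    return None


def test_long_id_is_truncated():
    # 64-character id as found in a docker cgroup path
    payload = "12:memory:/docker/" + "a1b2c3d4e5f6" + "0" * 52 + "\n"
    with mock.patch("builtins.open", mock.mock_open(read_data=payload)):
        assert container_id_from_cgroup() == "a1b2c3d4e5f6"


def test_unreadable_cgroup_returns_none():
    # Raising from open() simulates an unreadable file.
    with mock.patch("builtins.open", side_effect=OSError("denied")):
        assert container_id_from_cgroup() is None
```

When several files are read in sequence (cgroup, then mountinfo), a `side_effect` function on the patched `open` that dispatches on the path argument is one way to give each file its own payload.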

Mount.take_action() — check.py:391

Focus on the early-exit guards and final rc; patch osism.commands.check.DOCKER_AVAILABLE, osism.commands.check.docker, os.path.exists, and the instance helpers (_get_container_id, _get_volume_mount_info, _get_mount_source, _run_fresh_container) plus collect_file_info.

  • path does not exist → 1 (format="script" prints FAILED: ...)
  • DOCKER_AVAILABLE=False → 1
  • Docker socket missing → 1
  • docker.from_env raises → 1
  • mount source not determinable (no --host-path, no Docker mount info, no mountinfo) → 1
  • bind mount → source taken from mount info; volume mount with --volume-name → override used
  • _run_fresh_container raises → 1
  • consistent comparison → 0 (script prints PASSED); inode mismatches → 1 with INODE_MISMATCHES:<n> in script format

Inode.take_action() — check.py:662

  • explicit file list under tmp_path → rows with type/inode/size; symlinks and missing files skipped
  • no files given → random sampling from environments/* and inventory/* (patch random.sample for determinism); up to 2 entries per subdirectory plus direct files
  • all three formats (table via capsys, log, script); returns 0

validate.Run.take_action() — validate.py:74 (gap)

Patch osism.tasks.ansible.run, osism.tasks.ceph.run, osism.tasks.kolla.run (.delay), osism.commands.validate.utils.check_task_lock_and_exit, and stub _handle_task.

  • kolla validator (e.g. keystone-config) → kolla.run.delay("kolla", "keystone", arguments) with -e kolla_action=config_validate appended to arguments
  • --environment custom honored for ceph/kolla runtimes (no default applied)
  • ceph validator (ceph-config) → ceph.run.delay("ceph", "validate", ...) (playbook rewritten via VALIDATE_PLAYBOOKS)
  • osism-ansible validator without explicit playbook key (e.g. ntp) → ansible.run.delay("generic", "validate-ntp", ...)
  • check_task_lock_and_exit invoked

validate.Run._handle_task() — validate.py:55 (gap; timeout case exists)

  • wait=True, fetch_task_output returns rc → rc passed through
  • wait=False, format="log" → info log, returns 0
  • wait=False, format="script" → task id printed, returns 0

validate.Scs.take_action() — validate.py:159

Patch osism.tasks.openstack.setup_cloud_environment, osism.tasks.openstack.cleanup_cloud_environment (imported inside the method) and osism.commands.validate.subprocess.run.

  • setup_cloud_environment returns success=False → returns 1, no subprocess call, no cleanup
  • happy path → command contains -s <cloud>, -a os_cloud=<cloud>, -V <version>, ends with scs-compatible-iaas.yaml; OS_CLIENT_CONFIG_FILE=/tmp/clouds.yaml in env; returncode passed through
  • --verbose/--debug/--tests/--output/--sections each append the corresponding flag
  • subprocess.run raises FileNotFoundError → 1; generic exception → 1
  • cleanup_cloud_environment(temp_files, original_cwd) called in all post-setup paths (finally)

wait.Run.get_all_task_ids() / take_action() — wait.py:50 / wait.py:62 (gap; --live exit codes exist)

Patch celery.Celery, celery.result.AsyncResult, osism.commands.wait.time.sleep, and osism.utils._init_redis. Make AsyncResult return objects whose state changes between iterations (via side_effect) so re-queue loops terminate.

  • get_all_task_ids: merges ids from i.scheduled() and i.active(), returns them sorted
  • no task ids on CLI → ids pulled from get_all_task_ids, refresh mode enabled
  • PENDING + query_task finds nothing → "unavailable" logged / <id> = UNAVAILABLE printed, task not re-queued
  • PENDING + task known to a worker → re-queued, then SUCCESS on the next pass terminates
  • SUCCESS with --output → result.get() printed
  • STARTED without --live → re-queued, finishes when state flips to SUCCESS
  • --refresh 1 → after the queue drains, get_all_task_ids consulted one more time
  • format="script" prints <id> = <STATE> lines instead of log output
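The "state changes between iterations" trick can be done with a PropertyMock whose side_effect list ends in SUCCESS. The `wait_for` loop below is a hypothetical stand-in for the polling logic, not the real take_action; it takes the AsyncResult factory and sleep as parameters only so the sketch stays self-contained.

```python
from unittest.mock import MagicMock, PropertyMock


def wait_for(task_id, async_result, sleep):
    """Hypothetical polling loop: re-check the task until SUCCESS."""
    polls = 0
    while True:
        polls += 1
        if async_result(task_id).state == "SUCCESS":
            return polls
        sleep(1)


def test_loop_terminates_when_state_flips():
    result = MagicMock()
    # Each read of `.state` consumes the next value from the list.
    type(result).state = PropertyMock(
        side_effect=["PENDING", "STARTED", "SUCCESS"]
    )
    async_result = MagicMock(return_value=result)
    sleep = MagicMock()

    assert wait_for("abc", async_result, sleep) == 3
    assert sleep.call_count == 2  # no real waiting happened
```

If the code under test reads `.state` more than once per iteration, the list needs one entry per read; otherwise the mock raises StopIteration before the loop ends.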

compose.Run.take_action() — compose.py:25

Patch osism.commands.compose.subprocess.call and osism.commands.compose.ensure_known_hosts_file.

  • builds ssh ... <OPERATOR_USER>@<host> 'docker compose --project-directory=/opt/<environment> <arguments>' with UserKnownHostsFile=<KNOWN_HOSTS_PATH>; arguments are joined without separators (current behavior — pin it)
  • ensure_known_hosts_file returns False → warning logged, SSH still attempted

sync.Facts / sync.CephKeys / sync.Sonic — sync.py:21 / sync.py:46 / sync.py:90

Patch osism.commands.sync.utils.check_task_lock_and_exit, osism.tasks.ansible.run / osism.tasks.conductor.sync_sonic (.delay) and osism.tasks.handle_task.

  • Facts: ansible.run.delay("generic", "gather-facts", [], auto_release_time=3600), rc from handle_task
  • CephKeys: manager/copy-ceph-keys playbook; --no-wait → handle_task(t, False)
  • Sonic: conductor.sync_sonic.delay(device, show_diff); device-specific vs. generic log message; --no-diff → show_diff=False

sync.Versions._get_kolla_version_from_release() — sync.py:248

Patch requests.get (imported inside the method).

  • response with docker_images: {kolla: "0.20250928.0"} → version returned, URL is <repo>/<release>/base.yml
  • HTTP error (raise_for_status raises RequestException) → RuntimeError
  • invalid YAML body → RuntimeError
  • YAML without docker_images.kolla → RuntimeError "Kolla version not found"

sync.Versions._sync_kolla_versions() / take_action() — sync.py:311 / sync.py:288

Stub _extract_sbom_with_skopeo and _get_kolla_version_from_release; use tmp_path as --configuration-path.

  • --release 9.4.0 → image = <sbom-image-base>:<version-from-release>; _get_kolla_version_from_release raising → rc 1
  • version tag with a date part (0.20251128.0, also v-prefixed) → release SBOM image base; plain OpenStack version (2025.1) → registry.osism.cloud/kolla/sbom:2025.1
  • explicit --sbom-image → used verbatim, no derivation
  • non-dry-run with missing configuration path → rc 1
  • _extract_sbom_with_skopeo raising RuntimeError / YAMLError → rc 1
  • openstack_version from the SBOM overrides the CLI value in the rendered template
  • --dry-run → rendered versions.yml printed, nothing written, rc 0
  • happy path → file written to <config>/environments/kolla/versions.yml (directory auto-created), rc 0

sync.Versions._extract_sbom_with_skopeo() — sync.py:169

Patch osism.commands.sync.subprocess.run to a no-op and pre-build a fake OCI layout (the tmpdir comes from tempfile.TemporaryDirectory — patch osism.commands.sync.tempfile.TemporaryDirectory to return a tmp_path-backed context): index.json → manifest blob → tar layer containing images.yml.

  • happy path → parsed images.yml dict returned
  • skopeo exits non-zero (CalledProcessError) → RuntimeError "skopeo copy failed"
  • skopeo binary missing (FileNotFoundError) → RuntimeError "skopeo not found"
  • layer that is not a tarfile → skipped, next layer used
  • no layer contains images.yml → RuntimeError "images.yml not found"
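The fake OCI layout described above (index.json → manifest blob → tar layer containing images.yml) can be built with the standard library alone. This is a sketch of a possible fixture helper. The function names are hypothetical, the JSON documents carry only the fields needed to follow the digests, and where exactly the real code expects the layout inside the temporary directory has to be checked against sync.py.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def write_blob(layout: Path, data: bytes) -> str:
    """Store data under blobs/sha256/<hex> and return its digest."""
    digest = hashlib.sha256(data).hexdigest()
    blob_dir = layout / "blobs" / "sha256"
    blob_dir.mkdir(parents=True, exist_ok=True)
    (blob_dir / digest).write_bytes(data)
    return f"sha256:{digest}"


def build_fake_oci_layout(layout: Path, images_yml: bytes) -> None:
    """Create index.json -> manifest blob -> tar layer with images.yml."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name="images.yml")
        info.size = len(images_yml)
        tar.addfile(info, io.BytesIO(images_yml))
    layer_digest = write_blob(layout, buffer.getvalue())

    manifest = {"schemaVersion": 2, "layers": [{"digest": layer_digest}]}
    manifest_digest = write_blob(layout, json.dumps(manifest).encode())

    index = {"schemaVersion": 2, "manifests": [{"digest": manifest_digest}]}
    (layout / "index.json").write_text(json.dumps(index))
```

For the "layer that is not a tarfile" case, an extra `write_blob(layout, b"not a tar")` digest can be placed ahead of the real layer in the manifest's layers list.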

get.VersionsManager.take_action() — get.py:21

Patch docker.from_env (imported inside the method).

  • three containers with org.opencontainers.image.version labels → table rows; ceph-ansible adds de.osism.release.ceph, kolla-ansible adds de.osism.release.openstack, osism-ansible has empty release
  • client.containers.get raising docker.errors.NotFound for one name → that row skipped, others still printed

get.Tasks.take_action() — get.py:58

Patch celery.Celery so app.control.inspect() returns a mock with active() / scheduled() dicts.

  • active and scheduled tasks rendered with worker, id, name, status ACTIVE/SCHEDULED, start time (datetime.fromtimestamp) and args
  • empty inspect results → empty table, no exception

get.Facts.take_action() / get.States.take_action() — get.py:189 / get.py:277

Patch the lazy redis client (mocker.patch("osism.utils._init_redis", return_value=MagicMock()) or patch osism.commands.get.utils.redis).

  • Facts: no cache entry → error "No facts found in cache"; specific fact present → single row; fact missing → error logged; full listing truncates the four ansible_ssh_host_key_*_public facts to 40 chars + ...
  • States: facts with ansible_local.osism → one row per role with state/timestamp, bootstrap skipped; missing ansible_local/osism key or no cache entry → nothing printed, no exception

get.Hostvars / get.Hosts happy paths — get.py:127 / get.py:238 (gap)

  • Hostvars with variable present → grid table row printed; without variable argument → one row per variable
  • Hosts with hosts in inventory → psql table of hostnames (capsys)

log.Ansible / log.Container — log.py:31 / log.py:54

Patch osism.commands.log.subprocess.call and osism.commands.log.ensure_known_hosts_file.

  • Ansible: parameters joined and appended to /usr/local/bin/ara
  • Container: command contains docker logs <parameters> <container> and <OPERATOR_USER>@<host>; ensure_known_hosts_file failure → warning, call still made

log.File.take_action() — log.py:105

Patch osism.commands.log.get_hosts_from_group, osism.commands.log.resolve_host_with_fallback, osism.commands.log.subprocess.call, osism.commands.log.ensure_known_hosts_file.

  • path traversal (../../etc/passwd) → error "must stay within /var/log", rc 1, no subprocess call; kolla/nova/nova-compute.log → /var/log/kolla/nova/nova-compute.log accepted
  • tail command: -n <lines> always, -f only with --follow, path shell-quoted via shlex.quote
  • group with multiple hosts → clush invoked with -w host1,host2; clush rc != 0 → rc passed through with error log
  • group with exactly one host → host substituted, ssh path used
  • non-group host → resolve_host_with_fallback result used in user@host; ssh rc != 0 → passthrough; success → 0
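The path and tail-command cases can be expressed as input/output pairs. `build_tail_command` below is a hypothetical stand-in that combines both checks in one function. The real command logs an error and returns 1 instead of raising, and the default of 50 lines is an assumption made only for this sketch.

```python
import posixpath
import shlex

LOG_ROOT = "/var/log"  # base directory named in the list above


def build_tail_command(relative_path, lines=50, follow=False):
    """Hypothetical stand-in: validate the path, then build a tail call."""
    full_path = posixpath.normpath(posixpath.join(LOG_ROOT, relative_path))
    # After normalization, ../ segments can no longer hide an escape.
    if not full_path.startswith(LOG_ROOT + "/"):
        raise ValueError(f"path must stay within {LOG_ROOT}")
    parts = ["tail", "-n", str(lines)]
    if follow:
        parts.append("-f")
    parts.append(shlex.quote(full_path))
    return " ".join(parts)


def test_traversal_is_rejected():
    try:
        build_tail_command("../../etc/passwd")
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_path_is_quoted():
    command = build_tail_command("my log.txt", lines=5, follow=True)
    assert command == "tail -n 5 -f '/var/log/my log.txt'"
```

In the real tests the equivalent assertions would be made on the argument passed to the patched `osism.commands.log.subprocess.call`, plus a check that it is not called at all for the traversal input.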

log.Opensearch.take_action() — log.py:203 (low priority)

Patch osism.commands.log.PromptSession (session.prompt side_effect=["<query>", "exit"]) and osism.commands.log.requests.post.

  • exit immediately breaks the loop
  • response with hits → Payload printed per hit; --verbose prints timestamp | Hostname | [programname |] Payload, falling back to @timestamp when timestamp is absent
  • response without hits → raw JSON printed

console module helpers — console.py:18 / console.py:37 / console.py:65 / console.py:97 / console.py:128

  • resolve_hostname_to_ip (console.py:18): patch socket.gethostbyname — success → IP; socket.gaierror → None
  • get_primary_ipv4_from_netbox (console.py:37): patch osism.commands.console.utils.nb — nb falsy → None; device with primary_ip4.address = "10.0.0.1/24" → "10.0.0.1"; device None / no primary_ip4 → None; query raising → None with warning
  • resolve_host_with_fallback (console.py:65): DNS hit → IP; DNS miss + Netbox hit → Netbox IP; both miss → original hostname returned with warning
  • get_hosts_from_group (console.py:97): patch osism.commands.console.subprocess.check_output, get_inventory_path, get_hosts_from_inventory — valid inventory → sorted host list; any exception (e.g. CalledProcessError) → []
  • select_host_from_list (console.py:128): patch osism.commands.console.prompt — valid number → that host; q/quit/exit → None; non-numeric then valid input → retries; out-of-range then valid → retries

console.Run.take_action() — console.py:172

Patch osism.commands.console.subprocess.call, ensure_known_hosts_file, get_hosts_from_group, resolve_host_with_fallback, select_host_from_list, prompt.

  • host syntax routing: "ctl001/" → container_prompt loop; "ctl001/rabbitmq" → container; ".ctl001" → /run-ansible-console.sh ctl001; ":ctl" → clush with -g ctl and -l <OPERATOR_USER>
  • ssh type: group resolving to one host → that host used; multiple hosts → select_host_from_list; selection cancelled (None) → returns without SSH call
  • ssh call uses resolve_host_with_fallback result and UserKnownHostsFile=<KNOWN_HOSTS_PATH>
  • container type: docker exec -it <container> bash with both parts shlex.quoted, RequestTTY=force in options, host part resolved
  • container_prompt: command then exit → one SSH call with docker <quoted command>

Mocking hints

  • Instantiate cliff commands as in the existing tests: cmd = module.Class(MagicMock(), MagicMock()), then parsed_args = cmd.get_parser("test").parse_args([...]) — this exercises the real argparse defaults (tests/unit/commands/test_wait.py shows the pattern).
  • Celery task modules (osism.tasks.ansible, ceph, kolla, kubernetes, conductor) and osism.tasks.handle_task are imported inside method bodies — patch them at their canonical module path (e.g. mocker.patch("osism.tasks.ansible.run")), not on the command module. Assert on .si(...) / .delay(...) call args; no broker is needed since apply_async/delay are mocked.
  • osism.data.playbooks.MAP_ROLE2ENVIRONMENT / MAP_ROLE2RUNTIME are lazy module attributes loaded from /interface/playbooks via module-level __getattr__; set them with monkeypatch.setattr(playbooks, "MAP_ROLE2ENVIRONMENT", {...}, raising=False) and call osism.data.playbooks._reset_caches() in teardown (an autouse fixture keeps this tidy).
  • The lazy redis attribute on osism.utils: patch osism.utils._init_redis to return a MagicMock (as test_wait.py already does) before anything touches utils.redis.
  • apply.take_action stores utils._last_ansible_facts_check as an attribute on the osism.utils module — delete/reset it between tests (monkeypatch.delattr(osism.utils, "_last_ansible_facts_check", raising=False)).
  • wait.take_action loops until states converge: give the patched AsyncResult a side_effect list whose states end in SUCCESS, and always patch osism.commands.wait.time.sleep.
  • check.py guards the docker import with try/except — patch osism.commands.check.DOCKER_AVAILABLE and osism.commands.check.docker directly. The pure helpers (get_file_info, collect_file_info, parse_stat_output, _compare_file_info) need only tmp_path.
  • sync.Versions: requests and jinja2 are imported inside methods — patch requests.get; for the template-render assertions just inspect the written file/printed output instead of mocking jinja2.
  • log.py imports get_hosts_from_group and resolve_host_with_fallback from osism.commands.console at module level — patch them as osism.commands.log.<name>.
  • Interactive prompts: patch osism.commands.console.prompt / osism.commands.log.PromptSession with side_effect sequences ending in "exit"/"q" so loops terminate.
  • Use capsys for the tabulate/print-based assertions (apply role table, get tables, check script/table formats).

Definition of Done

  • tests/unit/commands/test_apply.py, test_check.py, test_compose.py, test_sync.py, test_log.py, test_console.py created
  • tests/unit/commands/test_validate.py, test_wait.py, test_get.py extended (existing tests kept unchanged)
  • All listed cases covered
  • pytest --cov shows ≥ 80 % for each module in scope (≥ 95 % for the pure helpers in check.py and the module-level functions in console.py)
  • pipenv run pytest tests/unit/commands/ passes locally
  • flake8, mypy, python-black remain green
  • Zuul job python-osism-unit-tests passes

Dependencies

Metadata

Labels: enhancement
Status: Ready