Skip to content

fix(ibmcloud): preserve path component in COS backend URL#830

Open
adrianriobo wants to merge 12 commits into
redhat-developer:mainfrom
adrianriobo:fix-829
Open

fix(ibmcloud): preserve path component in COS backend URL#830
adrianriobo wants to merge 12 commits into
redhat-developer:mainfrom
adrianriobo:fix-829

Conversation

@adrianriobo

@adrianriobo adrianriobo commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

When a backed URL includes a path (e.g. s3://bucket/some/path), the previous code discarded the path and stored Pulumi state at the bucket root. Refactor extractBucket into extractBucketAndPath so the path is carried through to PULUMI_BACKEND_URL.

Fixes #829
Fixes #832

@coderabbitai

This comment was marked as outdated.

When a backed URL includes a path (e.g. s3://bucket/some/path), the
previous code discarded the path and stored Pulumi state at the bucket
root. Refactor extractBucket into extractBucketAndPath so the path is
carried through to PULUMI_BACKEND_URL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 35-36: The /etc/resolv.conf fallback currently excludes only IPv4
loopback (127.*) so IPv6 loopback (::1) can still be captured into _dns_servers;
update the awk filter in the _dns_servers assignment (the line using awk
'/^nameserver/ && $2 !~ /^127\./ {print $2}') to also exclude the IPv6 loopback
(e.g. add a condition like $2 != "::1" or $2 !~ /^::1/), ensuring nameserver
entries equal to ::1 are not included.
- Around line 48-54: The script currently checks for "dns_servers" globally
which can match other sections; change the detection/update logic so it only
looks inside the [containers] section: when checking for existence use a
section-scoped search (e.g., search between the line matching "^\[containers\]"
and the next section header "^\[.*\]") to decide whether to replace dns_servers
or append it after the [containers] header; update the two sed/grep branches
that reference dns_servers and /etc/containers/containers.conf and ensure the
inserted value uses the same ${_toml_list} variable and formatting as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 466885cf-1897-4a3c-8cf6-9db3d3fcc8aa

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1f1fc and c0d0b15.

📒 Files selected for processing (1)
  • pkg/integrations/gitlab/snippet-linux.sh

Comment thread pkg/integrations/gitlab/snippet-linux.sh Outdated
Comment thread pkg/integrations/gitlab/snippet-linux.sh
Detect the host's upstream DNS servers at runner setup time and write
them to /etc/containers/containers.conf so Podman propagates them into
every container it creates, including the nested inner containers spawned
by `podman build` RUN steps.

Without this, inner build containers inherit the systemd-resolved loopback
stub (127.0.0.53) which is unreachable from inside a container, causing
intermittent "Could not resolve host" failures for external domains.

Detection tries resolvectl, nmcli, and /etc/resolv.conf in order,
filtering out loopback addresses at each step. The existence check and
in-place replacement are scoped to the [containers] section via awk to
avoid false-positive matches in other sections (e.g. [network]).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass --name "{{ .Name }}" to gitlab-runner register in all platform
snippets (Linux, Darwin, Windows). Without this flag the runner defaults
to the system hostname, which is not meaningful in ephemeral environments.

Set the Name field from ProjectName() rather than RunID() so the runner
is identified by the project it belongs to.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
coderabbitai[bot]

This comment was marked as resolved.

@deekay2310 deekay2310 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from my one comment regarding context.go, we should also address CodeRabbit's comments and I think this PR will be ready to merge.


func manageIntegration(c *Context, ca *ContextArgs) error {
if ca.GHRunnerArgs != nil {
ca.GHRunnerArgs.Name = c.RunID()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be changed to c.ProjectName() just like GLRunnerArgs.name?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I just found out using the RunID as name is not really helpful so make it with the project name clarifies it, but as this change is GH related can you change it on your PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'll look into it

When both GitLab runner and OTel are configured, Podman is set to use
the journald log driver so CI job container output is captured by
systemd journal. The otelcol config gains a journald/gitlab-jobs
receiver that parses CONTAINER_NAME to extract job_id, project_id, and
runner_token, and the existing filelog/gitlab-runner receiver gets
regex_parser operators to promote job= and runner= fields from the
runner daemon log body into attributes. Both streams share job_id,
enabling cross-stream correlation in any OTel backend.

The journald log driver change is gated on a new LogToJournald field in
GitLabRunnerArgs, set only when hasOtel is true in the IBM Power and
IBM Z providers. The otelcol receivers remain gated on the existing
MonitorGitLabRunner flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ng containers.conf

Root cause (confirmed live on s390x runner):

  podman build RUN steps run inside a nested container created by inner
  Podman inside the docker executor container.  Inner Podman (using
  Netavark) creates a new bridge for these build-step containers, but the
  privileged-but-bridge-networked outer container has no working iptables/
  nftables NAT, so packets from the 10.88.0.0/16 bridge subnet can never
  reach the DNS servers.  The inner containers' /etc/resolv.conf had the
  correct nameservers written but DNS queries timed out because they could
  not be routed.

  The previous chmod 644 and dns_servers fixes correctly configured the
  outer executor container but did not propagate into the nested build-step
  containers.

Fixes applied:

  1. Add netns = "host" to /etc/containers/containers.conf.
     When the file is bind-mounted into the executor container (see redhat-developer#3),
     inner Podman reads this and creates all build-step containers sharing
     the outer container's network namespace instead of a new bridge.
     They inherit the working resolv.conf with no NAT required.

  2. Guarantee /etc/containers/containers.conf exists before runner
     registration even when DNS detection finds no upstream servers, so
     the bind mount in redhat-developer#3 always has a real file to attach to.

  3. Add --docker-volumes to the runner registration so that every
     executor container gets the host's containers.conf bind-mounted at
     /etc/containers/containers.conf:ro.  This makes the nested inner
     Podman pick up both dns_servers and netns = "host" automatically,
     without any change to the CI job Dockerfile or podman build invocation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@adrianriobo adrianriobo force-pushed the fix-829 branch 2 times, most recently from 74eb149 to 16330a5 Compare June 18, 2026 15:40
…n build

Two problems with inner podman build containers inside the CI executor:

1. DNS failed because the executor container's resolv.conf points to a
   loopback stub (127.0.0.53/systemd-resolved) that is unreachable from
   inside nested containers. Fixed by detecting real upstream DNS servers
   via resolvectl/nmcli/resolv.conf and writing them to containers.conf
   dns_servers so Podman passes working nameservers to inner containers.

2. netns="host" was added as a workaround to give inner containers a
   routable network stack, but it causes them to share the actual host
   network namespace — so concurrent CI jobs compete for the same ports
   (e.g. nginx: bind() to 0.0.0.0:8080 failed: Address already in use).
   Replaced with sysctl ip_forward=1 so Netavark's bridge NAT works
   correctly, giving each inner container an isolated namespace while
   still routing egress through the executor container's interface.

The containers.conf is bind-mounted into every executor container via
--docker-volumes so Podman inside the CI job inherits dns_servers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
adrianriobo and others added 6 commits June 19, 2026 08:39
…config

The mount-data-home.sh script looped forever waiting for a TYPE=mpath
block device to appear, with no exit condition and TimeoutStartSec=0 in
the systemd unit. On larger LPAR sizes (32+ cores / 192+ GB), IBM Cloud
takes longer to establish Fibre Channel paths, so the device can take
many minutes to appear. The infinite loop blocked all subsequent runcmd
steps (restorecon, runner installation) indefinitely.

Two changes:
- Explicitly start multipathd before entering the loop so the daemon is
  guaranteed to be running when we call 'multipathd reconfigure'.
- Cap the loop at 60 iterations (~10 minutes). If no multipath device
  appears in that window, proceed without mounting the data volume so the
  runner installation can still complete. The instance remains functional
  on its root filesystem; the data volume can be mounted manually once
  the FC paths are established.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…urces

Adds TagsAsStringArray helper to convert map[string]string (key=value from
--tags flag) to the key:value string format expected by IBM Cloud user tags.
Threads tags through the network module and both action packages so every
taggable resource gets the user-supplied tags: PiVolume, PiInstance
(PiUserTags) on PowerVS; IsInstance, IsSshKey, IsVpc, IsSubnet,
IsPublicGateway, IsSecurityGroup, IsFloatingIp, and ResourceGroup (Tags)
on VPC/IBM Z paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-dns at registration

Two improvements to make DNS reliable on all platforms:

1. Add /run/systemd/resolve/resolv.conf as a DNS detection fallback.
   On Ubuntu 24.04 with systemd-resolved, /etc/resolv.conf points to the
   127.0.0.53 stub which is filtered out. The real upstream servers live in
   /run/systemd/resolve/resolv.conf and are needed for the fix to work.

2. Pass detected DNS servers as --docker-dns when registering the runner.
   containers.conf dns_servers fixes DNS for nested podman build containers,
   but Podman's Docker socket API does not always honour it when creating the
   executor container itself. Passing --docker-dns explicitly in config.toml
   ensures every job container gets working DNS regardless.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes for persistent DNS failures inside CI executor containers:

1. Public DNS fallback: if resolvectl/nmcli/resolv.conf detect nothing
   (e.g. on non-standard cloud images), fall back to 8.8.8.8/8.8.4.4.
   The machine has internet access (it talks to GitLab), so public DNS works.

2. Explicit iptables MASQUERADE for 10.88.0.0/16: on RHEL 9, 'podman system
   reset --force' disrupts Netavark's firewalld integration. Netavark normally
   adds the MASQUERADE rule when the first container is created, but the runner
   resolves DNS at job startup — before any container has ever run. Adding the
   rule explicitly ensures it is in place before the first CI job.

3. firewall-cmd --add-masquerade: belt-and-suspenders for RHEL/firewalld
   systems. Enables masquerade at the zone level in addition to the iptables
   rule, so forwarded container traffic is NATted regardless of how firewalld
   manages its nftables entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…permanent

The sysctl ip_forward settings and firewall-cmd masquerade rule added in
the previous fix were transient: sysctl -w is not preserved across reboots,
and firewall-cmd without --permanent is lost on firewalld restart.

Persist ip_forward via /etc/sysctl.d/99-podman-ip-forward.conf so Podman
bridge NAT works correctly after any reboot.

Add --permanent to firewall-cmd --add-masquerade and follow it with
--reload so the rule is immediately active AND survives across reboots and
firewalld restarts on RHEL/Fedora (ppc64le) machines.

Non-firewalld systems (Ubuntu/s390x) are unaffected: firewall-cmd fails
silently via || true and the iptables POSTROUTING rule covers them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rk subnet

Root cause (confirmed live on s390x runner):

  The host's Podman CNI bridge uses 10.88.0.0/16. When gitlab-runner
  creates an executor container (on the host's 10.88.0.0/16 bridge,
  e.g. at 10.88.0.22), and the CI job runs 'podman run' inside that
  executor, inner Podman (Netavark) creates its own bridge — also at
  10.88.0.0/16 — inside the executor's network namespace.

  This produces two competing kernel routes inside the executor:
    10.88.0.0/16 dev eth0  src 10.88.0.22    (executor → host)
    10.88.0.0/16 dev cni-podman0  src 10.88.0.1  (Netavark bridge)

  Nested containers land on Netavark's bridge (e.g. at 10.88.0.23).
  The duplicate route to 10.88.0.0/16 breaks forwarding: DNS queries
  from inner containers to IBM Cloud resolvers (10.130.98.x) have no
  unambiguous egress path, causing socket.gaierror [Errno -2].

  Note: the executor container's own DNS works fine because it reaches
  IBM DNS directly through the host bridge, without going through
  Netavark. Only nested 'podman run' containers are affected.

Fix:

  Add [network] default_subnet = "192.168.100.0/24" to
  /etc/containers/containers.conf, which is bind-mounted into every
  executor container via --docker-volumes. Netavark inside the executor
  reads this and creates its bridge on 192.168.100.0/24, leaving
  10.88.0.0/16 exclusively for the executor's eth0 (host connection).
  Forwarding and MASQUERADE then work correctly, giving nested
  containers working DNS and internet access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/provider/ibmcloud/ibmcloud.go (1)

227-231: ⚡ Quick win

Stabilize tag serialization order to avoid nondeterministic resource diffs.

At Line 229, iterating a Go map directly makes pulumi.StringArray order nondeterministic across runs. Since this helper feeds many IBM Cloud resources, this can cause noisy previews/updates if ordering is treated as significant. Sort keys before append.

Proposed fix
+import "sort"
...
 func TagsAsStringArray(tags map[string]string) pulumi.StringArray {
 	result := make(pulumi.StringArray, 0, len(tags))
-	for k, v := range tags {
+	keys := make([]string, 0, len(tags))
+	for k := range tags {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+	for _, k := range keys {
+		v := tags[k]
 		result = append(result, pulumi.String(fmt.Sprintf("%s:%s", k, v)))
 	}
 	return result
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/ibmcloud/ibmcloud.go` around lines 227 - 231, The
TagsAsStringArray function iterates directly over the tags map, which produces
nondeterministic ordering in Go since map iteration order is randomized. To fix
this, extract all the keys from the tags map into a slice, sort that slice of
keys, and then iterate over the sorted keys in the for loop instead of directly
iterating over the map. This ensures the resulting pulumi.StringArray has
consistent ordering across multiple runs, eliminating noisy diffs in resource
previews and updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 84-89: The grep check on line 84 searches for dns_options globally
in the file without scoping it to the [containers] section, and the subsequent
sed replacement on line 85 also operates globally. This can cause unintended
modifications to dns_options in other sections or leave [containers].dns_options
unset. Modify the grep pattern to specifically check for dns_options within the
[containers] section (for example, by using a pattern that matches dns_options
after the [containers] header), and update the sed command on line 85 to scope
the replacement to only the [containers] section to ensure consistent Podman DNS
configuration.

In `@pkg/provider/ibmcloud/services/power/power.go`:
- Around line 84-96: The loop that calls vc.Get(volumeId) currently retries on
all errors for the full duration, but should fail fast on permanent failures.
Modify the error handling block (the else clause where err is not nil) to
inspect the underlying power-go-client error for its HTTP status code. If the
error is a non-retriable 4xx response, return the error immediately to fail
fast. If the error is transient (5xx or 429), continue with the existing retry
logic that sleeps and retries. This same fix should be applied to the similar
error handling pattern in the section that also handles vc.Get errors (lines
97-104 and beyond).

---

Nitpick comments:
In `@pkg/provider/ibmcloud/ibmcloud.go`:
- Around line 227-231: The TagsAsStringArray function iterates directly over the
tags map, which produces nondeterministic ordering in Go since map iteration
order is randomized. To fix this, extract all the keys from the tags map into a
slice, sort that slice of keys, and then iterate over the sorted keys in the for
loop instead of directly iterating over the map. This ensures the resulting
pulumi.StringArray has consistent ordering across multiple runs, eliminating
noisy diffs in resource previews and updates.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 996c0259-0ea3-4fe6-9953-7253897ca76e

📥 Commits

Reviewing files that changed from the base of the PR and between b30e0ee and c22d461.

📒 Files selected for processing (11)
  • pkg/integrations/gitlab/glrunner.go
  • pkg/integrations/gitlab/snippet-linux.sh
  • pkg/integrations/gitlab/types.go
  • pkg/integrations/integrations.go
  • pkg/integrations/otelcol/snippet-linux.sh
  • pkg/provider/ibmcloud/action/ibm-power/cloud-config
  • pkg/provider/ibmcloud/action/ibm-power/ibm-power.go
  • pkg/provider/ibmcloud/action/ibm-z/ibm-z.go
  • pkg/provider/ibmcloud/ibmcloud.go
  • pkg/provider/ibmcloud/modules/network/network.go
  • pkg/provider/ibmcloud/services/power/power.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/integrations/integrations.go
  • pkg/integrations/gitlab/types.go
  • pkg/integrations/gitlab/glrunner.go
  • pkg/integrations/otelcol/snippet-linux.sh

Comment on lines +84 to +89
if grep -q '^dns_options' /etc/containers/containers.conf; then
sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
/etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope dns_options edits to the [containers] section.

Line 84 checks dns_options globally, and Line 85 replaces it globally. If another section defines dns_options, [containers].dns_options may remain unset (or unrelated sections get modified), causing inconsistent Podman DNS behavior.

Proposed fix
-        # Add or update dns_options within [containers]
-        if grep -q '^dns_options' /etc/containers/containers.conf; then
-            sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
-                /etc/containers/containers.conf
+        # Add or update dns_options within [containers]
+        if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \
+                /etc/containers/containers.conf; then
+            sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \
+                /etc/containers/containers.conf
         else
             sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
                 /etc/containers/containers.conf
         fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if grep -q '^dns_options' /etc/containers/containers.conf; then
sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
/etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf
# Add or update dns_options within [containers]
if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \
/etc/containers/containers.conf; then
sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \
/etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/integrations/gitlab/snippet-linux.sh` around lines 84 - 89, The grep
check on line 84 searches for dns_options globally in the file without scoping
it to the [containers] section, and the subsequent sed replacement on line 85
also operates globally. This can cause unintended modifications to dns_options
in other sections or leave [containers].dns_options unset. Modify the grep
pattern to specifically check for dns_options within the [containers] section
(for example, by using a pattern that matches dns_options after the [containers]
header), and update the sed command on line 85 to scope the replacement to only
the [containers] section to ensure consistent Podman DNS configuration.

Comment on lines +84 to +96
for i := 0; i < 60; i++ { // up to ~10 minutes
vol, err := vc.Get(volumeId)
if err == nil {
switch vol.State {
case "available", "in-use":
logging.Infof("volume %s is in state %q, proceeding with attachment", volumeId, vol.State)
return volumeId, nil
default:
logging.Infof("volume %s state: %q, retrying in 10s...", volumeId, vol.State)
}
} else {
logging.Infof("volume %s GET returned error (retrying in 10s): %v", volumeId, err)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code in the file
cat -n pkg/provider/ibmcloud/services/power/power.go | sed -n '79,104p'

Repository: redhat-developer/mapt

Length of output: 1114


🏁 Script executed:

# Let's also check the imports and understand what vc (volumeClient) is
head -50 pkg/provider/ibmcloud/services/power/power.go

Repository: redhat-developer/mapt

Length of output: 1428


🏁 Script executed:

# Search for volumeClient definition and initialization
rg -n "volumeClient\|vc\.Get\|IBMPIVolumeClient" pkg/provider/ibmcloud/services/power/power.go -A 3 -B 1

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Check if there are any error type definitions or status codes being handled
rg -n "error\|Error\|status" pkg/provider/ibmcloud/services/power/power.go | head -30

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Let's find the dependency on power-go-client
rg "github.com/IBM-Cloud/power-go-client" go.mod

Repository: redhat-developer/mapt

Length of output: 111


🏁 Script executed:

# Find the volumeClient function definition
rg -n "func volumeClient" pkg/provider/ibmcloud/services/power/power.go -A 15

Repository: redhat-developer/mapt

Length of output: 689


🏁 Script executed:

# Check for any error handling patterns in the file
rg -n "http\|status\|StatusCode\|Type\|errors\." pkg/provider/ibmcloud/services/power/power.go -i

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Search for similar retry patterns in other files for context
rg -n "for.*i.*<.*60\|time\.After.*10" pkg/provider/ibmcloud/ -l

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Look for error classification/handling in other power-related files
find pkg/provider/ibmcloud -name "*.go" -type f | head -10

Repository: redhat-developer/mapt

Length of output: 520


🏁 Script executed:

# Check if there's any existing error classification in the codebase
rg -n "IsRetryable\|Retryable\|Transient\|Permanent" pkg/provider/ibmcloud/ -i

Repository: redhat-developer/mapt

Length of output: 47


🌐 Web query:

power-go-client IBMPIVolumeClient error handling retriable

💡 Result:

In IBM-Cloud/power-go-client, errors returned by the lower-level API calls (including those from the IBMPIVolumeClient volume-related methods) are wrapped via ibmpisession.SDKFailWithAPIError when the SDK call returns a non-nil error (example shown in ibm-pi-instance.go: it returns SDKFailWithAPIError(err, …) on err!= nil). [1] What this means for “retriable” handling: 1) power-go-client itself does not expose (in the reviewed code) a per-method “IsRetriable” hook specifically for IBMPIVolumeClient errors; instead, retry behavior is governed by the shared IBM Go SDK HTTP/retry mechanisms (ibm-cloud-sdk-common / ibm/go-sdk-core) and its generic error handling. The project is built on IBM SDK core and supports “automatic retries” as described in IBM’s SDK common documentation. [2] 2) Therefore, whether a particular IBMPIVolumeClient error is retried depends on how the IBM SDK core classifies the underlying HTTP/API error as retryable (e.g., throttling/rate-limit, transient network/5xx patterns), not on any IBMPIVolumeClient-specific “retriable error” implementation. Practical guidance (how to handle retriable errors in your code): - Enable/adjust SDK automatic retries at the IBM SDK core/client level (documented as “Automatic retries” in ibm-cloud-sdk-common). [2] - If you need explicit application-level retry decisions for power-go-client errors, you will need to inspect the error type/content you get from SDKFailWithAPIError (e.g., underlying HTTP status / error payload) and implement your own retry policy in your caller. If you can share the exact IBMPIVolumeClient method name and the error you’re seeing (HTTP status code, message), I can pinpoint whether that specific class of failure is typically treated as retryable by the IBM SDK core layer and give a concrete snippet for retry decision logic.

Citations:


🌐 Web query:

IBM Cloud Power API error codes status codes

💡 Result:

IBM Cloud Power (PowerVS/Power Cloud) API “error codes / status codes” are primarily the standard HTTP status codes returned by the API, plus a service-specific error payload (for Power Cloud endpoints you’ll commonly see values like pcloudPvminstancesActionPostBadRequest with a description/message). IBM’s API handbook states that HTTP responses use standard 2xx/4xx/5xx categories, and when a 4xx/5xx status code is returned the response body uses an error container model. [1][2][3] 1) HTTP status codes (what you should key off of) - 4xx client errors: these indicate the request is invalid or cannot be fulfilled due to client-side issues; IBM documents the general meaning of 4xx codes and notes common ones like 404/405/409/etc. [1] - 5xx server errors: these indicate server-side failure; IBM documents general meaning of 5xx codes like 500/504/505/etc. [1] - IBM specifically says that if the service returns a 400-series or 500-series status, the response body must be an error container model. [2][3] 2) Service-specific PowerVS/Power Cloud error codes you’ll see in practice (examples) In PowerVS API calls under the /pcloud/... namespace, common failures surface as a combination of HTTP status + an error “title”/reason string (e.g., pcloudPvminstancesActionPostBadRequest) and a description message. - 400 Bad Request (example: action not allowed due to current state) - Starting a Power Virtual Server instance that is already in ACTIVE state can return HTTP 400 with a message like “is already in an ACTIVE state and cannot be started.” [4] - 400 Bad Request (example: invalid/required parameter missing) - Creating a PVM instance can return HTTP 400 where the error description mentions that a calculated parameter failed (for example, “storageType is required when not using a storage affinity policy”). [5] - 400 Bad Request (example: operation not supported in environment) - Updating a PVM instance can return HTTP 400 with “cloudInitialisation is not supported in this environment.” [6] - 409 Conflict (example: cannot attach volume when instance health/status not OK) - Volume attachment can return HTTP 409 conflict, with a message indicating the volume can’t be attached due to the instance health not being “OK” (e.g., health=WARNING). [7] - 409 Conflict (example: attach fails due to instance/VM state constraints) - The Terraform-provider issue for attachment shows an HTTP 409 conflict response for volume attachment to a PVM instance. [7] - 400 Bad Request (example: clone cancel only allowed in certain states) - Cancelling a volume clone request that is already in “completed” state returns HTTP 400, with a message indicating it must be in “prepared or available” to cancel. [8] 3) How to interpret “status_code” vs “error code” - IBM’s API handbook distinguishes HTTP status codes (what the HTTP layer returns) from the structured error payload that accompanies 4xx/5xx responses. The error container model includes an errors field with objects that include a code and message, and it may include status_code and trace. [2][9] - For Power Cloud/PowerVS, the examples you’ll encounter in client output show both: a) The HTTP status (400/409, etc.) b) A service-generated error key/name and a human-readable Description/Message. [8][4][5][7] Practical takeaway - If you’re mapping “Power API error codes” to a decision system, start with the HTTP status code (400 vs 409 vs 5xx) as the primary discriminator, then inspect the Power Cloud error payload fields (code/message/description) for the specific root cause (e.g., “already in ACTIVE”, “cannot cancel because state is completed”, “cloudInitialisation not supported”, “cannot attach because health not OK”). [1][2][4][8][7]

Citations:


🏁 Script executed:

# Check if there are examples or tests showing how errors are handled
rg -n "Get\|error" pkg/provider/ibmcloud/action/ibm-power/ -A 2 -B 1 | head -40

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Look for any existing patterns of error classification in the codebase
rg -n "switch.*err\|type.*error" pkg/provider/ibmcloud/ -l

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Check go.mod for error handling related dependencies
cat go.mod | grep -i "error\|retry\|backoff"

Repository: redhat-developer/mapt

Length of output: 321


Fail fast on non-retriable volume lookup failures.

Lines 84-96 currently retry all vc.Get(volumeId) errors for ~10 minutes. Permanent failures (e.g., invalid volume ID, permission errors, volume not found) should return immediately; otherwise this adds avoidable timeout delays and masks root causes during incident triage.

Classify errors by HTTP status code and retry only transient ones (5xx, 429), while failing fast on non-retriable 4xx responses. The underlying power-go-client error can be inspected for the HTTP status to make this determination.

Also applies to: 97-104

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/ibmcloud/services/power/power.go` around lines 84 - 96, The loop
that calls vc.Get(volumeId) currently retries on all errors for the full
duration, but should fail fast on permanent failures. Modify the error handling
block (the else clause where err is not nil) to inspect the underlying
power-go-client error for its HTTP status code. If the error is a non-retriable
4xx response, return the error immediately to fail fast. If the error is
transient (5xx or 429), continue with the existing retry logic that sleeps and
retries. This same fix should be applied to the similar error handling pattern
in the section that also handles vc.Get errors (lines 97-104 and beyond).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants