fix(ibmcloud): preserve path component in COS backend URL by adrianriobo · Pull Request #830 · redhat-developer/mapt

adrianriobo · 2026-06-09T13:11:17Z

When a backed URL includes a path (e.g. s3://bucket/some/path), the previous code discarded the path and stored Pulumi state at the bucket root. Refactor extractBucket into extractBucketAndPath so the path is carried through to PULUMI_BACKEND_URL.

Fixes #829
Fixes #832

When a backed URL includes a path (e.g. s3://bucket/some/path), the previous code discarded the path and stored Pulumi state at the bucket root. Refactor extractBucket into extractBucketAndPath so the path is carried through to PULUMI_BACKEND_URL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 35-36: The /etc/resolv.conf fallback currently excludes only IPv4
loopback (127.*) so IPv6 loopback (::1) can still be captured into _dns_servers;
update the awk filter in the _dns_servers assignment (the line using awk
'/^nameserver/ && $2 !~ /^127\./ {print $2}') to also exclude the IPv6 loopback
(e.g. add a condition like $2 != "::1" or $2 !~ /^::1/), ensuring nameserver
entries equal to ::1 are not included.
- Around line 48-54: The script currently checks for "dns_servers" globally
which can match other sections; change the detection/update logic so it only
looks inside the [containers] section: when checking for existence use a
section-scoped search (e.g., search between the line matching "^\[containers\]"
and the next section header "^\[.*\]") to decide whether to replace dns_servers
or append it after the [containers] header; update the two sed/grep branches
that reference dns_servers and /etc/containers/containers.conf and ensure the
inserted value uses the same ${_toml_list} variable and formatting as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 466885cf-1897-4a3c-8cf6-9db3d3fcc8aa

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1f1fc and c0d0b15.

📒 Files selected for processing (1)

pkg/integrations/gitlab/snippet-linux.sh

Detect the host's upstream DNS servers at runner setup time and write them to /etc/containers/containers.conf so Podman propagates them into every container it creates, including the nested inner containers spawned by `podman build` RUN steps. Without this, inner build containers inherit the systemd-resolved loopback stub (127.0.0.53) which is unreachable from inside a container, causing intermittent "Could not resolve host" failures for external domains. Detection tries resolvectl, nmcli, and /etc/resolv.conf in order, filtering out loopback addresses at each step. The existence check and in-place replacement are scoped to the [containers] section via awk to avoid false-positive matches in other sections (e.g. [network]). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Pass --name "{{ .Name }}" to gitlab-runner register in all platform snippets (Linux, Darwin, Windows). Without this flag the runner defaults to the system hostname, which is not meaningful in ephemeral environments. Set the Name field from ProjectName() rather than RunID() so the runner is identified by the project it belongs to. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

deekay2310

Apart from my one comment regarding context.go, we should also address CodeRabbit's comments and I think this PR will be ready to merge.

deekay2310 · 2026-06-12T03:22:21Z


 func manageIntegration(c *Context, ca *ContextArgs) error {
 	if ca.GHRunnerArgs != nil {
 		ca.GHRunnerArgs.Name = c.RunID()


Should this also be changed to c.ProjectName() just like GLRunnerArgs.name?

yeah I just found out using the RunID as name is not really helpful so make it with the project name clarifies it, but as this change is GH related can you change it on your PR?

Sounds good! I'll look into it

When both GitLab runner and OTel are configured, Podman is set to use the journald log driver so CI job container output is captured by systemd journal. The otelcol config gains a journald/gitlab-jobs receiver that parses CONTAINER_NAME to extract job_id, project_id, and runner_token, and the existing filelog/gitlab-runner receiver gets regex_parser operators to promote job= and runner= fields from the runner daemon log body into attributes. Both streams share job_id, enabling cross-stream correlation in any OTel backend. The journald log driver change is gated on a new LogToJournald field in GitLabRunnerArgs, set only when hasOtel is true in the IBM Power and IBM Z providers. The otelcol receivers remain gated on the existing MonitorGitLabRunner flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ng containers.conf Root cause (confirmed live on s390x runner): podman build RUN steps run inside a nested container created by inner Podman inside the docker executor container. Inner Podman (using Netavark) creates a new bridge for these build-step containers, but the privileged-but-bridge-networked outer container has no working iptables/ nftables NAT, so packets from the 10.88.0.0/16 bridge subnet can never reach the DNS servers. The inner containers' /etc/resolv.conf had the correct nameservers written but DNS queries timed out because they could not be routed. The previous chmod 644 and dns_servers fixes correctly configured the outer executor container but did not propagate into the nested build-step containers. Fixes applied: 1. Add netns = "host" to /etc/containers/containers.conf. When the file is bind-mounted into the executor container (see redhat-developer#3), inner Podman reads this and creates all build-step containers sharing the outer container's network namespace instead of a new bridge. They inherit the working resolv.conf with no NAT required. 2. Guarantee /etc/containers/containers.conf exists before runner registration even when DNS detection finds no upstream servers, so the bind mount in redhat-developer#3 always has a real file to attach to. 3. Add --docker-volumes to the runner registration so that every executor container gets the host's containers.conf bind-mounted at /etc/containers/containers.conf:ro. This makes the nested inner Podman pick up both dns_servers and netns = "host" automatically, without any change to the CI job Dockerfile or podman build invocation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…n build Two problems with inner podman build containers inside the CI executor: 1. DNS failed because the executor container's resolv.conf points to a loopback stub (127.0.0.53/systemd-resolved) that is unreachable from inside nested containers. Fixed by detecting real upstream DNS servers via resolvectl/nmcli/resolv.conf and writing them to containers.conf dns_servers so Podman passes working nameservers to inner containers. 2. netns="host" was added as a workaround to give inner containers a routable network stack, but it causes them to share the actual host network namespace — so concurrent CI jobs compete for the same ports (e.g. nginx: bind() to 0.0.0.0:8080 failed: Address already in use). Replaced with sysctl ip_forward=1 so Netavark's bridge NAT works correctly, giving each inner container an isolated namespace while still routing egress through the executor container's interface. The containers.conf is bind-mounted into every executor container via --docker-volumes so Podman inside the CI job inherits dns_servers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…config The mount-data-home.sh script looped forever waiting for a TYPE=mpath block device to appear, with no exit condition and TimeoutStartSec=0 in the systemd unit. On larger LPAR sizes (32+ cores / 192+ GB), IBM Cloud takes longer to establish Fibre Channel paths, so the device can take many minutes to appear. The infinite loop blocked all subsequent runcmd steps (restorecon, runner installation) indefinitely. Two changes: - Explicitly start multipathd before entering the loop so the daemon is guaranteed to be running when we call 'multipathd reconfigure'. - Cap the loop at 60 iterations (~10 minutes). If no multipath device appears in that window, proceed without mounting the data volume so the runner installation can still complete. The instance remains functional on its root filesystem; the data volume can be mounted manually once the FC paths are established. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…urces Adds TagsAsStringArray helper to convert map[string]string (key=value from --tags flag) to the key:value string format expected by IBM Cloud user tags. Threads tags through the network module and both action packages so every taggable resource gets the user-supplied tags: PiVolume, PiInstance (PiUserTags) on PowerVS; IsInstance, IsSshKey, IsVpc, IsSubnet, IsPublicGateway, IsSecurityGroup, IsFloatingIp, and ResourceGroup (Tags) on VPC/IBM Z paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…-dns at registration Two improvements to make DNS reliable on all platforms: 1. Add /run/systemd/resolve/resolv.conf as a DNS detection fallback. On Ubuntu 24.04 with systemd-resolved, /etc/resolv.conf points to the 127.0.0.53 stub which is filtered out. The real upstream servers live in /run/systemd/resolve/resolv.conf and are needed for the fix to work. 2. Pass detected DNS servers as --docker-dns when registering the runner. containers.conf dns_servers fixes DNS for nested podman build containers, but Podman's Docker socket API does not always honour it when creating the executor container itself. Passing --docker-dns explicitly in config.toml ensures every job container gets working DNS regardless. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Three fixes for persistent DNS failures inside CI executor containers: 1. Public DNS fallback: if resolvectl/nmcli/resolv.conf detect nothing (e.g. on non-standard cloud images), fall back to 8.8.8.8/8.8.4.4. The machine has internet access (it talks to GitLab), so public DNS works. 2. Explicit iptables MASQUERADE for 10.88.0.0/16: on RHEL 9, 'podman system reset --force' disrupts Netavark's firewalld integration. Netavark normally adds the MASQUERADE rule when the first container is created, but the runner resolves DNS at job startup — before any container has ever run. Adding the rule explicitly ensures it is in place before the first CI job. 3. firewall-cmd --add-masquerade: belt-and-suspenders for RHEL/firewalld systems. Enables masquerade at the zone level in addition to the iptables rule, so forwarded container traffic is NATted regardless of how firewalld manages its nftables entries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…permanent The sysctl ip_forward settings and firewall-cmd masquerade rule added in the previous fix were transient: sysctl -w is not preserved across reboots, and firewall-cmd without --permanent is lost on firewalld restart. Persist ip_forward via /etc/sysctl.d/99-podman-ip-forward.conf so Podman bridge NAT works correctly after any reboot. Add --permanent to firewall-cmd --add-masquerade and follow it with --reload so the rule is immediately active AND survives across reboots and firewalld restarts on RHEL/Fedora (ppc64le) machines. Non-firewalld systems (Ubuntu/s390x) are unaffected: firewall-cmd fails silently via || true and the iptables POSTROUTING rule covers them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rk subnet Root cause (confirmed live on s390x runner): The host's Podman CNI bridge uses 10.88.0.0/16. When gitlab-runner creates an executor container (on the host's 10.88.0.0/16 bridge, e.g. at 10.88.0.22), and the CI job runs 'podman run' inside that executor, inner Podman (Netavark) creates its own bridge — also at 10.88.0.0/16 — inside the executor's network namespace. This produces two competing kernel routes inside the executor: 10.88.0.0/16 dev eth0 src 10.88.0.22 (executor → host) 10.88.0.0/16 dev cni-podman0 src 10.88.0.1 (Netavark bridge) Nested containers land on Netavark's bridge (e.g. at 10.88.0.23). The duplicate route to 10.88.0.0/16 breaks forwarding: DNS queries from inner containers to IBM Cloud resolvers (10.130.98.x) have no unambiguous egress path, causing socket.gaierror [Errno -2]. Note: the executor container's own DNS works fine because it reaches IBM DNS directly through the host bridge, without going through Netavark. Only nested 'podman run' containers are affected. Fix: Add [network] default_subnet = "192.168.100.0/24" to /etc/containers/containers.conf, which is bind-mounted into every executor container via --docker-volumes. Netavark inside the executor reads this and creates its bridge on 192.168.100.0/24, leaving 10.88.0.0/16 exclusively for the executor's eth0 (host connection). Forwarding and MASQUERADE then work correctly, giving nested containers working DNS and internet access. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

pkg/provider/ibmcloud/ibmcloud.go (1)

227-231: ⚡ Quick win

Stabilize tag serialization order to avoid nondeterministic resource diffs.

At Line 229, iterating a Go map directly makes pulumi.StringArray order nondeterministic across runs. Since this helper feeds many IBM Cloud resources, this can cause noisy previews/updates if ordering is treated as significant. Sort keys before append.

Proposed fix

+import "sort"
...
 func TagsAsStringArray(tags map[string]string) pulumi.StringArray {
 	result := make(pulumi.StringArray, 0, len(tags))
-	for k, v := range tags {
+	keys := make([]string, 0, len(tags))
+	for k := range tags {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+	for _, k := range keys {
+		v := tags[k]
 		result = append(result, pulumi.String(fmt.Sprintf("%s:%s", k, v)))
 	}
 	return result
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/ibmcloud/ibmcloud.go` around lines 227 - 231, The
TagsAsStringArray function iterates directly over the tags map, which produces
nondeterministic ordering in Go since map iteration order is randomized. To fix
this, extract all the keys from the tags map into a slice, sort that slice of
keys, and then iterate over the sorted keys in the for loop instead of directly
iterating over the map. This ensures the resulting pulumi.StringArray has
consistent ordering across multiple runs, eliminating noisy diffs in resource
previews and updates.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 84-89: The grep check on line 84 searches for dns_options globally
in the file without scoping it to the [containers] section, and the subsequent
sed replacement on line 85 also operates globally. This can cause unintended
modifications to dns_options in other sections or leave [containers].dns_options
unset. Modify the grep pattern to specifically check for dns_options within the
[containers] section (for example, by using a pattern that matches dns_options
after the [containers] header), and update the sed command on line 85 to scope
the replacement to only the [containers] section to ensure consistent Podman DNS
configuration.

In `@pkg/provider/ibmcloud/services/power/power.go`:
- Around line 84-96: The loop that calls vc.Get(volumeId) currently retries on
all errors for the full duration, but should fail fast on permanent failures.
Modify the error handling block (the else clause where err is not nil) to
inspect the underlying power-go-client error for its HTTP status code. If the
error is a non-retriable 4xx response, return the error immediately to fail
fast. If the error is transient (5xx or 429), continue with the existing retry
logic that sleeps and retries. This same fix should be applied to the similar
error handling pattern in the section that also handles vc.Get errors (lines
97-104 and beyond).

---

Nitpick comments:
In `@pkg/provider/ibmcloud/ibmcloud.go`:
- Around line 227-231: The TagsAsStringArray function iterates directly over the
tags map, which produces nondeterministic ordering in Go since map iteration
order is randomized. To fix this, extract all the keys from the tags map into a
slice, sort that slice of keys, and then iterate over the sorted keys in the for
loop instead of directly iterating over the map. This ensures the resulting
pulumi.StringArray has consistent ordering across multiple runs, eliminating
noisy diffs in resource previews and updates.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 996c0259-0ea3-4fe6-9953-7253897ca76e

📥 Commits

Reviewing files that changed from the base of the PR and between b30e0ee and c22d461.

📒 Files selected for processing (11)

pkg/integrations/gitlab/glrunner.go
pkg/integrations/gitlab/snippet-linux.sh
pkg/integrations/gitlab/types.go
pkg/integrations/integrations.go
pkg/integrations/otelcol/snippet-linux.sh
pkg/provider/ibmcloud/action/ibm-power/cloud-config
pkg/provider/ibmcloud/action/ibm-power/ibm-power.go
pkg/provider/ibmcloud/action/ibm-z/ibm-z.go
pkg/provider/ibmcloud/ibmcloud.go
pkg/provider/ibmcloud/modules/network/network.go
pkg/provider/ibmcloud/services/power/power.go

🚧 Files skipped from review as they are similar to previous changes (4)

pkg/integrations/integrations.go
pkg/integrations/gitlab/types.go
pkg/integrations/gitlab/glrunner.go
pkg/integrations/otelcol/snippet-linux.sh

coderabbitai · 2026-06-19T13:56:52Z

+        if grep -q '^dns_options' /etc/containers/containers.conf; then
+            sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
+                /etc/containers/containers.conf
+        else
+            sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
+                /etc/containers/containers.conf


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope dns_options edits to the [containers] section.

Line 84 checks dns_options globally, and Line 85 replaces it globally. If another section defines dns_options, [containers].dns_options may remain unset (or unrelated sections get modified), causing inconsistent Podman DNS behavior.

Proposed fix

- # Add or update dns_options within [containers] - if grep -q '^dns_options' /etc/containers/containers.conf; then - sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \ - /etc/containers/containers.conf + # Add or update dns_options within [containers] + if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \ + /etc/containers/containers.conf; then + sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \ + /etc/containers/containers.conf else sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \ /etc/containers/containers.conf fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if grep -q '^dns_options' /etc/containers/containers.conf; then

sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \

/etc/containers/containers.conf

else

sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \

/etc/containers/containers.conf

# Add or update dns_options within [containers]

if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \

/etc/containers/containers.conf; then

sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \

/etc/containers/containers.conf

else

sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \

/etc/containers/containers.conf

fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/integrations/gitlab/snippet-linux.sh` around lines 84 - 89, The grep check on line 84 searches for dns_options globally in the file without scoping it to the [containers] section, and the subsequent sed replacement on line 85 also operates globally. This can cause unintended modifications to dns_options in other sections or leave [containers].dns_options unset. Modify the grep pattern to specifically check for dns_options within the [containers] section (for example, by using a pattern that matches dns_options after the [containers] header), and update the sed command on line 85 to scope the replacement to only the [containers] section to ensure consistent Podman DNS configuration.

coderabbitai · 2026-06-19T13:56:53Z

+	for i := 0; i < 60; i++ { // up to ~10 minutes
+		vol, err := vc.Get(volumeId)
+		if err == nil {
+			switch vol.State {
+			case "available", "in-use":
+				logging.Infof("volume %s is in state %q, proceeding with attachment", volumeId, vol.State)
+				return volumeId, nil
+			default:
+				logging.Infof("volume %s state: %q, retrying in 10s...", volumeId, vol.State)
+			}
+		} else {
+			logging.Infof("volume %s GET returned error (retrying in 10s): %v", volumeId, err)
+		}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code in the file cat -n pkg/provider/ibmcloud/services/power/power.go | sed -n '79,104p'

Repository: redhat-developer/mapt

Length of output: 1114

🏁 Script executed:

# Let's also check the imports and understand what vc (volumeClient) is head -50 pkg/provider/ibmcloud/services/power/power.go

Repository: redhat-developer/mapt

Length of output: 1428

🏁 Script executed:

# Search for volumeClient definition and initialization rg -n "volumeClient\|vc\.Get\|IBMPIVolumeClient" pkg/provider/ibmcloud/services/power/power.go -A 3 -B 1

Repository: redhat-developer/mapt

Length of output: 47

🏁 Script executed:

# Check if there are any error type definitions or status codes being handled rg -n "error\|Error\|status" pkg/provider/ibmcloud/services/power/power.go | head -30

Repository: redhat-developer/mapt

Length of output: 47

🏁 Script executed:

# Let's find the dependency on power-go-client rg "github.com/IBM-Cloud/power-go-client" go.mod

Repository: redhat-developer/mapt

Length of output: 111

🏁 Script executed:

# Find the volumeClient function definition rg -n "func volumeClient" pkg/provider/ibmcloud/services/power/power.go -A 15

Repository: redhat-developer/mapt

Length of output: 689

🏁 Script executed:

# Check for any error handling patterns in the file rg -n "http\|status\|StatusCode\|Type\|errors\." pkg/provider/ibmcloud/services/power/power.go -i

Repository: redhat-developer/mapt

Length of output: 47

🏁 Script executed:

# Search for similar retry patterns in other files for context rg -n "for.*i.*<.*60\|time\.After.*10" pkg/provider/ibmcloud/ -l

Repository: redhat-developer/mapt

Length of output: 47

🏁 Script executed:

# Look for error classification/handling in other power-related files find pkg/provider/ibmcloud -name "*.go" -type f | head -10

Repository: redhat-developer/mapt

Length of output: 520

🏁 Script executed:

# Check if there's any existing error classification in the codebase rg -n "IsRetryable\|Retryable\|Transient\|Permanent" pkg/provider/ibmcloud/ -i

Repository: redhat-developer/mapt

Length of output: 47

🌐 Web query:

power-go-client IBMPIVolumeClient error handling retriable

💡 Result:

In IBM-Cloud/power-go-client, errors returned by the lower-level API calls (including those from the IBMPIVolumeClient volume-related methods) are wrapped via ibmpisession.SDKFailWithAPIError when the SDK call returns a non-nil error (example shown in ibm-pi-instance.go: it returns SDKFailWithAPIError(err, …) on err!= nil). [1] What this means for “retriable” handling: 1) power-go-client itself does not expose (in the reviewed code) a per-method “IsRetriable” hook specifically for IBMPIVolumeClient errors; instead, retry behavior is governed by the shared IBM Go SDK HTTP/retry mechanisms (ibm-cloud-sdk-common / ibm/go-sdk-core) and its generic error handling. The project is built on IBM SDK core and supports “automatic retries” as described in IBM’s SDK common documentation. [2] 2) Therefore, whether a particular IBMPIVolumeClient error is retried depends on how the IBM SDK core classifies the underlying HTTP/API error as retryable (e.g., throttling/rate-limit, transient network/5xx patterns), not on any IBMPIVolumeClient-specific “retriable error” implementation. Practical guidance (how to handle retriable errors in your code): - Enable/adjust SDK automatic retries at the IBM SDK core/client level (documented as “Automatic retries” in ibm-cloud-sdk-common). [2] - If you need explicit application-level retry decisions for power-go-client errors, you will need to inspect the error type/content you get from SDKFailWithAPIError (e.g., underlying HTTP status / error payload) and implement your own retry policy in your caller. If you can share the exact IBMPIVolumeClient method name and the error you’re seeing (HTTP status code, message), I can pinpoint whether that specific class of failure is typically treated as retryable by the IBM SDK core layer and give a concrete snippet for retry decision logic.

Citations:

1: https://github.com/IBM-Cloud/power-go-client/blob/master/clients/instance/ibm-pi-instance.go

2: https://github.com/IBM/ibm-cloud-sdk-common/blob/master/README.md

🌐 Web query:

IBM Cloud Power API error codes status codes

💡 Result:

IBM Cloud Power (PowerVS/Power Cloud) API “error codes / status codes” are primarily the standard HTTP status codes returned by the API, plus a service-specific error payload (for Power Cloud endpoints you’ll commonly see values like pcloudPvminstancesActionPostBadRequest with a description/message). IBM’s API handbook states that HTTP responses use standard 2xx/4xx/5xx categories, and when a 4xx/5xx status code is returned the response body uses an error container model. [1][2][3] 1) HTTP status codes (what you should key off of) - 4xx client errors: these indicate the request is invalid or cannot be fulfilled due to client-side issues; IBM documents the general meaning of 4xx codes and notes common ones like 404/405/409/etc. [1] - 5xx server errors: these indicate server-side failure; IBM documents general meaning of 5xx codes like 500/504/505/etc. [1] - IBM specifically says that if the service returns a 400-series or 500-series status, the response body must be an error container model. [2][3] 2) Service-specific PowerVS/Power Cloud error codes you’ll see in practice (examples) In PowerVS API calls under the /pcloud/... namespace, common failures surface as a combination of HTTP status + an error “title”/reason string (e.g., pcloudPvminstancesActionPostBadRequest) and a description message. - 400 Bad Request (example: action not allowed due to current state) - Starting a Power Virtual Server instance that is already in ACTIVE state can return HTTP 400 with a message like “is already in an ACTIVE state and cannot be started.” [4] - 400 Bad Request (example: invalid/required parameter missing) - Creating a PVM instance can return HTTP 400 where the error description mentions that a calculated parameter failed (for example, “storageType is required when not using a storage affinity policy”). [5] - 400 Bad Request (example: operation not supported in environment) - Updating a PVM instance can return HTTP 400 with “cloudInitialisation is not supported in this environment.” [6] - 409 Conflict (example: cannot attach volume when instance health/status not OK) - Volume attachment can return HTTP 409 conflict, with a message indicating the volume can’t be attached due to the instance health not being “OK” (e.g., health=WARNING). [7] - 409 Conflict (example: attach fails due to instance/VM state constraints) - The Terraform-provider issue for attachment shows an HTTP 409 conflict response for volume attachment to a PVM instance. [7] - 400 Bad Request (example: clone cancel only allowed in certain states) - Cancelling a volume clone request that is already in “completed” state returns HTTP 400, with a message indicating it must be in “prepared or available” to cancel. [8] 3) How to interpret “status_code” vs “error code” - IBM’s API handbook distinguishes HTTP status codes (what the HTTP layer returns) from the structured error payload that accompanies 4xx/5xx responses. The error container model includes an errors field with objects that include a code and message, and it may include status_code and trace. [2][9] - For Power Cloud/PowerVS, the examples you’ll encounter in client output show both: a) The HTTP status (400/409, etc.) b) A service-generated error key/name and a human-readable Description/Message. [8][4][5][7] Practical takeaway - If you’re mapping “Power API error codes” to a decision system, start with the HTTP status code (400 vs 409 vs 5xx) as the primary discriminator, then inspect the Power Cloud error payload fields (code/message/description) for the specific root cause (e.g., “already in ACTIVE”, “cannot cancel because state is completed”, “cloudInitialisation not supported”, “cannot attach because health not OK”). [1][2][4][8][7]

Citations:

1: https://cloud.ibm.com/docs/api-handbook?topic=api-handbook-status-codes

2: https://cloud.ibm.com/docs/api-handbook?topic=api-handbook-errors

3: https://github.com/ibm-cloud-docs/api-handbook/blob/086d28c9357b39612ceb816130d0f6ad92f82859/design/errors.md

4: PowerVS start and possibly stop actions failing when resource already in the desired state IBM-Cloud/terraform-provider-ibm#4400

5: PowerVS terraform deployment failing IBM-Cloud/terraform-provider-ibm#2787

6: High Priority: ibm_pi_instance fails to create instance. IBM-Cloud/terraform-provider-ibm#4905

7: Volume attachment to PVS Instances not working IBM-Cloud/terraform-provider-ibm#2810

8: https://community.ibm.com/community/user/blogs/amritanshu-verma/2026/06/11/ibm-power-virtual-server-volume-clone-lifecycle

9: https://cloud.ibm.com/docs/vpc?topic=vpc-instance-metadata-error-codes

🏁 Script executed:

# Check if there are examples or tests showing how errors are handled rg -n "Get\|error" pkg/provider/ibmcloud/action/ibm-power/ -A 2 -B 1 | head -40

Repository: redhat-developer/mapt

Length of output: 47

🏁 Script executed:

# Look for any existing patterns of error classification in the codebase rg -n "switch.*err\|type.*error" pkg/provider/ibmcloud/ -l

Repository: redhat-developer/mapt

Length of output: 47

🏁 Script executed:

# Check go.mod for error handling related dependencies cat go.mod | grep -i "error\|retry\|backoff"

Repository: redhat-developer/mapt

Length of output: 321

Fail fast on non-retriable volume lookup failures.

Lines 84-96 currently retry all vc.Get(volumeId) errors for ~10 minutes. Permanent failures (e.g., invalid volume ID, permission errors, volume not found) should return immediately; otherwise this adds avoidable timeout delays and masks root causes during incident triage.

Classify errors by HTTP status code and retry only transient ones (5xx, 429), while failing fast on non-retriable 4xx responses. The underlying power-go-client error can be inspected for the HTTP status to make this determination.

Also applies to: 97-104

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/provider/ibmcloud/services/power/power.go` around lines 84 - 96, The loop that calls vc.Get(volumeId) currently retries on all errors for the full duration, but should fail fast on permanent failures. Modify the error handling block (the else clause where err is not nil) to inspect the underlying power-go-client error for its HTTP status code. If the error is a non-retriable 4xx response, return the error immediately to fail fast. If the error is transient (5xx or 429), continue with the existing retry logic that sleeps and retries. This same fix should be applied to the similar error handling pattern in the section that also handles vc.Get errors (lines 97-104 and beyond).

This comment was marked as outdated.

Sign in to view

adrianriobo force-pushed the fix-829 branch from 356055c to 2a1f1fc Compare June 10, 2026 06:19

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread pkg/integrations/gitlab/snippet-linux.sh Outdated

Comment thread pkg/integrations/gitlab/snippet-linux.sh

adrianriobo force-pushed the fix-829 branch from 3e5d1e7 to 8b6cb11 Compare June 10, 2026 11:24

This comment was marked as resolved.

Sign in to view

deekay2310 reviewed Jun 12, 2026

View reviewed changes

adrianriobo force-pushed the fix-829 branch from b85a602 to b9f6434 Compare June 12, 2026 04:43

adrianriobo force-pushed the fix-829 branch 2 times, most recently from 74eb149 to 16330a5 Compare June 18, 2026 15:40

adrianriobo force-pushed the fix-829 branch from 16330a5 to 08f1b26 Compare June 18, 2026 19:54

adrianriobo and others added 6 commits June 19, 2026 08:39

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ibmcloud): preserve path component in COS backend URL#830

fix(ibmcloud): preserve path component in COS backend URL#830
adrianriobo wants to merge 12 commits into
redhat-developer:mainfrom
adrianriobo:fix-829

adrianriobo commented Jun 9, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

deekay2310 left a comment

Uh oh!

deekay2310 Jun 12, 2026

Uh oh!

adrianriobo Jun 12, 2026

Uh oh!

deekay2310 Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Uh oh!

coderabbitai Bot Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

adrianriobo commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

deekay2310 left a comment

Choose a reason for hiding this comment

Uh oh!

deekay2310 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

adrianriobo Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

deekay2310 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

adrianriobo commented Jun 9, 2026 •

edited

Loading