Add transient-error classification to APIFailure#93
Merged
Conversation
API.do_request now records the HTTP status code on every exception it raises (status_code attribute), and APIFailure gains is_transient_error(): True for gateway/connection-level failures (HTTP 408/502/503/504, dropped or reset connections, client-side timeouts) where retrying the same request may succeed, False for deterministic errors (400/401/403/404/429, wrapped unexpected errors). Classification is based on the recorded status code rather than exception class identity or message text, so it stays correct if a status code gains a dedicated subclass later. Motivated by SocketDev/socket-python-cli#232: the CLI retries transient full-scan upload failures and previously had to parse the status code out of catch-all APIFailure message text.
|
🚀 Preview package published! Install with: pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple socketdev==3.3.0.dev2 |
Martin Torp (mtorp)
added a commit
to SocketDev/socket-python-cli
that referenced
this pull request
Jun 10, 2026
Address review feedback on the upload retry: - The retry decision now delegates to APIFailure.is_transient_error() (socketdev>=3.3.0, SocketDev/socket-sdk-python#93), which classifies by the HTTP status code the SDK records when raising. The CLI no longer encodes the SDK's exception hierarchy or parses status codes out of message text, so SDK restructuring can't silently break the classification. - The backoff schedule is now the single source of truth for the loop: FULL_SCAN_UPLOAD_BACKOFF_SCHEDULE_SECONDS = (10.0, 30.0, None), where each entry is the wait before the next attempt and the final None re-raises instead of retrying. FULL_SCAN_UPLOAD_MAX_ATTEMPTS is computed from its length. Note: uv.lock is intentionally not regenerated yet - socketdev 3.3.0 must be released to PyPI first (blocked on socket-sdk-python#93).
The class definitionally represents a 502, so there is no reason for construction sites to pass (or be able to override) the status.
Oskar Haarklou Veileborg (BarrensZeppelin)
approved these changes
Jun 10, 2026
Martin Torp (mtorp)
added a commit
to SocketDev/socket-python-cli
that referenced
this pull request
Jun 10, 2026
…onnections) (#232) * Retry transient full-scan upload failures (502/503/504/408, dropped connections) A full-scan upload can fail transiently at the gateway/connection level - an HTTP 502/503/504/408, a dropped or reset connection, or a client-side timeout - without the server having created the scan. The CLI previously made exactly one attempt, so an entire run (including a completed reachability analysis) died on a single transient failure even though a retried upload almost always succeeds. create_full_scan now retries the fullscans POST up to 3 total attempts with increasing waits (~10s, then ~30s, plus jitter) on transient failures only: APIBadGateway (502), APIConnectionError, APITimeout, and catch-all APIFailure whose embedded original_status_code is 408/503/504. Dedicated 4xx classes, catch-all 400s, and error payloads are never retried. In these failure modes the server never finished reading the request body, so no scan was created and a retry does not duplicate one; in the rare case where a gateway timeout races a request the server later completes, the extra scan is benign and superseded by the retry (as if the CLI had run twice). The retry loop lives inside the existing try/finally so the brotli-compressed .socket.facts.json.br temp files survive until every attempt has finished; fullscans.post rebuilds its lazy file loaders from the plain paths on every call, so re-invoking it per attempt is safe. Assisted-by: Claude Code:claude-opus-4-8 * docs: drop the 'retry almost always succeeds' claim from retry comments * Move transient-error classification into the SDK; simplify retry loop Address review feedback on the upload retry: - The retry decision now delegates to APIFailure.is_transient_error() (socketdev>=3.3.0, SocketDev/socket-sdk-python#93), which classifies by the HTTP status code the SDK records when raising. The CLI no longer encodes the SDK's exception hierarchy or parses status codes out of message text, so SDK restructuring can't silently break the classification. - The backoff schedule is now the single source of truth for the loop: FULL_SCAN_UPLOAD_BACKOFF_SCHEDULE_SECONDS = (10.0, 30.0, None), where each entry is the wait before the next attempt and the final None re-raises instead of retrying. FULL_SCAN_UPLOAD_MAX_ATTEMPTS is computed from its length. Note: uv.lock is intentionally not regenerated yet - socketdev 3.3.0 must be released to PyPI first (blocked on socket-sdk-python#93). * Lock socketdev 3.3.0 socketdev 3.3.0 is now released, unblocking the >=3.3.0 floor bump.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Consumers that want to retry transient API failures have no reliable way to classify an exception. The catch-all
APIFailureonly embeds the HTTP status in its message text (original_status_code:<code>), so callers must parse message strings and special-case the exception hierarchy — which breaks silently if a status code gains a dedicated subclass later. SocketDev/socket-python-cli#232 (full-scan upload retry) ran into exactly this, see review discussion.Change
API.do_requestnow records the HTTP status code on every exception it raises, via a new keyword-onlystatus_codeattribute onAPIFailure(defaultNone, so all existing construction/raise sites remain valid).APIFailure.is_transient_error()returns whether retrying the same request may succeed:APITimeout/APIConnectionError(no HTTP status — the request never completed).APIFailure()wrappers around unexpected errors.is_transient_error()keeps working unchanged.Testing
New
tests/unit/test_exceptions.py(21 tests): classification truth table for direct construction, status codes attached bydo_requestper HTTP status (401/403/404/429/502 dedicated classes; 408/503/504 and 400/500 catch-all), timeout/connection-error mapping, and the wrapped-unexpected-error case. Full unit suite: 122 passed, 1 skipped.Version
3.2.1→3.3.0(new public API).