feature: retry requests with backoff on HTTP 429 responses #1124

Description

@dgarros

Component

Python SDK

Describe the Feature Request

When the Infrahub server responds with an HTTP 429 (Too Many Requests), the
SDK client should automatically retry the request instead of surfacing the
error immediately.

Two behaviours are needed:

  • Exponential backoff with jitter: on a 429, wait before retrying and
    increase the wait on each subsequent failure, adding a small random
    jitter so that many clients do not retry in lockstep (the
    thundering-herd problem).
  • Respect the Retry-After header: when the 429 response includes a
    Retry-After header, honour the delay it specifies rather than the
    computed backoff. The header is the server's own statement of when to
    retry, given either as a number of seconds or as an HTTP date.
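
To make the two behaviours concrete, here is a minimal sketch of how the wait could be computed. It is not the SDK's implementation: the function names, the default base, cap and jitter values, and the choice of additive jitter are all assumptions made for illustration.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0, jitter: float = 0.1) -> float:
    """Capped exponential delay plus a random extra of up to `jitter` times that delay.

    All default values are hypothetical.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter * delay)


def retry_delay(attempt: int, retry_after_header: Optional[str] = None) -> float:
    """Prefer the server's Retry-After value; fall back to the computed backoff."""
    server_delay = parse_retry_after(retry_after_header)
    return server_delay if server_delay is not None else backoff_delay(attempt)
```

With these assumed defaults, `retry_delay(0)` falls between 0.5 and 0.55 seconds, while `retry_delay(0, "7")` returns exactly 7.0 because the server's value takes precedence.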

Both the async (InfrahubClient) and sync (InfrahubClientSync) clients
should share this behaviour.
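
One way to keep the two clients in step is to put the delay schedule in a single plain function and let each client differ only in how it sleeps. The sketch below assumes a hypothetical `send` callable returning a `(status, headers)` tuple in place of the SDK's real request and response objects, and it leaves out jitter and Retry-After handling for brevity.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterator, Mapping, Tuple

# Stand-in for a real response object: (status code, headers).
Response = Tuple[int, Mapping[str, str]]


def delays(max_retries: int = 3, base: float = 0.5) -> Iterator[float]:
    """Shared schedule: one exponentially growing delay per allowed retry."""
    for attempt in range(max_retries):
        yield base * (2 ** attempt)


def send_with_retry_sync(send: Callable[[], Response], base: float = 0.5) -> Response:
    """Sync variant: blocks the calling thread with time.sleep between attempts."""
    response = send()
    for delay in delays(base=base):
        if response[0] != 429:
            break
        time.sleep(delay)
        response = send()
    return response


async def send_with_retry_async(send: Callable[[], Awaitable[Response]], base: float = 0.5) -> Response:
    """Async variant: same schedule, but yields to the event loop while waiting."""
    response = await send()
    for delay in delays(base=base):
        if response[0] != 429:
            break
        await asyncio.sleep(delay)
        response = await send()
    return response
```

Because both variants draw from the same `delays` generator, a change to the schedule applies to both clients at once. Once the retries are used up, the last 429 response is returned to the caller, which can then raise it as an error.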

Describe the Use Case

Clients (including infrahubctl and the Infrahub Ansible collection, which
use this SDK internally) can generate bursts of requests — batching, syncs,
generators — that trip server-side rate limiting. Today a 429 bubbles up as
a failure, forcing every caller to implement their own retry logic. Handling
it in the SDK makes rate limiting transparent and lets the server reliably
shed and recover from load.

Additional Information

Suggested implementation notes (non-binding):

  • Cap the number of retries and the maximum backoff so requests eventually
    fail rather than hang indefinitely.
  • Consider making retry parameters configurable via the existing pydantic
    Config.
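
As a rough illustration of how the two caps interact, the sketch below models the retry settings as a small settings object. The field names and defaults are hypothetical. A standard-library dataclass is used only to keep the example dependency-free; in the SDK these would presumably be fields on the existing pydantic Config.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical retry settings; names and defaults are illustrative only."""

    max_retries: int = 5       # give up after this many retries
    backoff_base: float = 0.5  # first wait, in seconds
    backoff_max: float = 8.0   # no single wait exceeds this

    def schedule(self) -> List[float]:
        """Waits (before jitter) for each retry, capped at backoff_max."""
        return [min(self.backoff_max, self.backoff_base * 2 ** n) for n in range(self.max_retries)]

    def worst_case_wait(self) -> float:
        """Upper bound on total computed backoff before the request fails."""
        return sum(self.schedule())
```

With the assumed defaults, `RetryConfig().schedule()` is `[0.5, 1.0, 2.0, 4.0, 8.0]`, so a request that keeps receiving 429s fails after at most 15.5 seconds of computed backoff (before jitter) rather than hanging. A server-supplied Retry-After value is not covered by this bound unless it is capped as well.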
