53 changes: 33 additions & 20 deletions documentation/components/bridges/symfony-telemetry-bundle.md
@@ -813,24 +813,36 @@ exporters:

Inside `exporters.<name>.otlp.transport`. Required for the `otlp` sub-block.

#### Transport type

| `type` | Transport | Notes |
|--------------|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| `curl` | `CurlTransport` | Synchronous HTTP (default). Each `send()` blocks until the response. |
| `async_curl` | `AsyncCurlTransport` | Non-blocking `curl_multi`. Auto-pumped on Messenger's `WorkerRunningEvent` when `messenger` instrumentation is enabled; otherwise pump `tick()` yourself. |
| `grpc` | `GrpcTransport` | OTLP/gRPC (Protobuf only). |
| `stream` | `StreamTransport` | JSONL to a file path or `php://` stream. |
| `service` | user service | Aliases an existing transport service id. |

Prefer `curl`. Use `async_curl` only for non-blocking dispatch — in a Messenger worker the bundle pumps it on
`WorkerRunningEvent`; elsewhere you must call `tick()` yourself. See the
[OTLP bridge async transport section](/documentation/components/bridges/telemetry-otlp-bridge.md#async-curl-transport).

#### Timeouts

The defaults assume the recommended deployment: an OpenTelemetry Collector running close to the application (loopback,
UDS, or sidecar):

| Setting | Default | Applies to | Bounds |
|-----------------------|---------------------:|------------|-------------------------------------------------------------|
| `timeout_ms` | 5000 curl / 250 grpc | curl, grpc | Per-request deadline (curl: total request; grpc: per-call) |
| `connect_timeout_ms` | 250 | curl only | TCP/TLS connect; gRPC has no separate bound |
| `shutdown_timeout_ms` | 5000 | curl, grpc | Wall-clock budget for draining pending requests at shutdown |
| Setting | Default | Applies to | Bounds |
|-----------------------|----------------------------------------:|------------------------|-------------------------------------------------------------|
| `timeout_ms` | 10000 curl / 5000 async_curl / 250 grpc | curl, async_curl, grpc | Per-request deadline (curl: total request; grpc: per-call) |
| `connect_timeout_ms` | 250 curl / 1500 async_curl | curl, async_curl | TCP/TLS connect; gRPC has no separate bound |
| `pump_timeout_ms` | 100 | async_curl only | Per-`tick()` bounded drive budget (`0` = single exec round) |
| `shutdown_timeout_ms` | 5000 | curl, async_curl, grpc | Wall-clock budget for draining pending requests at shutdown |

The curl transport is asynchronous and only advances while the host pumps it, so its `timeout_ms` must span the gap
between dispatch and the next pump — hence the **5000 ms** curl default, which comfortably covers a worker's loop
iteration (the bundle pumps in-flight curl requests on Messenger's `WorkerRunningEvent`, see
[Messenger instrumentation](#messenger)). gRPC calls progress in the background via the grpc core, so the gRPC
per-call deadline stays tight at **250 ms**. `shutdown_timeout_ms` is independent of `timeout_ms` and bounds graceful
drain at exit. For a remote collector across regions, raise `timeout_ms`. See the
[OTLP bridge Timeouts section](/documentation/components/bridges/telemetry-otlp-bridge.md#timeouts) for the rationale.
`curl` is synchronous, so its `timeout_ms` is a per-flush ceiling. `async_curl` uses a larger **1500 ms**
`connect_timeout_ms` (it only advances when pumped) and a **100 ms** `pump_timeout_ms` per `tick()`. gRPC progresses in
the background, so its per-call deadline stays at **250 ms**. `shutdown_timeout_ms` bounds graceful drain at exit. See
the [OTLP bridge Timeouts section](/documentation/components/bridges/telemetry-otlp-bridge.md#timeouts).

#### Failover Transport

@@ -856,8 +868,8 @@ exporters:

- The `failover:` block accepts the same fields as the parent transport, except it cannot itself declare a nested
`failover:` (single-level depth).
- Allowed only on `curl` and `grpc` primaries. `failover` under a `stream` or `service` primary is rejected at
config-validation time.
- Allowed only on `curl`, `async_curl` and `grpc` primaries. `failover` under a `stream` or `service` primary is
rejected at config-validation time.
- The bundle registers `flow.telemetry.exporter.<name>.failover.transport` for the failover service id.

For the underlying behavior — when a forwarded batch is treated as absorbed vs. lost, the shape of
@@ -1156,12 +1168,13 @@ Flushing per message means one exporter round-trip per message. For high-through
[`max_batch_age`](#batching) on the batching processor so the batch coalesces across messages and a single
idle worker still exports on a time bound rather than per message.

The async OTLP `curl` transport makes no network progress unless the process pumps it, and a worker blocks on its
queue poll between messages. The bundle therefore pumps every configured `curl` transport on Messenger's
`WorkerRunningEvent` (each loop iteration), so a request dispatched by the per-message flush completes in the
background instead of stalling until shutdown and tripping its wall-clock `timeout_ms`. This is wired automatically
when `messenger` instrumentation is enabled; it requires no configuration. Keep the curl `timeout_ms` comfortably
above the worker's poll/sleep cadence — the **5000 ms** default already does (see [Timeouts](#timeouts)).
The OTLP `curl` transport sends synchronously, so the per-message flush exports the batch and reports its outcome
before the handler returns: there is nothing to pump between messages and no request left in flight to stall until
shutdown. Each flush blocks for up to the curl `timeout_ms`, so run a Collector close to the worker
(loopback/UDS/sidecar) to keep each flush sub-millisecond (see [Timeouts](#timeouts)).

With the `async_curl` transport, the subscriber also calls each transport's `tick()` on every `WorkerRunningEvent`, so
in-flight requests complete in the background.

#### Twig

92 changes: 49 additions & 43 deletions documentation/components/bridges/telemetry-otlp-bridge.md
@@ -155,34 +155,25 @@ The Protobuf serializer requires the `google/protobuf` package.
The production recommendation is to run an OpenTelemetry Collector close to the application (loopback, UDS, or
sidecar), so the roundtrip is sub-millisecond and a stuck collector never freezes your PHP process at shutdown.

The curl transport is asynchronous (`curl_multi`): it only makes network progress while the host process pumps it
(`send()`, `shutdown()`, or `tick()`). Its per-request `timeout_ms` is wall-clock from dispatch, so the budget must
span the gap between dispatching a request and the next pump. In a long-running worker that gap is a whole loop
iteration, which is why curl `timeout_ms` defaults to **5000 ms**. gRPC calls progress in the background via the grpc
core with no host pumping, so the gRPC per-call deadline stays tight at **250 ms**.

| Transport | Setting | Default | Unit | Bounds |
|-----------|-------------------------|--------:|--------------|-------------------------------------------------------------|
| Curl | `withTimeout()` | 5000 | milliseconds | Per-request: connect + send + receive |
| Curl | `withConnectTimeout()` | 250 | milliseconds | TCP/TLS connection establishment only |
| Curl | `withShutdownTimeout()` | 5000 | milliseconds | Wall-clock budget for draining pending requests at shutdown |
| gRPC | `timeoutMs` | 250 | milliseconds | Per-call deadline (no separate connect bound) |
| gRPC | `shutdownTimeoutMs` | 5000 | milliseconds | Wall-clock budget for draining pending calls at shutdown |

`timeout_ms` is the per-request deadline. `shutdown_timeout_ms` is a separate wall-clock budget enforced only when
draining pending requests during `shutdown()` — it bounds graceful exit independently of `timeout_ms`. Pending
requests still in flight after the shutdown deadline are abandoned and reported as failed: forwarded to the failover
transport if one is configured, otherwise surfaced to the curl transport's `ErrorHandler` (default `ErrorLogHandler`).

Failed exports are surfaced to that `ErrorHandler` **as they are reaped** — on `send()`, `tick()`, or `shutdown()` —
and never retained, so a long-running host does not accumulate failures. Pass a custom handler as the fifth
`CurlTransport` constructor argument (or the `errorHandler:` argument of `otlp_curl_transport()`); the Symfony bundle
injects the exporter's configured `error_handler` automatically.

For a remote collector across regions or a managed SaaS endpoint, 5000–10000 ms is reasonable for both transports.
Long-running hosts that idle between dispatches (e.g. a Symfony Messenger worker) should call `tick()` periodically so
in-flight curl requests complete in the background instead of stalling until shutdown — see
[Long-running workers](#long-running-workers-tick).
The curl transport is **synchronous**: each `send()` blocks up to `timeout_ms`, then returns or throws. Keeping export
off the hot path is the batching processor's job — it flushes only every `batch_size` signals (or on age / flush /
shutdown). gRPC progresses in the background, so its per-call deadline stays at **250 ms**.

| Transport | Setting | Default | Unit | Bounds |
|-------------|-------------------------|--------:|--------------|------------------------------------------------------------------|
| Curl (sync) | `withTimeout()` | 10000 | milliseconds | Per-request: connect + send + receive (max time `send()` blocks) |
| Curl (sync) | `withConnectTimeout()` | 250 | milliseconds | TCP/TLS connection establishment only |
| Curl (sync) | `withShutdownTimeout()` | 5000 | milliseconds | Reserved for the failover drain budget at shutdown |
| Async curl | `withTimeout()` | 5000 | milliseconds | Per-request wall-clock; must span the gap between pumps |
| Async curl | `withConnectTimeout()` | 1500 | milliseconds | TCP/TLS connect; larger default tolerates infrequent pumping |
| Async curl | `withPumpTimeout()` | 100 | milliseconds | Per-`tick()` bounded drive budget (`0` = single exec round) |
| Async curl | `withShutdownTimeout()` | 5000 | milliseconds | Wall-clock budget for draining pending requests at shutdown |
| gRPC | `timeoutMs` | 250 | milliseconds | Per-call deadline (no separate connect bound) |
| gRPC | `shutdownTimeoutMs` | 5000 | milliseconds | Wall-clock budget for draining pending calls at shutdown |

Against a local collector each send is sub-millisecond, so the defaults are only ceilings. On failure `send()` throws
synchronously — `TransportException`, or `FailoverTransportException` once the batch is forwarded to the failover. Async
curl is tuned differently — see [Asynchronous transport](#async-curl-transport).

```php
<?php
@@ -212,27 +203,42 @@ $remoteGrpc = otlp_grpc_transport(
> earlier versions could not express tight, realistic deadlines. A negative value to `withTimeout()` /
> `withConnectTimeout()` raises `\InvalidArgumentException`.

## Long-running workers (`tick()`) {#long-running-workers-tick}
## Long-running workers {#long-running-workers}

`curl_multi` has no background thread — an in-flight request only advances while the host pumps the multi-handle.
Short-lived processes (an HTTP request, a one-shot CLI command) are fine: they end right after dispatch and
`shutdown()` drains synchronously. A long-running worker is not: it dispatches telemetry after handling a message and
then blocks on its queue poll waiting for the next one. During that idle gap nothing pumps the request, yet
`CURLOPT_TIMEOUT_MS` keeps counting wall-clock — so the request eventually dies with `curl error 28 (Timeout was
reached)` and the telemetry is lost.
The synchronous curl transport completes each request before `send()` returns, so nothing ages out between messages.
The **batching processor** keeps export off the worker's hot path: it flushes once per `batch_size` signals or when the
age limit is reached. In a Symfony Messenger worker the bundle flushes after each message and on stop (see the
[Symfony telemetry bundle](/documentation/components/bridges/symfony-telemetry-bundle.md)); keep a local Collector so
each flush stays sub-millisecond. For non-blocking dispatch, see [Asynchronous transport](#async-curl-transport).
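
The size-or-age flush rule can be illustrated with a minimal, self-contained sketch. `SketchBatcher` and its
parameters are hypothetical names made up for this illustration; this is not the bridge's batching processor, and it
only checks age when a signal is added, whereas a real processor may also export on a timer.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration only: NOT the bridge's batching processor.
final class SketchBatcher
{
    /** @var list<string> */
    private array $buffer = [];

    // Timestamp of the oldest buffered signal, or null when the buffer is empty.
    private ?float $oldest = null;

    public function __construct(
        private \Closure $export,     // stands in for a blocking transport send()
        private int $batchSize,
        private float $maxAgeSeconds,
    ) {
    }

    public function add(string $signal, float $now): void
    {
        $this->oldest ??= $now;
        $this->buffer[] = $signal;

        // Most calls return here after an in-memory append; only a full or
        // aged batch pays for the export round-trip.
        if (count($this->buffer) >= $this->batchSize || $now - $this->oldest >= $this->maxAgeSeconds) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->buffer === []) {
            return; // nothing pending, no export call
        }

        ($this->export)($this->buffer);
        $this->buffer = [];
        $this->oldest = null;
    }
}

$exports = [];
$batcher = new SketchBatcher(
    function (array $batch) use (&$exports): void {
        $exports[] = $batch;
    },
    batchSize: 3,
    maxAgeSeconds: 5.0,
);

$batcher->add('a', 0.0);
$batcher->add('b', 1.0);
$batcher->add('c', 2.0); // third signal reaches batchSize: first export
$batcher->add('d', 3.0);
$batcher->add('e', 9.0); // oldest buffered signal is now 6 s old: second export
$batcher->flush();       // buffer is empty: no export
```

Five signals produce two export calls here, which is the effect the paragraph above relies on: with a synchronous
transport, only the flushing call blocks.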

`CurlTransport::tick()` fixes this. It is non-blocking: it advances all currently-possible I/O and reaps finished
handles, but never waits. Call it periodically from the host's idle loop so dispatched requests complete in the
background:
## Asynchronous transport (`AsyncCurlTransport`) {#async-curl-transport}

`AsyncCurlTransport` uses `curl_multi` for non-blocking I/O: `send()` queues the request and returns without waiting.
Opt in with `otlp_async_curl_transport()` or the bundle's `transport.type: 'async_curl'`.

A queued request only advances while the host pumps the handle — on the next `send()`, `shutdown()`, or `tick()`.
`tick()` is bounded and select-driven: it drives pending requests for up to `pump_timeout_ms` (default **100 ms**), so a
local backend completes within a single tick. `0` falls back to one non-blocking exec round (rarely enough — prefer the
default).

```php
// once per worker loop iteration
$curlTransport->tick();
use function Flow\Bridge\Telemetry\OTLP\DSL\otlp_async_curl_transport;

$transport = otlp_async_curl_transport('http://localhost:4318');

// Pump once per worker loop so in-flight requests complete:
$transport->tick();
```

`tick()` is a no-op when nothing is pending or after `shutdown()`, so calling it on every iteration is cheap. The
Symfony bundle wires this automatically on Messenger's `WorkerRunningEvent` — see the
[Symfony telemetry bundle](/documentation/components/bridges/symfony-telemetry-bundle.md).
In a Symfony Messenger worker with `messenger` instrumentation enabled, the bundle pumps every `async_curl` transport on
`WorkerRunningEvent` (~1s when idle) — the cadence the **1500 ms** `connect_timeout_ms` default is sized for. Elsewhere
you must call `tick()` yourself.
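
Outside Messenger, a custom host loop might pump the transport once per iteration, as in the sketch below. The
`Tickable` interface, `run_worker()`, and `CountingTransport` are hypothetical stand-ins so the example runs without
the bridge installed; only the `tick()` call itself comes from the text above.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the single method this sketch needs.
interface Tickable
{
    public function tick(): void;
}

/**
 * Hypothetical host loop: poll for work, handle it, then pump the transport
 * once per iteration whether or not a job arrived.
 *
 * @param callable(): ?callable $poll returns the next job, or null when idle
 */
function run_worker(Tickable $transport, callable $poll, int $iterations): void
{
    for ($i = 0; $i < $iterations; $i++) {
        $job = $poll();

        if ($job !== null) {
            $job(); // with the async transport, telemetry sent here is only queued
        }

        // Pump on idle iterations too, so queued requests keep advancing.
        $transport->tick();
    }
}

// Fake transport that only counts pumps.
final class CountingTransport implements Tickable
{
    public int $ticks = 0;

    public function tick(): void
    {
        $this->ticks++;
    }
}

$transport = new CountingTransport();
$handled = 0;
$jobs = [
    function () use (&$handled): void {
        $handled++;
    },
];

// One job, then two idle iterations.
run_worker(
    $transport,
    function () use (&$jobs): ?callable {
        return array_shift($jobs);
    },
    3,
);
```

The point of the shape is that `tick()` sits outside the `if`: idle iterations are exactly the gap in which a queued
request would otherwise make no progress.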

A failure surfaces on a later `send()`/`tick()`/`shutdown()`, when the original caller is no longer on the stack, so it
cannot be thrown back to that caller. It goes to the failover transport or, without one, to the injected `ErrorHandler`
(fifth constructor argument, default `ErrorLogHandler`). That is why the async transport takes an error handler while
the synchronous one just throws.

Prefer the synchronous `curl` transport unless you need non-blocking dispatch and can guarantee a pump cadence.

## Failover Transport
