proxy_h2: acknowledge downstream RST_STREAM while the upstream request-body write is blocked by molocule · Pull Request #911 · cloudflare/pingora

molocule · 2026-06-13T02:00:23Z

This PR closes #910

Problem

In bidirection_down_to_up, the downstream-read select arm awaits send_body_to2() inside the arm body. While that write is parked on upstream flow control, the downstream RecvStream is not polled, so a downstream RST_STREAM is never observed.

The task keeps holding the downstream stream handles. Because the h2 crate returns a reset stream's connection-window credit only once all handles drop (release_closed_capacity at ref_count == 0), the cancelled stream pins its share of the downstream connection window for as long as the upstream write stays blocked. That can be indefinite, because there is no default write timeout.

This is dangerous because cancelling a stream immediately frees its stream slot (max_concurrent_streams) on both ends, so the connection appears to have plenty of capacity and the client (e.g. Envoy or any gRPC client) keeps multiplexing new requests onto it. The shared connection send window, however, does not come back: credit the client has spent can only be restored by the receiver's connection-level WINDOW_UPDATE frames, and pingora never sends them because the cancelled streams' unconsumed credit stays pinned to their zombie streams. The flow-control accounting is technically correct, but once the window is exhausted clients can no longer send request bodies on that connection.
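The pinning effect can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ConnWindow` and its methods are hypothetical names invented for illustration, it tracks a single aggregate of pinned credit rather than per-stream state, and it is not how the h2 crate implements flow control.

```rust
/// Toy model of the receiver side of one HTTP/2 connection window.
/// All names here are hypothetical and exist only for illustration.
struct ConnWindow {
    /// Connection-level credit the sender may still spend.
    available: u32,
    /// Credit spent on reset streams whose handles are still alive.
    pinned: u32,
    /// Streams currently counted against max_concurrent_streams.
    open_streams: u32,
}

impl ConnWindow {
    fn new(initial: u32) -> Self {
        ConnWindow { available: initial, pinned: 0, open_streams: 0 }
    }

    /// The sender opens a stream and sends `bytes` that the receiver never
    /// consumes. Returns false if the connection window cannot cover it.
    fn send_unconsumed(&mut self, bytes: u32) -> bool {
        if bytes > self.available {
            return false;
        }
        self.available -= bytes;
        self.open_streams += 1;
        true
    }

    /// The sender resets the stream: the stream slot is freed at once,
    /// but the spent credit stays pinned to the zombie stream.
    fn reset_stream(&mut self, bytes: u32) {
        self.open_streams -= 1;
        self.pinned += bytes;
    }

    /// The receiver drops its last handles: the pinned credit is released
    /// and can be returned to the sender via WINDOW_UPDATE.
    fn drop_handles(&mut self) {
        self.available += self.pinned;
        self.pinned = 0;
    }
}
```

With a 65,535-byte window, four streams that each send 16,000 unconsumed bytes and are then reset leave `open_streams` at 0 but only 1,535 bytes of credit, so a fifth 16,000-byte body cannot be sent until `drop_handles` runs.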

Changes

pingora-core: add Session::watch_h2_stream_reset() to the HTTP server session enum. For H2 it resolves when the client resets the stream, reusing the existing Idle/poll_reset future; for H1/other protocols it is pending forever (there is no out-of-band abort signal — detecting an H1 close would require destructively reading the socket).
proxy_h2::send_body_to2: race write_body against watch_h2_stream_reset(). A reset surfaces as an H2Error with ErrorSource::Downstream.
bidirection_down_to_up: on a downstream-sourced error from send_body_to2, fail so the stream handles drop and the window credit is reclaimed immediately, except when a cache fill is in progress, in which case the downstream error is swallowed and the upstream response continues to be admitted to cache, mirroring the existing policy for downstream read errors.
proxy_down_to_up: on downstream-sourced errors, send RST_STREAM CANCEL on the upstream stream so the upstream peer also releases its stream resources promptly (previously only done for upstream read timeouts).
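The error-handling policy in the last two items can be summarised as a small decision function. This is a sketch only: `ErrorSource` here is a local stand-in for pingora's type, and `Action` and `on_body_write_error` are hypothetical names that do not appear in the PR.

```rust
/// Local stand-in for pingora's error source tag.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ErrorSource {
    Upstream,
    Downstream,
}

/// Hypothetical summary of what the proxy does next.
#[derive(Debug, PartialEq)]
enum Action {
    /// Swallow the downstream error, mark downstream as errored, and keep
    /// reading the upstream response so it can still be admitted to cache.
    ContinueCacheFill,
    /// Fail the request so the stream handles drop, and send
    /// RST_STREAM CANCEL on the upstream stream.
    FailAndCancelUpstream,
    /// Not a downstream error: left to the existing error paths,
    /// which this sketch does not model.
    Propagate,
}

/// Decide how to react to an error returned by the request-body write.
fn on_body_write_error(source: ErrorSource, cache_fill_in_progress: bool) -> Action {
    match source {
        ErrorSource::Downstream if cache_fill_in_progress => Action::ContinueCacheFill,
        ErrorSource::Downstream => Action::FailAndCancelUpstream,
        ErrorSource::Upstream => Action::Propagate,
    }
}
```

Keeping the cache-fill exception in one place mirrors the description's point that the new path follows the existing policy for downstream read errors.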

Notes

The race is placed around the raw write_body (after the request-body filters) rather than around all of send_body_to2 or at the call site, because it is the only spot where the borrows are disjoint (&mut SendStream vs. &mut Session). It also means a reset only ever cancels an h2 frame write on a stream that is being torn down; it never cancels user-defined body filters mid-await, which were never required to be cancel-safe.
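The shape of that race can be sketched with the standard library alone. This is not the PR's code: `write_or_reset`, `WriteOutcome`, `ready_after`, and the busy-polling `block_on` are hypothetical helpers for illustration, and the real change races pingora's write_body against the H2 reset future inside an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical outcome type. The PR instead surfaces a reset as an
/// H2Error tagged ErrorSource::Downstream.
#[derive(Debug, PartialEq)]
enum WriteOutcome {
    Written,
    DownstreamReset,
}

/// Polls the body write and the reset watcher together. Only the write
/// future is abandoned on reset; work done before this call (such as body
/// filters) has already completed and is never cancelled.
async fn write_or_reset<W, R>(write: W, reset: R) -> WriteOutcome
where
    W: Future<Output = ()>,
    R: Future<Output = ()>,
{
    let mut write = pin!(write);
    let mut reset = pin!(reset);
    std::future::poll_fn(|cx| {
        if write.as_mut().poll(cx).is_ready() {
            return Poll::Ready(WriteOutcome::Written);
        }
        if reset.as_mut().poll(cx).is_ready() {
            return Poll::Ready(WriteOutcome::DownstreamReset);
        }
        Poll::Pending
    })
    .await
}

/// A future that becomes ready after being polled `polls` times.
fn ready_after(mut polls: u32) -> impl Future<Output = ()> {
    std::future::poll_fn(move |_cx| {
        if polls == 0 {
            Poll::Ready(())
        } else {
            polls -= 1;
            Poll::Pending
        }
    })
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demo-only driver that re-polls in a loop; a real runtime would sleep
/// until the waker fires.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Passing `std::future::pending::<()>()` as the write models a write parked on upstream flow control: `block_on(write_or_reset(std::future::pending::<()>(), ready_after(3)))` resolves to `WriteOutcome::DownstreamReset` instead of hanging.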

Testing

Two integration tests (plus an h2c listener for the cache test service on :6154, since raw h2 frame control is needed):

test_h2_downstream_rst_while_upstream_write_blocked: upstream never grants window updates; client fills the upstream stream window until the proxy parks, then sends RST_STREAM. Asserts the upstream stream is cancelled (RST_STREAM CANCEL received) within a bounded time. Hangs without this fix.
test_h2_downstream_rst_during_cache_fill: same blocked-write reset, but with a cacheable response mid-admission; the upstream withholds the response tail until after the reset. Asserts the fill still completes: a follow-up request is a cache hit with the full body.

andrewhavck · 2026-06-19T23:20:21Z

                    client_body.send_reset(h2::Reason::CANCEL);
                    // Mark the underlying H2 connection for shutdown so it's not used
                    // for new streams in case it is hung.
                    client_session.conn.mark_shutdown();


I don't think we want to mark the entire upstream connection for shutdown on a downstream error. Sending a CANCEL seems fine on the impacted stream.

andrewhavck · 2026-06-19T23:24:34Z

+                            }
+                            // ignore downstream error so that upstream can continue to write cache
+                            downstream_state.to_errored();
+                            warn!(


Let's use the same pattern of checking suppress_proxy_warn_log before emitting a warn here.

andrewhavck · 2026-06-20T00:07:52Z

+    /// For HTTP/2 this resolves when the client resets the stream (RST_STREAM) or the
+    /// stream errors. Other protocols have no out-of-band abort signal (detecting a
+    /// close would require consuming reads), so this future is pending forever for them.
+    pub async fn watch_h2_stream_reset(&mut self) -> Result<h2::Reason> {


I'd prefer that we return the future here as Option<Idle<'_>>. That way we avoid creating the future and the select for downstream HTTP/1.1 clients in proxy_h2. We also avoid the case where a caller mistakenly invokes this for something other than H2 outside of a select and stays pending forever.
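A minimal sketch of the suggested API shape, assuming a simplified two-variant session: `Session`, `ResetWatch`, and `poll_once` below are hypothetical stand-ins (the review refers to pingora's Idle future, which is not reproduced here).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified stand-in for the HTTP server session enum.
enum Session {
    H1,
    /// The counter simulates a client reset arriving after some polls.
    H2 { polls_until_reset: u32 },
}

/// Hypothetical stand-in for the Idle future mentioned in the review.
struct ResetWatch<'a> {
    polls_left: &'a mut u32,
}

impl Future for ResetWatch<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        if *this.polls_left == 0 {
            Poll::Ready(())
        } else {
            *this.polls_left -= 1;
            Poll::Pending
        }
    }
}

impl Session {
    /// Returns a watcher only for H2. Callers get `None` for H1 and can
    /// skip the race entirely instead of awaiting a never-ready future.
    fn watch_h2_stream_reset(&mut self) -> Option<ResetWatch<'_>> {
        match self {
            Session::H2 { polls_until_reset } => Some(ResetWatch {
                polls_left: polls_until_reset,
            }),
            Session::H1 => None,
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demo helper: poll a future once with a waker that does nothing.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

With this shape the type system forces the caller to handle the non-H2 case explicitly, which is the misuse the comment is concerned about.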

molocule added 6 commits June 10, 2026 16:19

add

922df3c

Update proxy_h2.rs

81be571

add unit test

2d8db2a

caching path

27aefe1

add test

6adf921

Merge branch 'main' into rst-stream-fix

c093a91

molocule mentioned this pull request Jun 13, 2026

H2→H2 proxy: downstream RST_STREAM is not observed while the upstream request-body write is blocked, pinning connection-window credit indefinitely #910

Open

Update test_basic.rs

6fb2e5c

andrewhavck self-assigned this Jun 19, 2026

andrewhavck reviewed Jun 19, 2026


andrewhavck reviewed Jun 20, 2026


molocule wants to merge 7 commits into cloudflare:main from modal-labs:rst-stream-fix