perf(durable-streams-rust): serve live-tail SSE from an epoll reactor (flat per-connection memory) by balegas · Pull Request #4662 · electric-sql/electric

balegas · 2026-06-29T14:10:30Z

Why

Durable Streams has to hold connections to millions of users across millions of streams. The cost that matters is therefore the memory each connection holds while its subscriber sits idle waiting for the next append. That cost must stay flat: it should not grow with the number of active connections or with the number of streams, and the number of runtime tasks should stay constant.

Before this change, every live SSE subscriber was a parked async connection task. Even while idle, each one pinned a full connection-state future (sized to the largest request handler) plus its read buffer for the lifetime of the stream. At fan-out and high-connection scale, that parked future is the dominant resident cost, and it grows linearly with the number of connected subscribers, which is exactly the axis we need to keep flat.

Approach — an epoll reactor

Serve live-tail SSE from a fixed pool of N = available_parallelism() reactor threads, each owning one epoll instance, an eventfd, and a generation-tracked slab of subscribers. A connection task that produces a live-tail SSE response hands its socket (and its connection-limiter permit) to a reactor and returns, freeing the task future entirely. A subscriber then costs only:

a compact slab entry (~tens of bytes), and
the kernel socket.
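The PR does not show the slab's layout. The following is a minimal sketch of a generation-tracked slab, assuming each key is a (slot index, generation) pair that fits in the 64-bit user-data word epoll returns with each event. `Slab`, `Key`, and all method names are hypothetical. Removing an entry bumps the slot's generation, so a stale key (for example, from an event still in flight for a closed subscriber) misses instead of hitting whichever subscriber reused the slot.

```rust
/// Hypothetical slab slot: the generation survives removal; the value does not.
struct Entry<T> {
    generation: u32,
    value: Option<T>,
}

/// Hypothetical key: slot index plus the generation at insert time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key {
    index: u32,
    generation: u32,
}

impl Key {
    /// Pack into a u64, e.g. for epoll's per-event user-data word.
    pub fn to_u64(self) -> u64 {
        ((self.generation as u64) << 32) | self.index as u64
    }
    pub fn from_u64(v: u64) -> Self {
        Key { index: v as u32, generation: (v >> 32) as u32 }
    }
}

pub struct Slab<T> {
    entries: Vec<Entry<T>>,
    free: Vec<u32>,
}

impl<T> Slab<T> {
    pub fn new() -> Self {
        Slab { entries: Vec::new(), free: Vec::new() }
    }

    /// Reuse a freed slot if one exists, otherwise grow the slab.
    pub fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            let e = &mut self.entries[index as usize];
            e.value = Some(value);
            Key { index, generation: e.generation }
        } else {
            let index = self.entries.len() as u32;
            self.entries.push(Entry { generation: 0, value: Some(value) });
            Key { index, generation: 0 }
        }
    }

    /// None if the slot was freed (and possibly reused) since `key` was issued.
    pub fn get_mut(&mut self, key: Key) -> Option<&mut T> {
        let e = self.entries.get_mut(key.index as usize)?;
        if e.generation != key.generation {
            return None;
        }
        e.value.as_mut()
    }

    pub fn remove(&mut self, key: Key) -> Option<T> {
        let e = self.entries.get_mut(key.index as usize)?;
        if e.generation != key.generation {
            return None;
        }
        let v = e.value.take()?;
        // Bump the generation so every outstanding copy of `key` becomes stale.
        e.generation = e.generation.wrapping_add(1);
        self.free.push(key.index);
        Some(v)
    }
}
```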

Resulting memory model:

tasks = O(cores) — constant, independent of streams and connections.
memory = O(streams)·per-stream + O(connections)·slab-entry, with the two axes decoupled: idle streams cost nothing extra (no reactor thread is even spawned until the first SSE subscriber registers), and a connection never carries per-stream-sized state.
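The fixed shard count and the lazy start can be sketched with `std::sync::OnceLock`. This is an illustration under simplifying assumptions, not the PR's code: each reactor is reduced to a thread draining an `mpsc` intake (no epoll), shard placement is round-robin (the PR does not say how a shard is chosen), and `Handoff`, `hand_off`, and `STARTED` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for what a connection task hands off:
/// in the real server, the socket plus its connection-limiter permit.
pub struct Handoff {
    pub conn_id: u64,
}

struct Pool {
    intakes: Vec<Mutex<Sender<Handoff>>>,
    next: AtomicUsize,
}

static POOL: OnceLock<Pool> = OnceLock::new();
/// Counts reactor threads started, so the lazy start is observable.
pub static STARTED: AtomicUsize = AtomicUsize::new(0);

/// The first caller spawns N = available_parallelism() reactor threads;
/// later callers reuse them. Nothing is spawned before the first hand-off.
fn pool() -> &'static Pool {
    POOL.get_or_init(|| {
        let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        let intakes: Vec<Mutex<Sender<Handoff>>> = (0..n)
            .map(|_| {
                let (tx, rx) = channel::<Handoff>();
                thread::spawn(move || {
                    // A real reactor would epoll_wait here and register each
                    // handed-off fd; this sketch only drains its intake.
                    for handoff in rx {
                        let _ = handoff.conn_id;
                    }
                });
                STARTED.fetch_add(1, Ordering::SeqCst);
                Mutex::new(tx)
            })
            .collect();
        Pool { intakes, next: AtomicUsize::new(0) }
    })
}

/// Hand a subscriber to a reactor shard (round-robin) and return the shard index.
pub fn hand_off(h: Handoff) -> usize {
    let p = pool();
    let shard = p.next.fetch_add(1, Ordering::Relaxed) % p.intakes.len();
    p.intakes[shard].lock().unwrap().send(h).expect("reactor thread exited");
    shard
}
```

The point of the `OnceLock` is that the thread count is fixed by core count, not by streams or connections, and a process that never serves a live-tail subscriber never pays for the threads.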

Append → wakeup routing stays O(subscribers of that stream): publish_durable_tail walks only the stream's own subscriber list and signals the relevant shard eventfds. There are no global scans, and streams with no subscribers carry no list at all.
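A sketch of that routing, assuming subscribers are indexed by stream id and tagged with the reactor shard that owns them. `Registry`, `SubRef`, and `shards_to_wake` are hypothetical names, not the actual `publish_durable_tail` code; a real implementation would write to the eventfd of each returned shard rather than return a list.

```rust
use std::collections::HashMap;

/// Hypothetical reference to one subscriber: owning shard + slab key.
#[derive(Clone, Copy)]
pub struct SubRef {
    pub shard: usize,
    pub key: u64,
}

/// Stream id -> that stream's subscribers. A stream with no subscribers
/// has no entry at all.
#[derive(Default)]
pub struct Registry {
    by_stream: HashMap<u64, Vec<SubRef>>,
}

impl Registry {
    pub fn subscribe(&mut self, stream: u64, sub: SubRef) {
        self.by_stream.entry(stream).or_default().push(sub);
    }

    pub fn unsubscribe(&mut self, stream: u64, sub: SubRef) {
        let now_empty = match self.by_stream.get_mut(&stream) {
            Some(list) => {
                list.retain(|s| !(s.shard == sub.shard && s.key == sub.key));
                list.is_empty()
            }
            None => false,
        };
        if now_empty {
            // Drop the list so an idle stream carries no per-stream vector.
            self.by_stream.remove(&stream);
        }
    }

    /// On append: walk only this stream's subscribers and return each owning
    /// shard once, in first-seen order. Assumes every shard < n_shards.
    pub fn shards_to_wake(&self, stream: u64, n_shards: usize) -> Vec<usize> {
        let Some(list) = self.by_stream.get(&stream) else {
            return Vec::new();
        };
        let mut seen = vec![false; n_shards];
        let mut out = Vec::new();
        for s in list {
            if !seen[s.shard] {
                seen[s.shard] = true;
                out.push(s.shard);
            }
        }
        out
    }
}
```

Deduplicating by shard means an append to a stream with many subscribers on one reactor costs one eventfd write for that reactor, not one per subscriber; the reactor then finds its own subscribers that need data.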

Scope & safety

Linux only (epoll). Non-Linux builds keep the existing inline hand-off path unchanged.
Only the live-tail case runs on the reactor (root stream, tiering off, start at/after the live file base). Cold catch-up / fork / tiered reads stay on the proven inline path.
Byte-identical SSE framing, shared with the inline path, so the wire output is exactly what the conformance suite already validates.
Correctness: EPOLLOUT is level-triggered and armed only while a subscriber is backpressured; EAGAIN and partial writes are handled; a slab generation guard prevents ABA on reused slots; range reads are taken under one consistent (file, file_base) snapshot so compaction cannot tear them; the connection permit travels with the subscriber, so the connection stays counted and graceful drain still works; the 15s keepalive and 60s lifetime cap match the inline path.
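The backpressure rules above can be illustrated with a small flush routine. This is a sketch over `std::io::Write` rather than raw `write(2)` on an fd, and `flush_pending` and `Flush` are hypothetical names: `WouldBlock` (EAGAIN) keeps the unsent tail and tells the caller to arm EPOLLOUT, a short write just advances the buffer, a return of `Ok(0)` is treated as a closed peer instead of consulting errno, and `Interrupted` (EINTR) retries.

```rust
use std::io::{self, Write};

/// What the reactor should do with a subscriber after a flush attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum Flush {
    /// Everything written: disarm EPOLLOUT and wait for the next append.
    Drained,
    /// Socket buffer full (EAGAIN): keep the remainder and arm EPOLLOUT.
    Backpressured,
    /// Peer gone (write returned 0 or a hard error): close the subscriber.
    Closed,
}

/// Write as much of `pending` as the non-blocking socket accepts,
/// removing the written prefix from `pending`.
pub fn flush_pending(sock: &mut impl Write, pending: &mut Vec<u8>) -> Flush {
    while !pending.is_empty() {
        match sock.write(pending) {
            Ok(0) => return Flush::Closed,
            Ok(n) => {
                // Partial write: drop only what the kernel accepted.
                pending.drain(..n);
            }
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Flush::Backpressured,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(_) => return Flush::Closed,
        }
    }
    Flush::Drained
}
```

With a level-triggered EPOLLOUT, leaving it armed on an idle, writable socket would wake the reactor continuously; arming it only on `Backpressured` and disarming on `Drained` avoids that busy loop.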

Results (local)

Per-subscriber resident memory: ~7.3 → ~0.64 KiB/sub (~11×) — controlled cgroup harness, server-only RSS, 0→1000 subscribers, identical build/config. 1000 live subscribers now add ~0.6 MB total instead of ~7 MB; the curve is essentially flat.
Conformance: clean. The full SSE suite passes; the only failures are 3 pre-existing long-poll timing flakes that fail identically on the base branch.

Validation on GKE

Confirmed on a real cluster (c4d-standard-16-lssd server, 4-CPU limit; ds-bench SSE fan-out, 1 stream, subscriber sweep). Pod working-set memory and delivery p99 at 1000 subscribers, prior server vs this change:

| config | pod memory peak/p50, MB (old → new) | delivery p99, ms (old → new) |
|---|---|---|
| wal (cache off) | 27/23 → 22/18 | 5.48 → 5.17 |
| wal (cache on) | 26/21 → 15/14 | 5.21 → 4.20 |

So on real hardware the reactor cuts p50 SSE fan-out pod memory by ~22% (cache off) and ~33% (cache on) at 1000 subscribers (peak: ~19% and ~42%), with equal-or-better delivery latency and unchanged throughput (~75–80k ev/s), consistent with the local per-subscriber slope above. Conformance suite green.
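As an arithmetic check, the ~22% and ~33% figures correspond to the p50 column of the table above; the peak column gives ~19% and ~42%. The helper names below are illustrative only.

```rust
/// (config, peak old, peak new, p50 old, p50 new), pod memory in MB,
/// copied from the GKE table.
const ROWS: [(&str, f64, f64, f64, f64); 2] = [
    ("wal (cache off)", 27.0, 22.0, 23.0, 18.0),
    ("wal (cache on)", 26.0, 15.0, 21.0, 14.0),
];

/// Percentage reduction going from `old` to `new`.
pub fn reduction_pct(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

/// (config, p50 reduction %, peak reduction %) for each row.
pub fn reductions() -> Vec<(&'static str, f64, f64)> {
    ROWS.iter()
        .map(|&(cfg, peak_old, peak_new, p50_old, p50_new)| {
            (cfg, reduction_pct(p50_old, p50_new), reduction_pct(peak_old, peak_new))
        })
        .collect()
}
```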

Design doc: docs/superpowers/specs/2026-06-29-sse-reactor-flat-userspace-design.md.

Stacked on #4661 (the inline Body::Sse hand-off). Base is set to that branch so this diff is reactor-only; review/merge #4661 first.

🤖 Generated with Claude Code

Hand each live-tail SSE subscriber from its connection task to a fixed pool of N=available_parallelism() epoll reactor threads, each owning a generation-tracked slab. Per-subscriber resident memory collapses from a parked connection-task future to a compact slab entry, so it stops scaling with the number of active connections. Linux only; non-Linux keeps the existing inline hand-off path, and cold catch-up stays on it too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01A8Pz3PafV7mTwWmwv545Rh

Close subscriber sockets still queued in the intake at shutdown (and reject registrations after shutdown begins), so neither the fd nor the connection-limiter permit leaks — a held permit made drain() wait out its full grace period. Also handle write()==0 by closing the peer instead of reading a stale errno (which risked a spurious EAGAIN re-arm / EINTR spin). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01A8Pz3PafV7mTwWmwv545Rh
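A minimal sketch of the shutdown fix described in this commit, assuming a mutex-guarded intake queue per reactor. The actual intake mechanism is not shown in the PR; `Intake`, `Subscriber`, `register`, `take_new`, and `shutdown` are hypothetical names. Dropping a `Subscriber` stands in for closing its fd and releasing its connection-limiter permit, and here it only bumps a counter so the release is observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical handed-off subscriber. In the real server this would own
/// the socket fd and the connection-limiter permit.
pub struct Subscriber {
    released: Arc<AtomicUsize>,
}

impl Drop for Subscriber {
    fn drop(&mut self) {
        // Stand-in for close(fd) + returning the permit to the limiter.
        self.released.fetch_add(1, Ordering::SeqCst);
    }
}

/// Intake between connection tasks and one reactor. `None` = shutting down.
pub struct Intake {
    queue: Mutex<Option<Vec<Subscriber>>>,
}

impl Intake {
    pub fn new() -> Self {
        Intake { queue: Mutex::new(Some(Vec::new())) }
    }

    /// Queue a subscriber, or hand it back if shutdown has begun so the
    /// caller drops it (closing the fd and releasing the permit).
    pub fn register(&self, sub: Subscriber) -> Result<(), Subscriber> {
        match self.queue.lock().unwrap().as_mut() {
            Some(q) => {
                q.push(sub);
                Ok(())
            }
            None => Err(sub),
        }
    }

    /// Reactor side: take everything queued since the last wakeup.
    pub fn take_new(&self) -> Vec<Subscriber> {
        self.queue.lock().unwrap().as_mut().map(std::mem::take).unwrap_or_default()
    }

    /// Flip to shutdown and drop anything still queued, so no fd or permit
    /// outlives the reactor and graceful drain is not left waiting on it.
    pub fn shutdown(&self) {
        let leftover = self.queue.lock().unwrap().take();
        // Dropped after the lock is released.
        drop(leftover);
    }
}
```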

…-06-30 report Update the Benchmarks section to the current reactor build: write peak 860k → ~928k append/s, add the SSE live-tail reactor results (p99 ~0.5–2.5 ms across 64–2048 connections, ~27 MiB shared fan-out for 1000 subscribers), and point to results-2026-06-30/REPORT.md in ds-bench for the full matrix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01A8Pz3PafV7mTwWmwv545Rh

balegas and others added 2 commits June 30, 2026 00:08

balegas force-pushed the sse-reactor-flat-userspace branch from 98dd213 to 3754d64 June 29, 2026 23:09

msfstef approved these changes Jun 30, 2026

balegas merged commit 4ca377a into sse-fanout-per-subscriber-memory Jun 30, 2026
14 checks passed

balegas deleted the sse-reactor-flat-userspace branch June 30, 2026 09:28

balegas merged 3 commits into sse-fanout-per-subscriber-memory from sse-reactor-flat-userspace

balegas commented Jun 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

balegas commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Approach — an epoll reactor

Scope & safety

Results (local)

Validation on GKE

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

balegas commented Jun 29, 2026 •

edited

Loading