feat(pool): warm-sandbox controller — daemon, multi-image, concurrency, backpressure (P1 of CoW plan) by ZhiXiao-Lin · Pull Request #18 · AI45Lab/Box

ZhiXiao-Lin · 2026-06-11T05:21:11Z

P1 of the CoW snapshot-fork plan (#17), built up across 7 commits into a complete agent-sandbox controller. It takes the low-risk keepalive+exec path: cold boot is removed from the hot path without touching guest-init's lifecycle. The higher-fidelity P2 (deferred-main-spawn) is intentionally left for a separate, focused epic.

What it does

One `pool start` daemon pre-warms keepalive microVMs and serves `pool run` over a Unix socket. Each request runs a command in an already-booted VM via the existing guest exec server (no cold boot), returns stdout, stderr, and the exit code, and tears the VM down; the pool then replenishes.

| Capability | Commit |
|---|---|
| warm-sandbox daemon + `pool run` (cold 1688ms → warm 73ms, 23×) | `ab4e1b3` |
| Unix-socket wire-protocol tests | `5ecc534` |
| concurrent request serving + real-VM e2e | `089adb7` |
| multi-image multiplexing (lazy per-image pools) | `b07f3af` |
| `--warm image[=count]` fleet pre-warm at startup | `181fff2` |
| live `pool status` over the socket (per-image stats) | `7f283fa` |
| backpressure (bound concurrent in-flight sandboxes) | `193a0ad` |

Notable design points

- Re-injection via the existing exec server: no guest-init changes; a forked or warmed VM runs the real command via exec.
- Tagged wire protocol (`{"op":"run"|"status"}`) carried as length-prefixed JSON.
- Backpressure: `WarmPool::acquire` boots a VM on a pool miss with no `max_size` cap, so a per-image semaphore (permits = `max_size`, held through teardown) makes bursts queue instead of booting unbounded VMs.
- Stacks with #16 (perf: CoW memory dedup (KSM) + boot-floor trim + reflink rootfs copy): KSM dedups RAM across the pooled same-image VMs, so the pool gives speed and KSM gives density.

Tests

- 15 `pool` unit tests (protocol framing/roundtrip, request-envelope tagging, `parse_warm_spec`, backpressure bound, no-daemon error, …) + 54 `warm_pool` unit tests.
- 1 host-backed e2e test (`test_real_pool_warm_run`, `#[ignore]`): a single warm run, 3 concurrent runs, a `--warm` second image, and `pool status`; passes on real Linux/KVM (verified at each increment).
- One small runtime addition: `WarmPool::drain_idle(&self)`, so the pool can be shut down from behind an `Arc`.

Out of scope (next, separate epic)

P2: deferred-main-spawn for full box semantics, including console logs (a higher-risk lifecycle change, per #17).

The `pool` command pre-warmed VMs but nothing consumed them: `stop`/`status` were stubs, there was no run-path wiring, and without a keepalive the pooled VMs could exit. This completes it into the low-risk keepalive+exec MVP from docs/cow-snapshot-fork-design.md:

- `pool start` now boots the pooled VMs with a keepalive main (a sleep loop) so they stay up with their exec server ready, and serves a Unix socket.
- New `pool run -- CMD` client: it connects, the daemon acquires a pre-booted VM, runs CMD via the existing guest exec server (no cold boot), returns stdout/stderr/exit code, and destroys the used VM; the pool replenishes a fresh one in the background.

This removes cold boot from the hot path without touching guest-init's lifecycle (unlike the full deferred-main-spawn, which is deferred as the higher-risk P2). Requests are served sequentially for now (one sandbox at a time); concurrency is a follow-up. Protocol: length-prefixed JSON over the Unix socket.

Add CI-runnable tests for the framing + request/response handshake (previously only POC-verified on KVM): frame roundtrip, full client/server protocol over a real Unix socket with a stub server, and truncated-stream error handling.
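The framing these tests exercise can be sketched with blocking std I/O. The PR only says "length-prefixed JSON", so the 4-byte big-endian prefix, the `MAX_FRAME` guard, and the names `write_frame`/`read_frame` are assumptions here, not the daemon's actual code (which is presumably async). A truncated stream surfaces as `UnexpectedEof` rather than a short message.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on one frame; guards against a corrupt length.
const MAX_FRAME: u32 = 16 * 1024 * 1024;

/// Write one frame: a 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .ok()
        .filter(|&n| n <= MAX_FRAME)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Read one frame. `read_exact` reports `UnexpectedEof` if the stream ends
/// mid-header or mid-payload, which is the truncated-stream error case.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Usage over a real socket: `let (mut c, mut s) = UnixStream::pair()?; write_frame(&mut c, br#"{"op":"status"}"#)?;` and `read_frame(&mut s)?` then returns the same bytes on the other end.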

- `serve()` now handles each `pool run` concurrently (`Arc<WarmPool>` + a spawned task per connection) instead of one at a time, so independent sandboxes don't queue. Added `WarmPool::drain_idle(&self)` so the pool can be shut down from behind the `Arc` (`signal_shutdown` stops the replenisher; `drain_idle` destroys idle VMs); in-flight requests keep their own acquired VM.
- Added a host-backed e2e test (`test_real_pool_warm_run`, `#[ignore]`): it spawns the daemon, waits for its socket, runs a command in a warm sandbox and asserts the output, then fires 3 concurrent `pool run`s and asserts all succeed. Adds a `spawn_background` helper to the test harness.
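Why `drain_idle` takes `&self`: once the pool is shared as `Arc<WarmPool>` across per-connection tasks, nobody holds `&mut`, so shutdown has to work through interior mutability. A minimal std-only sketch, with threads standing in for spawned tokio tasks and `u32` ids standing in for VMs; `ToyPool` and `serve_concurrently` are hypothetical, not the real `WarmPool` API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a warm pool; VMs are modelled as ids.
pub struct ToyPool {
    idle: Mutex<Vec<u32>>,
    shutdown: AtomicBool,
}

impl ToyPool {
    pub fn new(ids: Vec<u32>) -> Self {
        ToyPool { idle: Mutex::new(ids), shutdown: AtomicBool::new(false) }
    }

    /// Take an idle VM if one is available (a real pool would boot on a miss).
    pub fn acquire(&self) -> Option<u32> {
        self.idle.lock().unwrap().pop()
    }

    /// Stop replenishment; needs only `&self`, so it works through an `Arc`.
    pub fn signal_shutdown(&self) {
        self.shutdown.store(true, Ordering::SeqCst);
    }

    /// Replenisher hook: refuses to add VMs once shutdown has been signalled.
    pub fn replenish(&self, id: u32) -> bool {
        if self.shutdown.load(Ordering::SeqCst) {
            return false;
        }
        self.idle.lock().unwrap().push(id);
        true
    }

    /// Remove (i.e. destroy) every idle VM and report how many there were.
    /// VMs already acquired by in-flight requests are not touched.
    pub fn drain_idle(&self) -> usize {
        let drained: Vec<u32> = self.idle.lock().unwrap().drain(..).collect();
        drained.len()
    }
}

/// Serve `n` requests concurrently, one thread per "connection".
pub fn serve_concurrently(pool: &Arc<ToyPool>, n: usize) -> Vec<Option<u32>> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let pool = Arc::clone(pool);
            thread::spawn(move || pool.acquire())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With five idle VMs and three concurrent requests, two remain idle; after `signal_shutdown`, replenishment is refused and `drain_idle` reclaims those two.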

One daemon can now serve sandboxes of different images. Added a `PoolRegistry` keyed by image that lazily starts (and pre-warms) a `WarmPool` on first use:

- `pool start --image X` is now optional and sets the DEFAULT image; the daemon also warms a pool for any other image requested via `pool run --image Y` on first use.
- `pool run [--image Y] -- CMD`: the request carries an optional image (defaults to the daemon's). `RunRequest` gains a `#[serde(default)] image` field for wire back-compat (older clients / default-image daemons omit it).
- Shutdown drains idle VMs across all pools (`drain_all`).

Extends the e2e test (`test_real_pool_warm_run`) with a lazy second image via `pool run --image`. This turns the single-image MVP into a real sandbox controller.
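The lazy per-image registry can be sketched as a map from image to a shared pool, created on first use, where a request that omits the image falls back to the daemon default (mirroring the optional `image` field on `RunRequest`). `Registry`, `Pool`, and the method names are hypothetical; the real `PoolRegistry` also starts pre-warming when it creates a pool.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-image pool; it only records what it was created for.
pub struct Pool {
    pub image: String,
    pub size: usize,
}

pub struct Registry {
    default_image: Option<String>,
    default_size: usize,
    pools: Mutex<HashMap<String, Arc<Pool>>>,
}

impl Registry {
    pub fn new(default_image: Option<String>, default_size: usize) -> Self {
        Registry { default_image, default_size, pools: Mutex::new(HashMap::new()) }
    }

    /// Resolve the requested image (or the daemon default) and return its pool,
    /// creating it on first use. Errors if neither image is available.
    pub fn get_or_create(&self, requested: Option<&str>) -> Result<Arc<Pool>, String> {
        let image = requested
            .or(self.default_image.as_deref())
            .ok_or_else(|| "no image requested and no default image configured".to_string())?;
        let mut pools = self.pools.lock().unwrap();
        let pool = pools
            .entry(image.to_string())
            .or_insert_with(|| Arc::new(Pool { image: image.to_string(), size: self.default_size }));
        Ok(Arc::clone(pool))
    }

    pub fn pool_count(&self) -> usize {
        self.pools.lock().unwrap().len()
    }
}
```

Repeated requests for the same image share one `Arc<Pool>`, so concurrent callers reuse the same warm VMs instead of each starting their own pool.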

Adds fleet pre-warm at startup so the common sandbox images are warm-ready instead of cold on their first request. `pool start --warm python:3=4,node:20` pre-warms each listed image at startup (count defaults to `--size`); any other image is still warmed lazily on first use.

- Added `parse_warm_spec` (`image[=count]`, whitespace-tolerant; unit-tested).
- `PoolRegistry::get_or_create_with_size` lets a pre-warm use a per-image count; the lazy path keeps the daemon's default size.
- The e2e test now starts the daemon with `--warm <second>=2` and runs that image, exercising startup pre-warm end to end.
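A sketch of the `--warm` parsing rule (comma-separated `image[=count]`, whitespace-tolerant, count defaulting to `--size`). The signature and the choice to skip empty entries are assumptions, not the PR's actual `parse_warm_spec`. Splitting on `=` is safe because the OCI/Docker image reference grammar does not allow `=`, while `:` (the tag separator) stays intact.

```rust
/// Parse a spec like "python:3=4, node:20" into (image, count) pairs.
/// Entries without "=count" use `default_count`; empty entries are skipped.
pub fn parse_warm_spec(spec: &str, default_count: usize) -> Result<Vec<(String, usize)>, String> {
    let mut out = Vec::new();
    for entry in spec.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let (image, count) = match entry.split_once('=') {
            Some((img, n)) => {
                let n = n.trim();
                let count = n
                    .parse::<usize>()
                    .map_err(|e| format!("bad count {:?} for {:?}: {}", n, img.trim(), e))?;
                (img.trim(), count)
            }
            // No explicit count: fall back to the daemon's pool size.
            None => (entry, default_count),
        };
        if image.is_empty() {
            return Err(format!("missing image in entry {:?}", entry));
        }
        out.push((image.to_string(), count));
    }
    Ok(out)
}
```

For example, `parse_warm_spec("python:3=4, node:20", 2)` yields `python:3` with 4 VMs and `node:20` with the default 2.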

`pool status` was a stub pointing at Prometheus; it now queries the running daemon over the Unix socket and prints per-image pool stats (idle / created / acquired / evicted), or JSON with `--json`.

- The wire protocol is now a tagged `Request` envelope (`{"op":"run",...}` / `{"op":"status"}`) so the daemon can dispatch; `pool run` sends Run, `pool status` sends Status. `RunResponse` is unchanged; new `StatusResponse`/`ImageStat`.
- `PoolRegistry::stats()` snapshots every image's `WarmPool` stats, sorted.
- `pool status` gains `--socket`.

The e2e test now asserts that status lists both warmed images. 14 unit tests (added Request-envelope tagging + no-daemon error).
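The tagged envelope lets one socket carry both request kinds, with the daemon dispatching on `op`. With serde this would typically be an internally tagged enum (`#[serde(tag = "op", rename_all = "lowercase")]`); the std-only sketch below skips JSON and shows only the in-memory dispatch plus a sorted per-image snapshot. `Request`, `Response`, the `ImageStat` field types, and `dispatch` are assumptions modelled on the names in the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of the tagged request envelope.
#[derive(Debug, PartialEq)]
pub enum Request {
    Run { image: Option<String>, cmd: Vec<String> },
    Status,
}

/// Per-image counters, as listed by `pool status`.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageStat {
    pub image: String,
    pub idle: usize,
    pub created: u64,
    pub acquired: u64,
    pub evicted: u64,
}

#[derive(Debug, PartialEq)]
pub enum Response {
    Run { exit_code: i32 },
    Status(Vec<ImageStat>),
}

/// Snapshot every pool's stats, sorted by image so the output is stable.
pub fn status_snapshot(pools: &HashMap<String, ImageStat>) -> Vec<ImageStat> {
    let mut stats: Vec<ImageStat> = pools.values().cloned().collect();
    stats.sort_by(|a, b| a.image.cmp(&b.image));
    stats
}

/// Dispatch on the envelope's tag; `run` stands in for acquire/exec/destroy.
pub fn dispatch(
    req: Request,
    pools: &HashMap<String, ImageStat>,
    run: impl Fn(Option<&str>, &[String]) -> i32,
) -> Response {
    match req {
        Request::Run { image, cmd } => Response::Run { exit_code: run(image.as_deref(), cmd.as_slice()) },
        Request::Status => Response::Status(status_snapshot(pools)),
    }
}
```

Sorting by image inside the snapshot, rather than in the client, keeps both the human-readable output and `--json` deterministic regardless of `HashMap` iteration order.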

`WarmPool::acquire` boots a VM on a pool miss with NO `max_size` cap (the max is only enforced in release/replenish), so a burst of concurrent `pool run`s would boot unbounded VMs and exhaust the host. Add a per-image semaphore (permits = `max_size`), acquired before `pool.acquire` and released only after the VM is torn down, so excess requests queue for a slot instead of exploding.

- `PoolEntry { pool, sem }`: the registry hands out the entry, and `handle_conn` holds the owned permit through the backgrounded destroy.
- A unit test asserts that peak concurrency never exceeds the permit count.
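The bound can be demonstrated with a small counting semaphore. With tokio, an owned permit held across a spawned teardown would typically come from `Semaphore::acquire_owned`; this std-only sketch substitutes `Mutex` + `Condvar`, uses an RAII permit that is dropped only after the simulated teardown, and measures peak concurrency. `Sem`, `Permit`, and `run_burst` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Counting semaphore: `acquire` blocks until a permit is free.
pub struct Sem {
    free: Mutex<usize>,
    cv: Condvar,
}

/// RAII permit; returns its slot to the semaphore when dropped.
pub struct Permit {
    sem: Arc<Sem>,
}

impl Sem {
    pub fn new(permits: usize) -> Arc<Self> {
        Arc::new(Sem { free: Mutex::new(permits), cv: Condvar::new() })
    }

    pub fn acquire(self: &Arc<Self>) -> Permit {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
        Permit { sem: Arc::clone(self) }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sem.free.lock().unwrap() += 1;
        self.sem.cv.notify_one();
    }
}

/// Fire `requests` concurrent "pool runs" against `max_size` permits and return
/// the highest number of sandboxes that were alive at the same time.
pub fn run_burst(max_size: usize, requests: usize) -> usize {
    let sem = Sem::new(max_size);
    let live = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..requests)
        .map(|_| {
            let (sem, live, peak) = (Arc::clone(&sem), Arc::clone(&live), Arc::clone(&peak));
            thread::spawn(move || {
                let permit = sem.acquire(); // queue here instead of booting another VM
                let now = live.fetch_add(1, Ordering::SeqCst) + 1; // "boot" on a pool miss
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // run the command
                live.fetch_sub(1, Ordering::SeqCst); // teardown finishes...
                drop(permit); // ...and only then is the slot released
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Releasing the permit after teardown, not after the command returns, is what keeps the count of live VMs (not just running commands) within `max_size`.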

* refactor(init): early-bind vsock servers + event-driven readiness (issue #3)

Restructures the exec/PTY readiness path so boot waits for a real readiness EVENT bounded by VM liveness, instead of guessing a fixed timeout. This replaces the interim 10s→30s band-aid.

P1: bind early, serve late. Split `exec_server`/`pty_server` into `bind_*() -> Listener` (pure socket/bind/listen syscalls) and `serve_*(listener)` (the accept loop). `run_init` now binds both vsock listeners on the main thread right after the filesystem mounts (Step 2.6), BEFORE the slow network bring-up and the container fork, then spawns the accept loops after the fork (Step 8). Binding adds no thread, so the single-threaded-at-fork invariant that keeps `spawn_isolated` safe is preserved. The listen backlog is live from boot, so a host connect QUEUES instead of being refused; this removes the `run -it` PTY "Connection refused". CLOEXEC keeps the forked container from inheriting the listeners.

P2: event-driven, liveness-bounded readiness. Early binding makes the host `connect` succeed immediately, so `heartbeat()`'s (timeout-less) read would block until the guest's accept loop runs. `wait_for_exec_ready` is rewritten to bound each connect+heartbeat attempt (tokio timeout), return at once when the VM exits (`has_exited`, zombie-aware, so fast-exit containers never stall), and treat a large absolute cap purely as a backstop against a wedged-but-alive guest. A healthy guest passes the heartbeat the moment its accept loop runs, however late in a slow cold boot, so the false "heartbeat failed" warning is gone without a fixed budget to outrun.

Also folds in the issue-#3 cleanups: the dead `/sbin/init` `BOX_EXEC_EXEC` default → `/bin/sh`, and the stale `resolve_oci_entrypoint` doc comment.
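The liveness-bounded readiness wait can be sketched synchronously. Each probe (connect + heartbeat in the real code, wrapped in a tokio timeout) receives its own per-attempt budget, the loop returns as soon as the VM has exited, and the absolute cap only backstops a wedged-but-alive guest. The closures, `Readiness`, and the retry interval are assumptions; the real `wait_for_exec_ready` is async.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum Readiness {
    Ready,
    VmExited,
    CapExceeded,
}

/// Poll until the guest answers a heartbeat, the VM exits, or the backstop cap
/// elapses. `probe(budget)` is expected to give up on its own within `budget`.
pub fn wait_for_ready(
    mut probe: impl FnMut(Duration) -> bool,
    mut has_exited: impl FnMut() -> bool,
    per_attempt: Duration,
    retry_every: Duration,
    cap: Duration,
) -> Readiness {
    let start = Instant::now();
    loop {
        // Liveness first: a fast-exiting container must not stall the wait.
        if has_exited() {
            return Readiness::VmExited;
        }
        // Ready the moment the guest's accept loop answers, however late.
        if probe(per_attempt) {
            return Readiness::Ready;
        }
        // Backstop only: a healthy or dead VM returns above long before this.
        if start.elapsed() >= cap {
            return Readiness::CapExceeded;
        }
        thread::sleep(retry_every);
    }
}
```

The ordering matters: checking liveness before each probe is what lets a container that exits immediately return without waiting out any timeout.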
Deferred: an explicit guest→host "ready" beacon on a new vsock port was considered but NOT wired. `port_forward` uses `add_vsock_port(listen=true)` with a guest connect-out, which contradicts the assumed `listen=false` direction for guest→host, and that is only verifiable on KVM. The liveness-bounded heartbeat achieves the same correctness without guessing cross-process vsock semantics. Supersedes the interim 30s fix (PR #14).

* fix(init): real PID1 reaper - reap orphans without stealing exec/PTY exit codes

guest-init runs as PID 1 but only waited on the container pid, so reparented grandchildren and the sidecar were never reaped and accumulated as zombies for the VM's lifetime. The earlier code couldn't just `waitpid(-1)`: that races with the exec/PTY handlers, which `waitpid` their own children to read the real exit code. A stolen child makes the handler see ECHILD and report a bogus exit 0 (exec_server.rs). That tension is exactly why a prior fix narrowed the loop to `waitpid(container_pid)`, trading the zombie leak for correct exec codes. Resolve both with a small reaper registry:

- New `reaper` module: handlers mark their child pid MANAGED across the spawn (the lock is held across fork, closing the spawn/register race for fast-exiting commands like `exec -- false`); an RAII guard unregisters on every return path.
- The supervision loop now peeks exited children non-destructively with `waitid(WNOWAIT)` and routes them: the container → reap + propagate exit code (VM lifecycle, unchanged); MANAGED children → left for their handler to reap (real exit codes preserved); everything else (orphans + sidecar) → reaped here.
- exec one-shot + streaming spawns and the PTY fork register their children; their existing `waitpid`/`try_wait` paths are unchanged.

Fixes the zombie leak and makes the long-standing "reaped by the zombie-reaper loop" comments true again, with no regression to exec/PTY or container exit codes.
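The reaper's routing rule can be sketched as a pure decision over a registry of MANAGED pids, with an RAII guard that unregisters on every return path. This omits the actual `waitid(WNOWAIT)`/`waitpid` syscalls and does not model holding the lock across fork; `Reaper`, `ManagedGuard`, and `Route` are hypothetical names, and pids are plain `i32`s.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// What the supervision loop should do with a child it peeked non-destructively.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// The container: reap it and propagate its exit code (VM lifecycle).
    ReapContainer,
    /// Owned by an exec/PTY handler: leave it so the handler reads the real code.
    LeaveForHandler,
    /// Orphans and the sidecar: reap here so they do not linger as zombies.
    ReapOrphan,
}

#[derive(Default)]
pub struct Reaper {
    managed: Mutex<HashSet<i32>>,
}

/// Unregisters its pid when dropped, whichever way the handler returns.
pub struct ManagedGuard {
    reaper: Arc<Reaper>,
    pid: i32,
}

impl Drop for ManagedGuard {
    fn drop(&mut self) {
        self.reaper.managed.lock().unwrap().remove(&self.pid);
    }
}

impl Reaper {
    /// Mark `pid` as MANAGED until the returned guard is dropped.
    pub fn register(self: &Arc<Self>, pid: i32) -> ManagedGuard {
        self.managed.lock().unwrap().insert(pid);
        ManagedGuard { reaper: Arc::clone(self), pid }
    }

    pub fn route(&self, exited_pid: i32, container_pid: i32) -> Route {
        if exited_pid == container_pid {
            Route::ReapContainer
        } else if self.managed.lock().unwrap().contains(&exited_pid) {
            Route::LeaveForHandler
        } else {
            Route::ReapOrphan
        }
    }
}
```

Because the guard's `Drop` does the unregistering, an early `?` return or a panic in a handler cannot leave a stale MANAGED entry behind.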
Unit-tested (reaper registry); needs KVM verification of exec exit codes + orphan reaping. Builds on P1+P2 (issue #3).

* docs: P2 deferred-main-spawn design (GO-WITH-CONDITIONS)

An adversarial mapping of the #15+#18 base resolved both crux uncertainties: console logs come free via process-wide fd inheritance (`Stdio::inherit`, not the exec path's `piped`), and the multi-threaded fork hazard is avoided by spawning the deferred main via `Command::spawn` (not `spawn_isolated`'s raw fork; the VM already isolates). Conditions: a single spawn-main (CAS) + an atomic late container-pid handoff to the reaper. Includes risk-ranked blockers + a 7-phase plan whose Phase 0 is a single KVM prototype that de-risks the whole feature.

---------

Co-authored-by: Roy Lin <roylin@a3s.box>
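The two conditions named in the P2 design (a single spawn-main guarded by CAS, and an atomic late handoff of the container pid to the reaper) map naturally onto atomics. A minimal sketch, with `MainSlot` and its methods hypothetical and `0` assumed as the "no container yet" sentinel:

```rust
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};

pub struct MainSlot {
    spawned: AtomicBool,
    container_pid: AtomicI32, // 0 = no container pid handed off yet
}

impl MainSlot {
    pub const fn new() -> Self {
        MainSlot { spawned: AtomicBool::new(false), container_pid: AtomicI32::new(0) }
    }

    /// Claim the right to spawn the deferred main; exactly one caller wins the CAS.
    pub fn try_claim(&self) -> bool {
        self.spawned
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Publish the spawned main's pid so the reaper starts routing it as the container.
    pub fn publish_pid(&self, pid: i32) {
        self.container_pid.store(pid, Ordering::Release);
    }

    /// Reaper side: the container pid, if it has been handed off yet.
    pub fn container_pid(&self) -> Option<i32> {
        match self.container_pid.load(Ordering::Acquire) {
            0 => None,
            pid => Some(pid),
        }
    }
}
```

Until `publish_pid` runs, the reaper sees no container pid and would treat the new child as an ordinary orphan, which is why the handoff has to be atomic and happen as early as possible after the spawn.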

ZhiXiao-Lin changed the title ~~feat(pool): warm-sandbox daemon + pool run (P1 of CoW plan)~~ feat(pool): warm-sandbox controller — daemon, multi-image, concurrency, backpressure (P1 of CoW plan) Jun 11, 2026

This was referenced Jun 11, 2026:

- docs: P2 deferred-main-spawn design (GO-WITH-CONDITIONS) #19 (Closed)
- chore(ci): fix pre-existing fmt + clippy on main #20 (Merged)

Roy Lin added 7 commits June 11, 2026 16:31

test(pool): cover the Unix-socket wire protocol (`108c34a`)


ZhiXiao-Lin force-pushed the feat/p1-template-pool branch from 193a0ad to 379fec0 June 11, 2026 08:31

ZhiXiao-Lin merged commit 87517b3 into main Jun 11, 2026
7 checks passed

ZhiXiao-Lin deleted the feat/p1-template-pool branch June 11, 2026 08:38

ZhiXiao-Lin mentioned this pull request Jun 11, 2026:

- docs: P2 deferred-main-spawn design (GO-WITH-CONDITIONS) #21 (Merged)

ZhiXiao-Lin merged 7 commits into `main` from `feat/p1-template-pool`.