3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,8 @@ The ignored `core_smoke` suite covers the core CLI path on a real MicroVM host:
- non-TTY `exec`, PTY, `attach`, `logs`, `stop`, `wait`, and `rm`;
- TCP published ports with host loopback HTTP reachability;
- bridge network endpoint allocation, peer `/etc/hosts`, connect/disconnect, and force removal cleanup;
- named volumes, `cp`, `diff`, `export`, `commit`, `snapshot`, restart-policy monitor recovery, and Compose health/volume flow.
- named volumes, `cp`, `diff`, `export`, `commit`, `snapshot`, restart-policy monitor recovery, and Compose health/volume flow;
- warm pool (`pool start`/`pool run`): pre-warmed sandboxes served over a socket, with backpressure and multi-image lazy pools; `--deferred` runs each command as the box's real main for full box semantics (real exit code + json-file console logs) with no cold boot.

The most recent local record in this branch: all 14 ignored `core_smoke` tests
passed on macOS HVF with an offline Alpine OCI archive, and the ignored
32 changes: 28 additions & 4 deletions docs/p2-deferred-main-spawn-design.md
@@ -1,9 +1,33 @@
# Design: P2 — Deferred-Main-Spawn (full box semantics for pooled sandboxes)

Status: **GO-WITH-CONDITIONS** (design + prototype-first). Builds on
`refactor/init-readiness` (PR #15: early-bind + event-driven readiness + PID1
reaper) and `feat/p1-template-pool` (PR #18: the warm-sandbox pool controller).
Derived from an adversarial mapping of the real #15+#18 base.
Status: **IMPLEMENTED & KVM-verified** (the GO-WITH-CONDITIONS design below was
confirmed in practice). Builds on PR #15 (early-bind + event-driven readiness +
PID1 reaper) and PR #18 (the warm-sandbox pool controller).

## 0. Implementation status & usage

```sh
# Boot a pool of IDLE sandboxes (no container main at boot)...
a3s-box pool start --deferred --image alpine:latest --size 4 --socket /tmp/p.sock
# ...then run a command as the box's REAL main — full box semantics:
a3s-box pool run --socket /tmp/p.sock -- sh -c 'echo hi; exit 7' # exit 7; output in the json-file logs
```

What landed (vs the design):

- **IDLE boot**: `BOX_DEFERRED_MAIN=1`/`BoxConfig.deferred_main` skips the boot
  spawn; the `ECHILD`-with-no-container case keeps PID 1 waiting instead of
  exiting (see §5/Phase 1).
- **`spawn-main` control frame**: bare for the `run` path's boot-stashed command,
  or carrying a command for the pool, which pre-warms before the command is known.
- **Spawn path**: the deferred main is spawned via the exec server's
  `build_command` (**identical seccomp/user/no-new-privs** to a boot main;
  verified `Seccomp: 2`, `--user 1000`→uid 1000), with stdio overridden to
  `inherit` so its stdout/stderr reach the json-file console logs.
- **Exit code**: the pid is CAS-published while MANAGED, then reaped by the
  supervision loop for the real exit code.
- **Entry points**: `pool start --deferred` + `VmManager::run_deferred_main`.
- **Resource limits**: no extra work needed; they are VM-level (libkrun
  `set_vm_config`), so a deferred main shares the boot main's limits.
- **Verification**: KVM-verified end to end (exit codes 7/3/0, stdout+stderr from
  the json-file logs, seccomp applied, all from a pre-warmed pool), with unit
  (`deferred_spec_json`) + host e2e (`test_real_pool_deferred_main`) coverage.
- **Not yet wired**: a typed pool API (`Request::SpawnMain`) beyond the CLI.
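
The "pid CAS-published, then reaped" step can be pictured with a small standalone sketch. The guest-side code is not part of this diff, so everything here is an assumption made for illustration: the `MainPidSlot` type, the `NO_MAIN` sentinel, and the chosen memory orderings are hypothetical and only show the general compare-and-swap publish idea, not the project's implementation.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel for "no main published yet" (hypothetical; 0 is never a child pid).
const NO_MAIN: u32 = 0;

/// Hypothetical slot holding the pid of the box's main process.
struct MainPidSlot(AtomicU32);

impl MainPidSlot {
    fn new() -> Self {
        Self(AtomicU32::new(NO_MAIN))
    }

    /// Publish `pid` only if the slot is still empty.
    /// Returns `Err(existing_pid)` if a main was already published.
    fn publish(&self, pid: u32) -> Result<(), u32> {
        self.0
            .compare_exchange(NO_MAIN, pid, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
    }

    /// What a supervision loop would read to decide which pid to reap.
    fn current(&self) -> Option<u32> {
        match self.0.load(Ordering::Acquire) {
            NO_MAIN => None,
            pid => Some(pid),
        }
    }
}

fn main() {
    let slot = MainPidSlot::new();
    println!("before publish: {:?}", slot.current());
    println!("first publish:  {:?}", slot.publish(4242));
    // A second spawn-main must not silently replace the published main.
    println!("second publish: {:?}", slot.publish(4343));
    println!("after publish:  {:?}", slot.current());
}
```

The point of using a compare-and-swap rather than a plain store is that a second publish attempt fails visibly instead of overwriting the pid the supervision loop is waiting on.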

## 1. Goal

60 changes: 58 additions & 2 deletions src/cli/src/commands/pool.rs
Expand Up @@ -72,6 +72,12 @@ pub struct PoolStartArgs {
#[arg(long, value_delimiter = ',')]
pub warm: Vec<String>,

/// Boot pooled VMs IDLE and run each `pool run` command as the box's real MAIN
/// (full box semantics: exit code + json-file console logs), instead of
/// exec-into-keepalive.
#[arg(long)]
pub deferred: bool,

/// Output as JSON
#[arg(long)]
pub json: bool,
Expand Down Expand Up @@ -176,6 +182,17 @@ fn keepalive_cmd() -> Vec<String> {
]
}

/// Build the `spawn-main` JSON spec for a deferred-mode pool command (executable +
/// args + a standard PATH so the binary resolves like a normal container main).
fn deferred_spec_json(cmd: &[String]) -> Vec<u8> {
let spec = serde_json::json!({
"executable": cmd.first().map(String::as_str).unwrap_or("/bin/sh"),
"args": cmd.get(1..).unwrap_or(&[]),
"env": [["PATH", "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"]],
});
serde_json::to_vec(&spec).unwrap_or_default()
}
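
The executable/args split used by `deferred_spec_json`, including its empty-command fallback, can be exercised without `serde_json`. This is a minimal stdlib-only sketch; `split_main_command` is a hypothetical helper name introduced here and does not exist in the PR.

```rust
/// Split a command into (executable, args), falling back to `/bin/sh` when the
/// command is empty. Mirrors the `first()` / `get(1..)` pattern in
/// `deferred_spec_json`; the function name itself is hypothetical.
fn split_main_command(cmd: &[String]) -> (&str, &[String]) {
    let executable = cmd.first().map(String::as_str).unwrap_or("/bin/sh");
    // `get(1..)` is `None` only for an empty slice; a one-element slice yields
    // `Some(&[])`, so the args are empty in both cases.
    let args = cmd.get(1..).unwrap_or(&[]);
    (executable, args)
}

fn main() {
    let cmd: Vec<String> = ["sh", "-c", "echo hi"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let (exe, args) = split_main_command(&cmd);
    println!("{exe} {args:?}");

    // Empty command: falls back to a shell with no arguments.
    let (exe, args) = split_main_command(&[]);
    println!("{exe} {args:?}");
}
```

Because neither accessor indexes directly, an empty command cannot panic, which is the property the unit test in this PR checks.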

/// Parse a `--warm` entry of the form `image[=count]` (count defaults to `default_size`).
fn parse_warm_spec(entry: &str, default_size: usize) -> Result<(String, usize), String> {
match entry.split_once('=') {
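
The hunk cuts off before the body of `parse_warm_spec`, so the following is only a sketch of how `image[=count]` parsing could look, not the PR's code. The signature, the `split_once('=')` call, the default count, and the rejection of `=4` come from the diff; the error messages, the handling of a non-numeric count, and the name `parse_warm_spec_sketch` are assumptions.

```rust
/// Sketch of `image[=count]` parsing (hypothetical reimplementation).
fn parse_warm_spec_sketch(entry: &str, default_size: usize) -> Result<(String, usize), String> {
    let (image, count) = match entry.split_once('=') {
        // `image=count`: the count must parse as an unsigned integer (assumed).
        Some((image, count)) => {
            let count = count
                .parse::<usize>()
                .map_err(|e| format!("invalid count in '{entry}': {e}"))?;
            (image, count)
        }
        // Bare `image`: fall back to the default pool size.
        None => (entry, default_size),
    };
    // An entry such as `=4` has no image name and is rejected.
    if image.is_empty() {
        return Err(format!("missing image name in '{entry}'"));
    }
    Ok((image.to_string(), count))
}

fn main() {
    for entry in ["alpine:latest=3", "alpine:latest", "=4", "alpine=many"] {
        println!("{entry:?} -> {:?}", parse_warm_spec_sketch(entry, 2));
    }
}
```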
Expand Down Expand Up @@ -212,6 +229,9 @@ struct PoolRegistry {
size: usize,
max: usize,
ttl: u64,
/// When true, pooled VMs boot IDLE and `pool run` spawns the command as the
/// box's real MAIN (full box semantics), instead of exec-into-keepalive.
deferred: bool,
}

impl PoolRegistry {
Expand All @@ -234,8 +254,11 @@ impl PoolRegistry {
};
let box_config = BoxConfig {
image: image.to_string(),
// In deferred mode the VM boots IDLE (keepalive cmd is stashed but
// unused — the per-request command arrives via spawn-main).
cmd: keepalive_cmd(),
pool: pool_config.clone(),
deferred_main: self.deferred,
..Default::default()
};
let pool = std::sync::Arc::new(
Expand Down Expand Up @@ -303,6 +326,7 @@ async fn execute_start(args: PoolStartArgs) -> Result<(), Box<dyn std::error::Er
size: args.size,
max: args.max,
ttl: args.ttl,
deferred: args.deferred,
});

// Pre-warm the default image, if one was given.
Expand Down Expand Up @@ -450,8 +474,21 @@ async fn handle_conn(
.expect("pool semaphore is never closed");
match entry.pool.acquire().await {
Err(e) => err_resp(format!("acquire failed: {e}")),
Ok(vm) => {
let resp = match vm.exec_command(run.cmd, EXEC_TIMEOUT_NS).await {
Ok(mut vm) => {
// Deferred-main: run the command as the box's real MAIN
// (full box semantics — exit code + json-file console logs).
// Otherwise exec it in the keepalive VM (output via the
// exec stream).
let result = if registry.deferred {
vm.run_deferred_main(
&deferred_spec_json(&run.cmd),
std::time::Duration::from_secs(60),
)
.await
} else {
vm.exec_command(run.cmd, EXEC_TIMEOUT_NS).await
};
let resp = match result {
Ok(o) => RunResponse {
stdout: o.stdout,
stderr: o.stderr,
Expand Down Expand Up @@ -694,6 +731,23 @@ mod tests {
assert!(parse_warm_spec("=4", 2).is_err());
}

#[test]
fn test_deferred_spec_json() {
// The spawn-main spec for a deferred pool run: executable + args + a PATH
// so the binary resolves like a normal container main.
let json = deferred_spec_json(&["sh".into(), "-c".into(), "echo hi".into()]);
let v: serde_json::Value = serde_json::from_slice(&json).unwrap();
assert_eq!(v["executable"], "sh");
assert_eq!(v["args"][0], "-c");
assert_eq!(v["args"][1], "echo hi");
assert_eq!(v["env"][0][0], "PATH");
assert!(v["env"][0][1].as_str().unwrap().contains("/bin"));
// Empty cmd falls back to a shell rather than panicking.
let j2 = deferred_spec_json(&[]);
let v2: serde_json::Value = serde_json::from_slice(&j2).unwrap();
assert_eq!(v2["executable"], "/bin/sh");
}

#[tokio::test]
async fn test_backpressure_bounds_concurrency() {
// The contract PoolEntry relies on: a permit (held until teardown) caps
Expand Down Expand Up @@ -763,6 +817,7 @@ mod tests {
ttl: 300,
socket: DEFAULT_SOCKET.to_string(),
warm: vec![],
deferred: false,
json: false,
};
let result = execute_start(args).await;
Expand All @@ -779,6 +834,7 @@ mod tests {
ttl: 300,
socket: DEFAULT_SOCKET.to_string(),
warm: vec![],
deferred: false,
json: false,
};
let result = execute_start(args).await;
80 changes: 80 additions & 0 deletions src/cli/tests/host_smoke.rs
Expand Up @@ -600,3 +600,83 @@ fn test_real_pool_warm_run() {
let _ = daemon.kill();
let _ = daemon.wait();
}

/// Deferred-main pool end-to-end: `pool start --deferred` boots pooled VMs IDLE,
/// and `pool run` spawns the command as the box's real MAIN — full box semantics
/// (stdout/stderr from the box's json-file console logs + the real exit code),
/// unlike the keepalive+exec MVP's exec-stream output. Host-backed (KVM).
#[test]
#[ignore]
fn test_real_pool_deferred_main() {
let cli = CliTest::new();
let image = host_smoke_image();
seed_runnable_alpine_image(&cli, &image);
let socket = cli
.home_path()
.join("pd.sock")
.to_str()
.expect("utf8 socket path")
.to_string();

let mut daemon = cli.spawn_background(&[
"pool",
"start",
"--deferred",
"--image",
image.as_str(),
"--size",
"2",
"--max",
"4",
"--socket",
socket.as_str(),
]);

let sock_path = cli.home_path().join("pd.sock");
let start = std::time::Instant::now();
while !sock_path.exists() {
if start.elapsed() > Duration::from_secs(120) {
let _ = daemon.kill();
panic!("deferred pool daemon never created its socket");
}
if let Ok(Some(status)) = daemon.try_wait() {
panic!("deferred pool daemon exited early: {status}");
}
std::thread::sleep(Duration::from_millis(200));
}
std::thread::sleep(Duration::from_secs(5));

// Full box semantics: stdout + stderr come back from the box's json-file logs.
let (out, err, ok) = cli.output(&[
"pool",
"run",
"--socket",
socket.as_str(),
"--",
"sh",
"-c",
"echo deferred-stdout; echo deferred-stderr 1>&2; exit 0",
]);
assert!(
ok,
"deferred pool run failed.\nstdout:\n{out}\nstderr:\n{err}"
);
assert!(out.contains("deferred-stdout"), "missing stdout: {out:?}");
assert!(err.contains("deferred-stderr"), "missing stderr: {err:?}");

// The deferred main's non-zero exit must surface as a failed `pool run`
// (this checks only that the run failed, not the exact code 7).
let (_o, _e, ok2) = cli.output(&[
"pool",
"run",
"--socket",
socket.as_str(),
"--",
"sh",
"-c",
"exit 7",
]);
assert!(!ok2, "expected a non-zero exit from the deferred main");

let _ = daemon.kill();
let _ = daemon.wait();
}
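
The socket-wait loop in the test above could be factored into a small polling helper. This is a stdlib-only sketch under stated assumptions: `wait_for_path` is a hypothetical name, a plain file stands in for the daemon's socket, and unlike the test it does not check whether the daemon exited early.

```rust
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Poll until `path` exists or `timeout` elapses. Returns true if it appeared.
fn wait_for_path(path: &Path, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if path.exists() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}

fn main() {
    let path = std::env::temp_dir().join("wait_for_path_demo.sock");
    let _ = std::fs::remove_file(&path);

    // Create the file from another thread after a short delay, standing in
    // for a daemon that binds its socket some time after it is spawned.
    let creator = {
        let path = path.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(50));
            std::fs::write(&path, b"").expect("create demo file");
        })
    };

    let appeared = wait_for_path(&path, Duration::from_secs(2), Duration::from_millis(10));
    creator.join().expect("creator thread panicked");
    println!("appeared: {appeared}");
    let _ = std::fs::remove_file(&path);
}
```

Returning a `bool` instead of panicking inside the helper leaves the caller free to kill the daemon before failing, as the test does on timeout.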
8 changes: 8 additions & 0 deletions src/core/src/config.rs
Expand Up @@ -310,6 +310,13 @@ pub struct BoxConfig {
#[serde(default)]
pub pool: PoolConfig,

/// Boot the VM IDLE — do not spawn the container main at boot; instead the
/// main is started later by a `spawn-main` control frame. Used by the pool so a
/// pre-warmed sandbox runs a per-request command as its real main, with full box
/// semantics (exit code + json-file console logs) and no cold boot.
#[serde(default)]
pub deferred_main: bool,

/// Port mappings: "host_port:guest_port" (e.g., "8080:80")
/// Maps host ports to guest ports via TSI (Transparent Socket Impersonation).
#[serde(default)]
Expand Down Expand Up @@ -407,6 +414,7 @@ impl Default for BoxConfig {
extra_env: vec![],
cache: CacheConfig::default(),
pool: PoolConfig::default(),
deferred_main: false,
port_map: vec![],
dns: vec![],
add_hosts: vec![],