feat(stack): add obol node for k3s multi-node clusters #637
Open
bussyjd wants to merge 1 commit into
obol node token prints the k3s agent-join one-liner (K3S_URL + K3S_TOKEN,
version-pinned to the server); obol node list shows nodes with their
accelerator label. Both guard against the k3d backend — a Docker-Desktop
master's flannel overlay isn't routable off-host, so remote node joins
only work on a native k3s master.
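
A minimal sketch of how such a join one-liner could be assembled. The function name `joinCommand`, its parameters, and the sample values are assumptions for illustration, not the PR's code; `INSTALL_K3S_VERSION`, `K3S_URL`, and `K3S_TOKEN` are the environment variables read by the upstream k3s install script. Values are wrapped in single quotes on the assumption that they contain none themselves.

```go
package main

import "fmt"

// joinCommand builds a k3s agent-join one-liner. The name and signature are
// hypothetical; only the environment variable names come from the k3s
// install script. Inputs are assumed to contain no single quotes.
func joinCommand(serverURL, token, version string) string {
	return fmt.Sprintf(
		"curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION='%s' K3S_URL='%s' K3S_TOKEN='%s' sh -",
		version, serverURL, token,
	)
}

func main() {
	// Sample values only; a real token comes from the running server.
	fmt.Println(joinCommand("https://192.168.1.10:6443", "K10abc::server:secret", "v1.31.4+k3s1"))
}
```

Pinning `INSTALL_K3S_VERSION` to the server's version keeps the agent from installing a newer k3s release than the server it joins.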
Also templates {{NODE_IP}}/{{NODE_HOSTNAME}} into k3s-config.yaml's
tls-san (via backend_k3s Init) so workers can join the server by LAN IP
or hostname; k3s already auto-adds the primary node IP, so this mainly
buys an explicit hostname SAN + a deterministic cert.
Validated live on two LAN boxes: a native k3s master scheduling
cross-node pods (CoreDNS + flannel) and a 4-bit QLoRA fine-tune onto a
remote RTX 2060, adapter persisted to a node-local local-path PVC.
LuuOW
reviewed
Jun 14, 2026
Technical audit: code implementation verified for system consistency.
Adds a first-class CLI for adding worker nodes to a k3s-backed stack — the multi-node "master runs pods on another node" path that's fatal on the k3d/Docker-Desktop master but works natively over the LAN.
What
- `obol node token`: prints the k3s agent-join one-liner (`K3S_URL` + `K3S_TOKEN`, version-pinned to the running server). `--json` for machine use, `--server-url` override.
- `obol node list`: nodes with their `obol.tech/accelerator` label.
- Both commands guard against the k3d backend (a Docker-Desktop master cannot accept remote node joins; its flannel overlay is not routable off-host).
- `k3s-config.yaml` now templates `{{NODE_IP}}`/`{{NODE_HOSTNAME}}` (substituted in `backend_k3s.Init`) so workers can join by LAN IP or hostname. (k3s already auto-adds the primary node IP, so this mainly adds an explicit hostname SAN + cert determinism.)

Why
On Docker Desktop the k3d server's flannel VTEP is a VM-internal `172.x` IP, unroutable from the LAN: a joined agent registers Ready but cross-node pod networking hangs. On a native k3s master it all works. This makes that path ergonomic.

Tests / validation
- Tests (`internal/stack/node_test.go`, `backend_k3s_init_test.go`); `go build ./...` + full `internal/stack` suite green; backend guard verified live.

Faithful to the existing `Backend` interface and `func xCommand(cfg) *cli.Command` pattern.