Skip to content

feat: add kfutil upgrade command#330

Open
spbsoluble wants to merge 17 commits into
mainfrom
feat/self-upgrade
Open

feat: add kfutil upgrade command#330
spbsoluble wants to merge 17 commits into
mainfrom
feat/self-upgrade

Conversation

@spbsoluble

@spbsoluble spbsoluble commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds kfutil upgrade to atomically replace the running binary with any published GitHub release
  • Supports Linux, macOS, FreeBSD, and Windows (via minio/selfupdate rename-swap — no restart required)
  • Verifies SHA-256 checksum against the goreleaser-published SHA256SUMS file before applying

Usage

kfutil upgrade                        # install latest release
kfutil upgrade --version 1.9.2-rc.14 # install any valid GitHub tag
kfutil upgrade --dry-run              # preview without replacing the binary
kfutil upgrade --force                # upgrade even if already at target version

Changes

File Description
pkg/upgrade/upgrade.go GitHub API fetch, zip extraction, SHA-256 verification, atomic binary replace
cmd/upgrade.go Cobra command with --version, --dry-run, --force flags
pkg/upgrade/upgrade_test.go 22 unit tests (asset name resolution, checksum, zip extraction, mock HTTP, size limits, URL sanitization)
go.mod / go.sum Adds github.com/minio/selfupdate v0.6.0

Test plan

  • go test ./pkg/upgrade/... passes (22/22)
  • kfutil upgrade --dry-run reports "already at latest" when current
  • kfutil upgrade --dry-run --version <tag> resolves correct platform asset URL
  • kfutil upgrade --dry-run --version <nonexistent> returns clear error
  • kfutil upgrade --force --dry-run shows download target even when already current
  • Manual upgrade on Linux replaces binary and kfutil version reflects new version
  • Manual upgrade on Windows replaces binary without requiring a restart

Adds 'kfutil upgrade' to atomically replace the running binary with any
GitHub-tagged release, including pre-releases and older versions.

- pkg/upgrade/upgrade.go: fetches release metadata from GitHub API,
  resolves the goreleaser zip asset for the current GOOS/GOARCH,
  verifies SHA-256 against the published SUMS file, and applies the
  update via minio/selfupdate (handles Windows rename-swap natively)
- cmd/upgrade.go: cobra command wiring with --version, --dry-run, --force
- pkg/upgrade/upgrade_test.go: 13 unit tests covering asset name
  resolution, checksum verification, zip extraction, and GitHub API
  responses via httptest mock server

Flags:
  --version  any valid GitHub tag (e.g. v1.9.0, v1.9.2-rc.14)
  --dry-run  show what would happen without replacing the binary
  --force    upgrade even when already at the target version
Security (P0):
- SEC-1: verifyChecksum now hashes the zip archive, not the extracted binary,
  matching what goreleaser records in SHA256SUMS; missing SUMS entry is now
  an error instead of a silent pass
- SEC-2: GITHUB_TOKEN is only forwarded to hosts in an explicit allowlist
  (api.github.com, github.com, objects.githubusercontent.com) to prevent
  token exfiltration via a tampered BrowserDownloadURL
- SEC-3: upgrade aborts with an error when no SHA256SUMS asset is present
  in the release, making integrity verification mandatory

Compliance (P1/P2):
- AUD-6: import zerolog; all audit log entries now use structured fields
- AUD-1: log.Info events before and after binary replacement with
  from_version, to_version, executable, operator, and source_url fields
- AUD-2: resolveOperator() captures os/user.Current().Username and includes
  it in every structured log entry
- AUD-3: log.Error before each error return in Run() with a stable event name
- AUD-4: log.Warn when --force bypasses the version-match safety guard
- AUD-5: log.Debug after each HTTP response in fetchReleaseFrom and download
  with url, method, and status_code fields

Tests: inverted TestVerifyChecksum_AssetNotInSums to require.Error; added
TestDownload_TokenNotSentToUntrustedHost to verify the allowlist behaviour.
- Add upgrade.run_started event at Run() entry with operator, versions,
  force, and dry_run fields — establishes baseline for anomaly detection
- Add log.Info for the already-current early exit (upgrade.already_current)
- Add log.Info for the dry-run exit path (upgrade.dry_run)
- Add log.Error for archiveURL-not-found and sumsURL-missing returns
  (upgrade.asset_not_found, upgrade.sums_missing)
- Expand --force logging to fire whenever the flag is set, not only when
  the version-equality guard would have blocked — captures --version usage
- Thread operator through fetchReleaseFrom and download so all HTTP-level
  log entries carry identity context
- Log a warning when resolveOperator() fails so auditors can distinguish
  the "unknown" sentinel from a real user named "unknown"
- cmd/upgrade.go: explicitly set zerolog level to InfoLevel (non-debug)
  or DebugLevel (--debug) instead of relying on the unset default;
  ensures audit events are durably emitted regardless of future changes
  to the informDebug initialization pattern used by other commands
- upgrade.fetch_release_failed: add tag field so the attempted release
  tag is always captured in the forensic record
- upgrade.extract_failed, upgrade.apply_failed: add source_url field
  for full binary provenance on failed privileged operations
- upgrade.dry_run: rename archive_url → source_url for consistency with
  applying/applied/apply_failed events (single SIEM query field)
- upgrade.checksum_mismatch: add source_url field so the archive
  provenance is captured on the security event
- upgrade.rollback_failed: add dedicated event inside apply() for the
  rollback-also-failed sub-path so on-call engineers can distinguish
  a broken binary state from an ordinary upgrade failure
- fetchReleaseFrom, download: add latency_ms field to all HTTP log
  events (SOC2 CC9.2 vendor API response metadata)
- TestDownload_TokenSentToTrustedHost: add positive allowlist test
  verifying token IS forwarded to hosts in allowedTokenHosts
Logging is disabled when --debug is not passed, consistent with all other
kfutil commands. The compliance gap (audit trail silently dropped) is an
accepted risk — noted in project memory.
- apply(): add operator param so upgrade.rollback_failed carries identity
- upgrade.download_failed: rename url → source_url for SIEM field consistency
- upgrade.checksum_download_failed: rename url → sums_url (distinct resource)
- upgrade.github_api_response, upgrade.http_response: promote Debug → Info
  so external API interactions appear in production log pipelines (SOC2 CC9.2)
- upgrade.apply_failed: add failure_reason field ("permission_denied" or
  "apply_error") so incident responders can distinguish failure modes from
  structured fields rather than error message strings
- normalizeTag(): log "latest" instead of "" when no --version was passed
  so upgrade.run_started and upgrade.fetch_release_failed are unambiguous
- TestDownload_NonOKStatus: test that download() errors on non-200 responses
- fetchReleaseFrom, download: log upgrade.github_api_network_error /
  upgrade.http_network_error (with latency_ms) on transport failure so
  network errors are captured even when the HTTP response is never received
- fetchReleaseFrom: log upgrade.release_parse_failed when JSON decode fails
  so the failure category (network vs application) is distinguishable
- upgrade.github_api_response, upgrade.http_response: emit at log.Warn
  when status_code >= 400 so SIEM error thresholds trigger on GitHub API
  errors (previously always log.Info regardless of status)
- apiClient / downloadClient: replace http.DefaultClient (no timeout) with
  explicit http.Client{Timeout} — 30s for API calls, 5m for binary downloads;
  ensures timed-out requests produce an error event rather than silently hanging
- upgrade.rollback_succeeded: log Info event when apply() fails but rollback
  succeeds, so auditors can distinguish safe vs corrupted binary outcomes
- sanitizeURL(): strip query-string params before logging any URL field;
  prevents presigned CDN query-string credentials from appearing in log storage
- github_token_present: add Bool field to all HTTP log events so credential
  use is traceable without logging the token value
- fetchReleaseFrom: url.PathEscape(tag) before appending to URL so
  user-supplied --version values cannot alter the request URL structure
- extractBinary: wrap rc with io.LimitReader(maxBinaryBytes=100MiB) to
  bound memory use and produce an explicit error rather than OOM on malformed
  or zip-bomb archives
- sanitizeURL applied to reqURL in fetchReleaseFrom (all 3 log sites:
  github_api_network_error, github_api_response, release_parse_failed);
  consistent with download() which already sanitized its URL fields
- download(): wrap resp.Body with io.LimitReader(maxBinaryBytes) before
  io.ReadAll so the raw HTTP response (archive + SUMS) is bounded; closes
  the gap where the zip-level limit in extractBinary was not yet applied
  to the network layer
- upgrade.applying: add Bool("force", force) field so the event is
  self-contained for change management audit without requiring cross-event
  lookup to upgrade.run_started
- currentExecutable(): emit upgrade.executable_resolution_failed warning
  when os.Executable() fails so auditors can distinguish real path from
  the "kfutil" fallback sentinel in log records

Skipped (accepted):
- C-1: logs silent without --debug — user explicitly accepted this risk
- H-1: ConsoleWriter destroys structured fields — codebase-wide infra concern
- M-1: duplicate fetch_release_failed events — intentional layering; removing
  the Run-level log would leave HTTP status error paths with no audit record
- dry-run fmt.Printf: use sanitizeURL() for archiveURL and sumsURL so
  presigned CDN query-string credentials never reach console output
- download() error string: use sanitizeURL(rawURL) in the HTTP status error
  so query-string credentials are stripped from stderr / error aggregators
- fetchReleaseFrom: log upgrade.github_api_request_build_failed on
  http.NewRequest error so malformed-URL failures leave an audit trace
- download: log upgrade.http_request_build_failed on http.NewRequest error

Skipped (accepted / out of scope):
- M-1: force_override test requires Run() end-to-end mock harness
H1: add from_version, to_version, executable to rollback_failed and
    rollback_succeeded by threading those values into apply(); all three
    are in scope in Run() where the call site lives
H2: add operator param to currentExecutable() so
    upgrade.executable_resolution_failed carries operator for identity
    correlation
M1: emit upgrade.checksum_verified Info event (with operator, asset,
    source_url) after verifyChecksum returns nil, closing the gap in
    the checksum success audit trail
M2: add TestSanitizeURL_StripsQueryParams and
    TestSanitizeURL_ParseErrorFallback to cover presigned-URL stripping
    and parse-error fallback behaviour
H1: emit upgrade.github_api_rejected Error event before each non-200
    return in fetchReleaseFrom switch — gives SIEM rules an
    outcome-specific event name distinct from upgrade.github_api_response
H2: emit upgrade.http_request_failed Error event before the non-200
    return in download — same outcome-specificity fix for asset fetches
M1: upgrade.rollback_succeeded changed from Info to Warn — a rollback
    is a degraded-state recovery action, not a routine operation
M2: emit upgrade.apply_started Info inside apply() before
    selfupdate.Apply() is called — records the exact point the
    filesystem write commences
M5: add TestExtractBinary_ExceedsMaxSize to assert the maxBinaryBytes
    size cap returns an error rather than an oversized slice
Low: strip URL fragments in sanitizeURL(); add
    TestSanitizeURL_StripsFragment to cover the new behaviour
H1: add source_url to upgrade.rollback_failed and upgrade.rollback_succeeded
    by adding sourceURL param to apply(); forensic reconstruction of a
    failed upgrade can now trace the artifact without cross-event correlation
H2: detect silently-truncated downloads — after io.ReadAll+LimitReader, check
    len(data) >= maxBinaryBytes and emit upgrade.download_size_limit_reached
    Error before returning an error; prevents a 100 MiB-capped payload from
    reaching checksum verification with a misleading mismatch event
M1: add TestDownload_BodyTruncatedAtLimit to assert the HTTP body size cap
    rejects oversized payloads with the correct error message
M2: add comments on the paired Warn/Error emit blocks in fetchReleaseFrom
    and download explaining why both events are intentional (latency_ms
    lives in the Warn, the control decision in the Error)
M1: emit upgrade.http_body_read_error Error before returning a mid-body
    read failure; distinguishes a truncated transfer from a refused
    connection in forensic reconstruction (SOC2 CC7.3)
M3: add captureLog() test helper that redirects the global zerolog logger
    to a buffer for test duration; use it to assert upgrade.http_request_failed,
    upgrade.download_size_limit_reached, and upgrade.github_api_rejected are
    actually emitted — test coverage now proves audit events fire, not just
    that errors propagate
M4: add .Err(err) to upgrade.rollback_succeeded so the root cause of the
    apply failure is in the structured log record alongside the rollback
    outcome (SOC1 completeness and accuracy)
H1: capture applyErr before RollbackError unwraps it; add apply_error
    string field to upgrade.rollback_failed so forensic reconstruction
    has the root cause alongside the rollback failure (SOC1 completeness)
M3: split github_token_present into github_token_present (env var set?)
    and github_token_forwarded (actually sent in header?) in both
    fetchReleaseFrom and download; log events for untrusted-host downloads
    now correctly record that a token existed but was withheld, rather than
    masking token availability (SOC2 CC6.6 boundary protection telemetry)
M4: add comment to TestDownload_TokenSentToTrustedHost explaining why
    these two tests must not run in parallel (allowedTokenHosts mutation)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant