feat: add kfutil upgrade command#330
Open
spbsoluble wants to merge 17 commits into
Open
Conversation
Adds 'kfutil upgrade' to atomically replace the running binary with any GitHub-tagged release, including pre-releases and older versions. - pkg/upgrade/upgrade.go: fetches release metadata from GitHub API, resolves the goreleaser zip asset for the current GOOS/GOARCH, verifies SHA-256 against the published SUMS file, and applies the update via minio/selfupdate (handles Windows rename-swap natively) - cmd/upgrade.go: cobra command wiring with --version, --dry-run, --force - pkg/upgrade/upgrade_test.go: 13 unit tests covering asset name resolution, checksum verification, zip extraction, and GitHub API responses via httptest mock server Flags: --version any valid GitHub tag (e.g. v1.9.0, v1.9.2-rc.14) --dry-run show what would happen without replacing the binary --force upgrade even when already at the target version
Security (P0): - SEC-1: verifyChecksum now hashes the zip archive, not the extracted binary, matching what goreleaser records in SHA256SUMS; missing SUMS entry is now an error instead of a silent pass - SEC-2: GITHUB_TOKEN is only forwarded to hosts in an explicit allowlist (api.github.com, github.com, objects.githubusercontent.com) to prevent token exfiltration via a tampered BrowserDownloadURL - SEC-3: upgrade aborts with an error when no SHA256SUMS asset is present in the release, making integrity verification mandatory Compliance (P1/P2): - AUD-6: import zerolog; all audit log entries now use structured fields - AUD-1: log.Info events before and after binary replacement with from_version, to_version, executable, operator, and source_url fields - AUD-2: resolveOperator() captures os/user.Current().Username and includes it in every structured log entry - AUD-3: log.Error before each error return in Run() with a stable event name - AUD-4: log.Warn when --force bypasses the version-match safety guard - AUD-5: log.Debug after each HTTP response in fetchReleaseFrom and download with url, method, and status_code fields Tests: inverted TestVerifyChecksum_AssetNotInSums to require.Error; added TestDownload_TokenNotSentToUntrustedHost to verify the allowlist behaviour.
- Add upgrade.run_started event at Run() entry with operator, versions, force, and dry_run fields — establishes baseline for anomaly detection - Add log.Info for the already-current early exit (upgrade.already_current) - Add log.Info for the dry-run exit path (upgrade.dry_run) - Add log.Error for archiveURL-not-found and sumsURL-missing returns (upgrade.asset_not_found, upgrade.sums_missing) - Expand --force logging to fire whenever the flag is set, not only when the version-equality guard would have blocked — captures --version usage - Thread operator through fetchReleaseFrom and download so all HTTP-level log entries carry identity context - Log a warning when resolveOperator() fails so auditors can distinguish the "unknown" sentinel from a real user named "unknown"
- cmd/upgrade.go: explicitly set zerolog level to InfoLevel (non-debug) or DebugLevel (--debug) instead of relying on the unset default; ensures audit events are durably emitted regardless of future changes to the informDebug initialization pattern used by other commands - upgrade.fetch_release_failed: add tag field so the attempted release tag is always captured in the forensic record - upgrade.extract_failed, upgrade.apply_failed: add source_url field for full binary provenance on failed privileged operations - upgrade.dry_run: rename archive_url → source_url for consistency with applying/applied/apply_failed events (single SIEM query field) - upgrade.checksum_mismatch: add source_url field so the archive provenance is captured on the security event - upgrade.rollback_failed: add dedicated event inside apply() for the rollback-also-failed sub-path so on-call engineers can distinguish a broken binary state from an ordinary upgrade failure - fetchReleaseFrom, download: add latency_ms field to all HTTP log events (SOC2 CC9.2 vendor API response metadata) - TestDownload_TokenSentToTrustedHost: add positive allowlist test verifying token IS forwarded to hosts in allowedTokenHosts
Logging is disabled when --debug is not passed, consistent with all other kfutil commands. The compliance gap (audit trail silently dropped) is an accepted risk — noted in project memory.
- apply(): add operator param so upgrade.rollback_failed carries identity
- upgrade.download_failed: rename url → source_url for SIEM field consistency
- upgrade.checksum_download_failed: rename url → sums_url (distinct resource)
- upgrade.github_api_response, upgrade.http_response: promote Debug → Info
so external API interactions appear in production log pipelines (SOC2 CC9.2)
- upgrade.apply_failed: add failure_reason field ("permission_denied" or
"apply_error") so incident responders can distinguish failure modes from
structured fields rather than error message strings
- normalizeTag(): log "latest" instead of "" when no --version was passed
so upgrade.run_started and upgrade.fetch_release_failed are unambiguous
- TestDownload_NonOKStatus: test that download() errors on non-200 responses
- fetchReleaseFrom, download: log upgrade.github_api_network_error /
upgrade.http_network_error (with latency_ms) on transport failure so
network errors are captured even when the HTTP response is never received
- fetchReleaseFrom: log upgrade.release_parse_failed when JSON decode fails
so the failure category (network vs application) is distinguishable
- upgrade.github_api_response, upgrade.http_response: emit at log.Warn
when status_code >= 400 so SIEM error thresholds trigger on GitHub API
errors (previously always log.Info regardless of status)
- apiClient / downloadClient: replace http.DefaultClient (no timeout) with
explicit http.Client{Timeout} — 30s for API calls, 5m for binary downloads;
ensures timed-out requests produce an error event rather than silently hanging
- upgrade.rollback_succeeded: log Info event when apply() fails but rollback succeeds, so auditors can distinguish safe vs corrupted binary outcomes - sanitizeURL(): strip query-string params before logging any URL field; prevents presigned CDN query-string credentials from appearing in log storage - github_token_present: add Bool field to all HTTP log events so credential use is traceable without logging the token value - fetchReleaseFrom: url.PathEscape(tag) before appending to URL so user-supplied --version values cannot alter the request URL structure - extractBinary: wrap rc with io.LimitReader(maxBinaryBytes=100MiB) to bound memory use and produce an explicit error rather than OOM on malformed or zip-bomb archives
- sanitizeURL applied to reqURL in fetchReleaseFrom (all 3 log sites:
github_api_network_error, github_api_response, release_parse_failed);
consistent with download() which already sanitized its URL fields
- download(): wrap resp.Body with io.LimitReader(maxBinaryBytes) before
io.ReadAll so the raw HTTP response (archive + SUMS) is bounded; closes
the gap where the zip-level limit in extractBinary was not yet applied
to the network layer
- upgrade.applying: add Bool("force", force) field so the event is
self-contained for change management audit without requiring cross-event
lookup to upgrade.run_started
- currentExecutable(): emit upgrade.executable_resolution_failed warning
when os.Executable() fails so auditors can distinguish real path from
the "kfutil" fallback sentinel in log records
Skipped (accepted):
- C-1: logs silent without --debug — user explicitly accepted this risk
- H-1: ConsoleWriter destroys structured fields — codebase-wide infra concern
- M-1: duplicate fetch_release_failed events — intentional layering; removing
the Run-level log would leave HTTP status error paths with no audit record
- dry-run fmt.Printf: use sanitizeURL() for archiveURL and sumsURL so presigned CDN query-string credentials never reach console output - download() error string: use sanitizeURL(rawURL) in the HTTP status error so query-string credentials are stripped from stderr / error aggregators - fetchReleaseFrom: log upgrade.github_api_request_build_failed on http.NewRequest error so malformed-URL failures leave an audit trace - download: log upgrade.http_request_build_failed on http.NewRequest error Skipped (accepted / out of scope): - M-1: force_override test requires Run() end-to-end mock harness
H1: add from_version, to_version, executable to rollback_failed and
rollback_succeeded by threading those values into apply(); all three
are in scope in Run() where the call site lives
H2: add operator param to currentExecutable() so
upgrade.executable_resolution_failed carries operator for identity
correlation
M1: emit upgrade.checksum_verified Info event (with operator, asset,
source_url) after verifyChecksum returns nil, closing the gap in
the checksum success audit trail
M2: add TestSanitizeURL_StripsQueryParams and
TestSanitizeURL_ParseErrorFallback to cover presigned-URL stripping
and parse-error fallback behaviour
H1: emit upgrade.github_api_rejected Error event before each non-200
return in fetchReleaseFrom switch — gives SIEM rules an
outcome-specific event name distinct from upgrade.github_api_response
H2: emit upgrade.http_request_failed Error event before the non-200
return in download — same outcome-specificity fix for asset fetches
M1: upgrade.rollback_succeeded changed from Info to Warn — a rollback
is a degraded-state recovery action, not a routine operation
M2: emit upgrade.apply_started Info inside apply() before
selfupdate.Apply() is called — records the exact point the
filesystem write commences
M5: add TestExtractBinary_ExceedsMaxSize to assert the maxBinaryBytes
size cap returns an error rather than an oversized slice
Low: strip URL fragments in sanitizeURL(); add
TestSanitizeURL_StripsFragment to cover the new behaviour
H1: add source_url to upgrade.rollback_failed and upgrade.rollback_succeeded
by adding sourceURL param to apply(); forensic reconstruction of a
failed upgrade can now trace the artifact without cross-event correlation
H2: detect silently-truncated downloads — after io.ReadAll+LimitReader, check
len(data) >= maxBinaryBytes and emit upgrade.download_size_limit_reached
Error before returning an error; prevents a 100 MiB-capped payload from
reaching checksum verification with a misleading mismatch event
M1: add TestDownload_BodyTruncatedAtLimit to assert the HTTP body size cap
rejects oversized payloads with the correct error message
M2: add comments on the paired Warn/Error emit blocks in fetchReleaseFrom
and download explaining why both events are intentional (latency_ms
lives in the Warn, the control decision in the Error)
M1: emit upgrade.http_body_read_error Error before returning a mid-body
read failure; distinguishes a truncated transfer from a refused
connection in forensic reconstruction (SOC2 CC7.3)
M3: add captureLog() test helper that redirects the global zerolog logger
to a buffer for test duration; use it to assert upgrade.http_request_failed,
upgrade.download_size_limit_reached, and upgrade.github_api_rejected are
actually emitted — test coverage now proves audit events fire, not just
that errors propagate
M4: add .Err(err) to upgrade.rollback_succeeded so the root cause of the
apply failure is in the structured log record alongside the rollback
outcome (SOC1 completeness and accuracy)
H1: capture applyErr before RollbackError unwraps it; add apply_error
string field to upgrade.rollback_failed so forensic reconstruction
has the root cause alongside the rollback failure (SOC1 completeness)
M3: split github_token_present into github_token_present (env var set?)
and github_token_forwarded (actually sent in header?) in both
fetchReleaseFrom and download; log events for untrusted-host downloads
now correctly record that a token existed but was withheld, rather than
masking token availability (SOC2 CC6.6 boundary protection telemetry)
M4: add comment to TestDownload_TokenSentToTrustedHost explaining why
these two tests must not run in parallel (allowedTokenHosts mutation)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
kfutil upgradeto atomically replace the running binary with any published GitHub releaseminio/selfupdaterename-swap — no restart required)SHA256SUMSfile before applyingUsage
Changes
pkg/upgrade/upgrade.gocmd/upgrade.go--version,--dry-run,--forceflagspkg/upgrade/upgrade_test.gogo.mod/go.sumgithub.com/minio/selfupdate v0.6.0Test plan
go test ./pkg/upgrade/...passes (22/22)kfutil upgrade --dry-runreports "already at latest" when currentkfutil upgrade --dry-run --version <tag>resolves correct platform asset URLkfutil upgrade --dry-run --version <nonexistent>returns clear errorkfutil upgrade --force --dry-runshows download target even when already currentkfutil versionreflects new version