The original purpose of DeployedDigest in the design was for the case where the desired image ref is tag-based. In that case, TargetDigest represents the latest resolved digest for the tag, and DeployedDigest represents the last digest that was applied to all nodes. Without it, it's hard for sysadmins to know what the last deployed digest from that tag actually is.
The idea then in the code is that we would update deployedDigest to match targetDigest once all nodes have successfully rebooted into the target digest. That makes it a record of an event, which is really weird because that's just not at all how the code works. The reconciliation loop is completely stateless and doesn't really have "events". It calculates what actions need to be done to get to convergence. It also needs to handle cases like a new node joining the cluster, but starting from an older disk image (so then DeployedDigest becomes a bit of a lie). Or if a node does reboot into the target digest, but is still degraded or NotReady (so we'd need to apply nuance to what DeployedDigest means as an event).
Anyway, I think we should get rid of DeployedDigest (and UpdateAvailable, which derives from it), and instead replace it with DeployHistory, which is more an aggregate count of past actions. Something like:
deployHistory:
- digest: sha256:... # digest that was deployed
version: v1.0.1 # image version that deployed
nodeCount: 12 # how many nodes we deployed to
lastDeployed: $timestamp # the last time a node was deployed that digest
- digest: sha256:...
version: v1.0.0
nodeCount: 12
lastDeployed: $timestamp
...
(Keep e.g. the most recent 3 entries or something.)
That gives strictly more information, is less ambiguous, and more easily maps to how the code actually works.
The original purpose of
DeployedDigestin the design was for the case where the desired image ref is tag-based. In that case,TargetDigestrepresents the latest resolved digest for the tag, andDeployedDigestrepresents the last digest that was applied to all nodes. Without it, it's hard for sysadmins to know what the last deployed digest from that tag actually is.The idea then in the code is that we would update deployedDigest to match targetDigest once all nodes have successfully rebooted into the target digest. That makes it a record of an event, which is really weird because that's just not at all how the code works. The reconciliation loop is completely stateless and doesn't really have "events". It calculates what actions need to be done to get to convergence. It also needs to handle cases like a new node joining the cluster, but starting from an older disk image (so then
DeployedDigestbecomes a bit of a lie). Or if a node does reboot into the target digest, but is still degraded or NotReady (so we'd need to apply nuance to whatDeployedDigestmeans as an event).Anyway, I think we should get rid of
DeployedDigest(andUpdateAvailable, which derives from it), and instead replace it withDeployHistory, which is more an aggregate count of past actions. Something like:(Keep e.g. the most recent 3 entries or something.)
That gives strictly more information, is less ambiguous, and more easily maps to how the code actually works.