docs(runbooks): flatKV↔memIAVL parity via sharded historical replay #448

Merged
bdchatham merged 4 commits into main from feat/runbook-flatkv-parity-replay on Jul 1, 2026

@bdchatham

Collaborator

Adds an agent-first runbook to .agent/runbooks/ for driving a flatKV-vs-memIAVL storage-engine correctness validation on harbor at scale (50+ shards), plus its README index row.

What it captures

The method a cold Claude session (or operator) needs to run the validation end-to-end, with the load-bearing correctness traps front-loaded:

  • Compare the two replay nodes to each other, never to the archive — comparing a re-executing shadow to the archive's stored pre-v6.5 results conflates the storage engine with version drift and manufactures false divergences. The archive is the block source only.
  • Verify the flatKV node's migration is complete before trusting any result (sei_chain_seidb_migration_version == target) — migrate_evm is a boundary-split router that serves un-migrated EVM reads from memIAVL, so a premature comparison is silently vacuous (memIAVL-vs-memIAVL). Free at EVM genesis; high-height shards need evm_migrated/flatkv_only/forced completion.
  • historical_replay build tag for pre-v6.5 non-canonical tx bodies (else the strict decoder skips their execution).
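
The migration gate in the second bullet can be sketched as a small check. This is a minimal illustration, not the runbook's own tooling: it assumes the node's metrics are available as Prometheus text exposition, and the helper names `migration_version` and `migration_complete` are hypothetical. Only the metric name `sei_chain_seidb_migration_version` comes from the description above.

```python
import re

# Metric name taken from the PR description; everything else here is an assumption.
METRIC = "sei_chain_seidb_migration_version"


def migration_version(metrics_text: str):
    """Return the metric's value from Prometheus text exposition, or None if absent."""
    # Match the bare metric name, optionally followed by a {label="..."} set,
    # then whitespace and the sample value. Comment lines start with '#', so
    # the ^ anchor skips HELP/TYPE lines that mention the metric name.
    pattern = re.compile(
        r"^" + re.escape(METRIC) + r"(?:\{[^}]*\})?\s+(\S+)", re.MULTILINE
    )
    match = pattern.search(metrics_text)
    return float(match.group(1)) if match else None


def migration_complete(metrics_text: str, target: int) -> bool:
    """True only when the metric is present and equals the target version."""
    version = migration_version(metrics_text)
    return version is not None and version == target
```

A missing metric is treated as "not complete", so a scrape failure blocks the comparison rather than letting a vacuous memIAVL-vs-memIAVL result through.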

Then: the replay-pair topology (same binary + snapshot, blocks from a shared full-history archive), standing up a pair, the seictl result-export shadow comparator (L1 + L2), result aggregation, the Notion report, the 50+ shard fan-out, and a failure-modes table.
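
As a rough sketch of the pair-to-pair comparison (not the seictl comparator itself), the function below finds the first height at which the two replay nodes disagree. The input shape, a mapping from block height to a per-block result digest, is an assumption made for illustration; the real export format is not shown in this PR.

```python
from typing import Mapping, Optional


def first_divergence(
    flatkv: Mapping[int, str], memiavl: Mapping[int, str]
) -> Optional[int]:
    """Return the lowest height where the two nodes differ, or None if they agree.

    Each mapping is height -> result digest (hypothetical shape). A height
    present on only one node counts as a divergence, because one side never
    produced a result to compare.
    """
    for height in sorted(set(flatkv) | set(memiavl)):
        if flatkv.get(height) != memiavl.get(height):
            return height
    return None
```

Both inputs come from the replay pair; the archive never appears as an argument, which mirrors the rule that it is the block source only.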

Provenance

Distilled from an end-to-end run of this validation on harbor eng-fromtherain. Every technical claim was accuracy-reviewed against the controller + sei-db + seictl source and the live cluster (a systems-engineer on storage/comparator/metrics, a kubernetes-specialist on the CRD/GitOps surface), and legibility-reviewed for agent-first execution (a prose-steward) before this PR.

🤖 Generated with Claude Code

Agent-first runbook for driving a flatKV-vs-memIAVL storage-engine
correctness validation on harbor at scale: the replay-pair topology
(same binary + snapshot, blocks from a shared archive), the load-bearing
correctness gates (compare pair-not-archive; verify migration complete;
historical_replay build), the seictl shadow comparator, result
aggregation + Notion report, and the 50+ shard fan-out.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cursor

cursor Bot commented Jul 1, 2026

PR Summary

Low Risk
Documentation-only change under .agent/runbooks/; no controller, CRD, or runtime behavior is modified.

Overview
Adds .agent/runbooks/validating-flatkv-memiavl-parity-via-sharded-replay.md, an agent-first operator runbook for flatKV vs memIAVL correctness validation on harbor, and a matching row in .agent/runbooks/README.md.

The new doc encodes the differential replay method (flatKV + memIAVL SeiNode pairs, shared archive, same image/snapshot) and front-loads traps that make naïve runs look green while measuring nothing: compare replay nodes to each other, not the archive; gate on sei_chain_seidb_migration_version before trusting flatKV reads; use mock_chain_validation + historical_replay builds (with the documented sei-chain PR dependency). It also covers SeiNode overrides, Flux/harbor-dev rollout, seictl result-export comparator params, S3 aggregation, Notion reporting, 50+ shard fan-out, and a failure-modes table—with L1 as the flatKV verdict and L2 explicitly as same-history sanity only.

Reviewed by Cursor Bugbot for commit b34fdbf. Bugbot is set up for automated code reviews on this repo.

bdchatham and others added 3 commits July 1, 2026 14:16
… dependency, §7 invariant-first

- §3 + §1 trap3: state honestly that historical_replay is not on sei-chain main
  (lands via PR #3691); build from that branch ref, not main; correct the
  decoder-symbol framing (main has only DefaultTxDecoderWithoutBodyBloatRejection,
  evmrpc-trace only). Convergent DISSENT (systems-engineer + prose-steward).
- §7: lead with the canonical invariant (canonical* must hold the compared
  heights; resolve behind/ahead at submit time). prose-steward DISSENT.
- §0: add <you>, <archive>, <task-uuid> placeholders.
- cosmetic (kubernetes-specialist RATIFY nits): CEL vs planner for the snapshot
  requirement; finalizer-driven PVC delete; sei.io/node selector clarification;
  block-level layer2 indeterminate wording; §6 step-N cross-refs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ical reads, not flatKV)

Dissenter (sei-network-specialist) caught a correctness-grade mislead the
other lenses missed, verified against sei-chain main 1c66d878:
CacheMultiStoreWithVersion serves ALL historical IAVL reads from the State
Store (pebbledb) when SS is enabled — the SC layer, where write_mode picks
flatKV vs memIAVL, is bypassed. So L2 (historical eth_getStorageAt) compares
SS↔SS on both nodes and is vacuous for flatKV.

- §1: L1 (execution results) is the flatKV verdict; L2 does NOT exercise
  flatKV on SS-enabled nodes (same-history sanity check only); flatKV read
  path needs SS-off + latest-height, committed root is the seidb-digest track.
- §2: scope the migration-complete gate to the latest-version SC path; it
  cannot make L2 meaningful while SS is on.
- §8/§9: L1 is the reported verdict; L2 reported as sanity check, not parity.
- §7/§11: layer indeterminate wording; boundary-block = parent-validators
  lookup; new failure-mode row for the L2-SS-vacuous door.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
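
The reporting rule from this commit (L1 is the verdict, L2 is a sanity check only) could be expressed in an aggregation step roughly as follows. The `ShardResult` record and `summarize` function are hypothetical names for illustration and do not reflect the runbook's actual aggregation format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ShardResult:
    """Hypothetical per-shard record; field names are assumptions."""
    shard: str
    l1_match: bool  # execution results agree between the replay pair
    l2_match: bool  # historical storage reads agree (SS vs SS on both nodes)


def summarize(results: Iterable[ShardResult]) -> dict:
    """Aggregate shard results, keeping the L1 verdict separate from L2."""
    results = list(results)
    l1_failed = [r.shard for r in results if not r.l1_match]
    l2_failed = [r.shard for r in results if not r.l2_match]
    return {
        # The flatKV verdict depends on L1 only; an empty run proves nothing.
        "flatkv_parity": bool(results) and not l1_failed,
        "l1_failed_shards": l1_failed,
        # L2 is reported as a same-history sanity check, never as parity.
        "history_sanity_ok": bool(results) and not l2_failed,
        "l2_failed_shards": l2_failed,
    }
```

Keeping the two fields apart means an L2 mismatch is surfaced without being mistaken for a flatKV divergence.
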
…ss, L2-determinism rationale

Round 3 unanimous RATIFY (k8s, systems-engineer, prose-steward, sei-network
dissenter). Style/advisory polish:
- §1: expand SC -> 'SC (State Commit)' at first use.
- §7: move the 'same-history sanity check, not a flatKV signal' gloss onto
  layer2 (was misattached to the indeterminate flag).
- §4: note L2-determinism still earns its keep as the SS history-agreement check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham bdchatham merged commit 56eea3b into main Jul 1, 2026
5 checks passed
@bdchatham bdchatham deleted the feat/runbook-flatkv-parity-replay branch July 1, 2026 21:45