feat(replay): historical_replay build tag for lenient tx decoder #3691

Open
bdchatham wants to merge 4 commits into main from feat/plt782-historical-replay-decoder

bdchatham (Contributor) commented Jul 1, 2026

Problem

Replaying pre-v6.5 pacific-1 history on a current build records historical transactions with non-canonical protobuf bodies as failed (code 2, tx decode error: tx parse error) instead of executing them. Empirically: at height 79,205,095 tx 6, a full-history v6.5.1 archive executes the tx (code 0), while a replayer on a newer build rejects it. The tx isn't dropped (block tx-counts match), but it gets a failure result with no state mutation, so replayed state diverges from history and compounds downstream.

Root cause: the strict decoder is hardcoded through the call chain NewTxConfigWithHandler → DefaultTxDecoder(protoCodec) → defaultTxDecoder(cdc, /*rejectBodyBloat=*/true). rejectBloatedBody() re-marshals the TxBody and rejects on any byte-length delta (a post-v6.5 ADR-027 strictness). Pre-v6.5 blocks were committed by a more lenient binary, so their bodies carry non-canonical protobuf padding that the newer decoder rejects. There is no runtime, config, or build-tag override today; every app.New funnels through MakeEncodingConfig(), which feeds both the BaseApp decoder and app.txDecoder. The lenient DefaultTxDecoderWithoutBodyBloatRejection already exists but is wired only into evmrpc trace.
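
The "re-marshal and compare lengths" idea can be illustrated without any protobuf dependency. The sketch below is an analogy only, not the code in rejectBloatedBody(): it applies the same check to a single protobuf-style varint using Go's standard library. The helper name reencodeDelta is hypothetical, and the padded input is an assumed example of a non-canonical encoding; the source does not say what form the padding in the historical bodies actually takes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// reencodeDelta (hypothetical helper) decodes one unsigned varint from raw,
// re-encodes the value in minimal form, and returns how many bytes longer
// the original encoding was. A non-zero delta means raw was not canonical.
func reencodeDelta(raw []byte) (int, error) {
	value, n := binary.Uvarint(raw)
	if n <= 0 {
		return 0, fmt.Errorf("malformed varint")
	}
	buf := make([]byte, binary.MaxVarintLen64)
	canonicalLen := binary.PutUvarint(buf, value)
	return n - canonicalLen, nil
}

func main() {
	inputs := [][]byte{
		{0x01},       // value 1 in minimal form
		{0x81, 0x00}, // also value 1, with a redundant continuation byte
	}
	for _, in := range inputs {
		delta, err := reencodeDelta(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// A strict check rejects on any delta; a lenient one ignores it.
		fmt.Printf("% x: delta=%d strictAccepts=%t\n", in, delta, delta == 0)
	}
}
```

Both inputs decode to the same value, which is why a lenient decoder can execute the transaction while a strict one reports a parse error.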

Change

A historical_replay build tag that swaps only the block-execution encoding config to the lenient decoder:

  • sei-cosmos/x/auth/tx/config.go — new NewTxConfigWithoutBodyBloatRejection(...) wiring the existing DefaultTxDecoderWithoutBodyBloatRejection (no bool added to the strict constructor — strict stays the only accidentally-reachable default; config.decoder is unexported so a named constructor is required).
  • app/params/proto.go (gated !test_amino && !historical_replay) + new app/params/proto_historical_replay.go (gated !test_amino && historical_replay) — a //go:build split of MakeEncodingConfig and MakeLegacyEncodingConfig. This single seam covers both the BaseApp decoder and app.txDecoder.
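
As a rough check of how the build-tag split selects files, the sketch below evaluates the constraint expressions with the standard go/build/constraint package. The two proto file constraints are the ones quoted above; the amino.go entry assumes a plain test_amino constraint, as described later in review, and its path is shown as given there. The selectedFiles helper is hypothetical.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// fileConstraints pairs each file with its //go:build line.
// The amino.go line is an assumption based on the review discussion.
var fileConstraints = []struct{ file, line string }{
	{"app/params/proto.go", "//go:build !test_amino && !historical_replay"},
	{"app/params/proto_historical_replay.go", "//go:build !test_amino && historical_replay"},
	{"amino.go", "//go:build test_amino"},
}

// selectedFiles (hypothetical helper) returns the files whose constraint
// is satisfied by the given set of build tags.
func selectedFiles(tags map[string]bool) ([]string, error) {
	var out []string
	for _, fc := range fileConstraints {
		expr, err := constraint.Parse(fc.line)
		if err != nil {
			return nil, err
		}
		if expr.Eval(func(tag string) bool { return tags[tag] }) {
			out = append(out, fc.file)
		}
	}
	return out, nil
}

func main() {
	tagSets := []map[string]bool{
		{},                          // default build
		{"historical_replay": true}, // replay artifact
		{"test_amino": true, "historical_replay": true},
	}
	for _, tags := range tagSets {
		files, err := selectedFiles(tags)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(tags, "->", files)
	}
}
```

Under these assumptions exactly one file is selected for every tag set, so MakeEncodingConfig is never defined twice and never missing.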

Consensus safety

Build tag, not a runtime flag — deliberately. A runtime flag would leave one binary that could be misconfigured to accept non-canonical bodies on live CheckTx/DeliverTx (a lenient validator accepts txs its strict peers reject → fork risk). The lenient decoder is reachable only in a historical_replay-tagged artifact, intended purely for historical replay / analysis — never a validator or mempool-serving node. The default/untagged build is byte-for-byte unchanged.

Verification

  • Default and -tags historical_replay builds both compile (app/params, sei-cosmos/x/auth/tx, app); combination with mock_chain_validation also builds.
  • Without the tag, nothing in the execution path references the lenient decoder.

🤖 Generated with Claude Code

Add a build-tag-gated lenient tx-decoder path so a historical_replay-tagged
build can replay pre-v6.5 blocks whose protobuf tx bodies are non-canonical.
The strict rejectBodyBloat decoder otherwise rejects them (code 2 "tx parse
error"), which skips execution and diverges replayed state from history.

The default (untagged) build is unchanged: the strict decoder stays on all
mempool/CheckTx/DeliverTx paths (a lenient validator would accept txs strict
peers reject — a consensus hazard), so the lenient decoder is reachable ONLY
via the build tag, never at runtime.

- sei-cosmos/x/auth/tx/config.go: NewTxConfigWithoutBodyBloatRejection wiring
  the existing DefaultTxDecoderWithoutBodyBloatRejection.
- app/params: //go:build split of MakeEncodingConfig + MakeLegacyEncodingConfig
  (proto.go gated !historical_replay; proto_historical_replay.go the lenient
  variant). This single seam covers both the BaseApp decoder and app.txDecoder.

Refs PLT-782.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jul 1, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jul 1, 2026, 10:13 PM |

codecov Bot commented Jul 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.34%. Comparing base (1c66d87) to head (ce1c84f).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3691      +/-   ##
==========================================
- Coverage   59.29%   58.34%   -0.95%     
==========================================
  Files        2272     2185      -87     
  Lines      188210   178423    -9787     
==========================================
- Hits       111594   104104    -7490     
+ Misses      66565    65071    -1494     
+ Partials    10051     9248     -803     
Flag Coverage Δ
sei-chain-pr 62.94% <ø> (+6.30%) ⬆️
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
app/params/proto.go 100.00% <ø> (ø)

... and 113 files with indirect coverage changes

@bdchatham bdchatham requested a review from masih July 1, 2026 21:11
@bdchatham bdchatham marked this pull request as ready for review July 1, 2026 21:11
@bdchatham bdchatham requested a review from blindchaser July 1, 2026 21:11
@bdchatham bdchatham assigned amir-deris and unassigned amir-deris Jul 1, 2026
@bdchatham bdchatham requested a review from amir-deris July 1, 2026 21:11

cursor Bot commented Jul 1, 2026

PR Summary

Medium Risk
Introduces a consensus-unsafe lenient tx decoder on the execution encoding path, but only in historical_replay-tagged artifacts; untagged production behavior stays strict.

Overview
Adds a historical_replay Go build tag so replay/analysis binaries can decode and execute txs with non-canonical protobuf TxBody bytes that strict post-v6.5 decoding rejects, fixing state divergence when replaying pre-v6.5 history.

Default builds are unchanged: app/params/proto.go is now gated with !historical_replay and still wires MakeEncodingConfig / MakeLegacyEncodingConfig through the strict tx.NewTxConfig. A new proto_historical_replay.go (same package, historical_replay tag) implements those functions using tx.NewTxConfigWithoutBodyBloatRejection, swapping only the block-execution decoder seam that feeds BaseApp and app.txDecoder.

Lenient API is compile-time isolated: sei-cosmos/x/auth/tx/config_historical_replay.go defines NewTxConfigWithoutBodyBloatRejection, wiring the existing DefaultTxDecoderWithoutBodyBloatRejection. That symbol is only linked when building with -tags historical_replay, so production/mempool/CheckTx/DeliverTx paths in untagged binaries cannot reach it.
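
The named-constructor choice can be sketched in isolation. The example below is a simplified, self-contained illustration and not the sei-cosmos code: the config type, the argument-free constructors, the Decode method, and the toy decoders (which treat a trailing zero byte as "bloat") are all assumptions made for the sketch. In the PR the lenient constructor lives in a historical_replay-tagged file; here both sit in one file so the example runs.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeFn stands in for a tx decoder; the real one also returns a decoded tx.
type decodeFn func(body []byte) error

// config keeps its decoder unexported, so code outside the package can only
// choose a decoder by calling one of the constructors.
type config struct {
	decoder decodeFn
}

// Decode runs whichever decoder the constructor wired in.
func (c config) Decode(body []byte) error { return c.decoder(body) }

var errBloated = errors.New("tx parse error: body is not canonical")

// strictDecode is a toy: it treats a trailing zero byte as body bloat.
func strictDecode(body []byte) error {
	if n := len(body); n > 0 && body[n-1] == 0x00 {
		return errBloated
	}
	return nil
}

// lenientDecode is a toy: it accepts the padded body.
func lenientDecode(body []byte) error { return nil }

// NewTxConfig is the default constructor and has no leniency parameter.
func NewTxConfig() config { return config{decoder: strictDecode} }

// NewTxConfigWithoutBodyBloatRejection must be requested by name.
func NewTxConfigWithoutBodyBloatRejection() config {
	return config{decoder: lenientDecode}
}

func main() {
	padded := []byte{0x0a, 0x00}
	fmt.Println("strict: ", NewTxConfig().Decode(padded))
	fmt.Println("lenient:", NewTxConfigWithoutBodyBloatRejection().Decode(padded))
}
```

Because the default constructor takes no boolean, a caller cannot flip it to lenient by passing the wrong argument; leniency requires naming a symbol that, in the PR, only exists in tagged builds.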

Reviewed by Cursor Bugbot for commit ce1c84f. Bugbot is set up for automated code reviews on this repo.

seidroid Bot left a comment

A small, well-scoped change that adds a historical_replay build tag swapping only the block-execution encoding config to the already-existing lenient (no body-bloat rejection) tx decoder, so tagged builds can replay pre-v6.5 pacific-1 history. The design is sound — a build tag rather than a runtime flag correctly avoids shipping a live binary that could be misconfigured into consensus-unsafe leniency — and the untagged build is unchanged. No blockers; a few minor non-blocking notes.

Findings: 0 blocking | 5 non-blocking | 2 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • The second-opinion passes were largely empty: REVIEW_GUIDELINES.md is empty (no repo-specific standards applied), and cursor-review.md is empty (Cursor produced no output). Only Codex produced a finding.
  • No direct automated coverage for the new tagged seam. TestDefaultTxDecoderWithoutBodyBloatRejection covers the underlying decoder, but neither NewTxConfigWithoutBodyBloatRejection nor the MakeEncodingConfig/MakeLegacyEncodingConfig build-tag split is exercised by a test that asserts a bloated-body tx decodes under the tag and is rejected without it. Build-tagged code is awkward to test, but a //go:build historical_replay test asserting the tagged EncodingConfig accepts a non-canonical body would lock in the intended behavior and guard against future regressions.
  • Consider documenting the historical_replay tag in the replay/tooling docs (or a Makefile target) so operators know it must never be used for validator/mempool-serving nodes, matching the in-code warnings.
  • 2 suggestion(s)/nit(s) flagged inline on specific lines.

Comment thread sei-cosmos/x/auth/tx/config.go Outdated
Comment thread app/params/proto_historical_replay.go Outdated
…ase refs)

Describe what the historical_replay lenient decoder is FOR — replaying
historical blocks whose tx bodies predate strict body-bloat rejection —
rather than naming a specific release/chain. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

seidroid Bot left a comment

A well-scoped, well-documented change that adds a historical_replay build tag swapping only the block-execution encoding config to the existing lenient tx decoder, leaving the default/untagged build byte-for-byte unchanged. No blocking issues; a couple of minor defensive notes.

Findings: 0 blocking | 5 non-blocking | 1 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Cursor's second-opinion review (cursor-review.md) was empty — no findings from that pass. Codex flagged the exported constructor (addressed inline below).
  • No automated test accompanies the change. It is config wiring behind a build tag and hard to unit-test, but a small test that asserts a -tags historical_replay build's MakeEncodingConfig().TxConfig decodes a known non-canonical/body-bloated tx body that the default build rejects would lock in the intended behavior and guard against future regressions in the decoder wiring.
  • Edge case: with both test_amino and historical_replay set, amino.go (gated only on test_amino) wins and the lenient decoder is silently not used. This combination is nonsensical for replay, but worth being aware of; no change strictly required.
  • The doc comments on the new MakeEncodingConfig/MakeLegacyEncodingConfig ("...for an amino based test configuration") are inaccurate copy-paste boilerplate inherited from proto.go — these are neither amino-based nor test-only. Pre-existing nit; could be cleaned up while here.
  • 1 suggestion(s)/nit(s) flagged inline on specific lines.

Comment thread sei-cosmos/x/auth/tx/config.go Outdated
…doc comments

seidroid[bot] review on #3691:
- [suggestion] NewTxConfigWithoutBodyBloatRejection was compiled into untagged
  builds, weakening the 'lenient decoder reachable only from historical_replay
  artifacts' invariant. Move it into a //go:build historical_replay file
  (config_historical_replay.go) so an untagged/production binary never compiles
  it — a future caller cannot accidentally wire it into a live path. Default and
  tagged builds both verified.
- [nit] Replace the stale copy-pasted 'amino based test configuration' doc
  comments on MakeEncodingConfig / MakeLegacyEncodingConfig with the lenient/replay purpose.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

seidroid Bot left a comment

A well-scoped, consensus-safe change that adds a historical_replay build tag to swap the block-execution tx decoder to the existing lenient (no body-bloat rejection) variant for replaying pre-v6.5 history; the code is correct and the untagged/production build is unchanged. Only minor non-blocking notes.

Findings: 0 blocking | 3 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Cursor's second-opinion review file was empty (no output produced); Codex's review reported "no material findings" but noted it could not run go test (sandbox had Go 1.24.13 vs required 1.25.6, no network). Consider confirming both tagged and untagged builds compile in CI.
  • No automated test accompanies the change. A build-tag-gated smoke test (e.g. a historical_replay-tagged test asserting MakeEncodingConfig().TxConfig.TxDecoder() decodes a non-canonical/body-bloated tx body that the default decoder rejects) would guard against future regressions in the decoder wiring, though this is inherently hard to unit-test.
  • The new proto_historical_replay.go doc comments are more accurate than the original proto.go comments ("for an amino based test configuration"), which are now stale/misleading for both variants — a minor cleanup opportunity in the base file.
