Skip to content

feat(evmrpc): bound debug_trace struct-logger memory via max_trace_struct_log_bytes (PLT-205)#3677

Open
amir-deris wants to merge 7 commits into
mainfrom
amir/plt-205-debug-trace-memory-growth
Open

feat(evmrpc): bound debug_trace struct-logger memory via max_trace_struct_log_bytes (PLT-205)#3677
amir-deris wants to merge 7 commits into
mainfrom
amir/plt-205-debug-trace-memory-growth

Conversation

@amir-deris

@amir-deris amir-deris commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary

debug_trace* calls that use the default struct logger retain one structured log entry per executed opcode. Traces that touch many distinct storage slots grow memory quadratically, and an unbounded trace can OOM the RPC node. Upstream geth exposes a per-call Limit on the struct logger but leaves it unset (unlimited) by default.

This PR adds an operator-configurable, node-side cap on the retained default struct-logger output so a single trace cannot grow without bound.

Changes

  • New config evm.max_trace_struct_log_bytes (MaxTraceStructLogBytes) bounding the bytes of retained struct-logger output for a single default debug_trace* call. Defaults to 32 MiB; 0 disables the cap (matches upstream geth's unlimited behavior).
  • DebugAPI.clampDefaultStructLogLimit applies the cap across TraceTransaction, TraceBlockByNumber, TraceBlockByHash, and TraceCall. It:
    • is a no-op for custom tracers (only the default struct logger is affected),
    • is a no-op when the cap is disabled (0),
    • clamps a larger caller-supplied Limit down to the node cap,
    • honors a smaller caller-supplied Limit.
  • Wired into config parsing (ReadConfig), the TOML template, and DefaultConfig.

Tests

  • TestClampDefaultStructLogLimit covers all clamp/no-op branches (no limit set, nil nested config, larger limit clamped, smaller limit honored, custom tracer, disabled cap, nil config).
  • config_test.go asserts the shipped default (DefaultConfig.MaxTraceStructLogBytes = 32 MiB), verifies an explicitly-supplied value round-trips through ReadConfig, and rejects a bad type for the new flag.

PLT-205

@cursor

cursor Bot commented Jun 30, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes default trace behavior (32 MiB cap) and memory use on RPC nodes; custom tracers unchanged and operators can set 0 to restore unlimited geth behavior.

Overview
Adds operator setting evm.max_trace_struct_log_bytes (default 32 MiB, 0 = unlimited like geth) to cap retained default struct-logger output per traced transaction, reducing OOM risk from heavy debug_trace* calls.

DebugAPI.clampDefaultStructLogLimit runs on TraceTransaction, block trace, TraceCall, and TraceTransactionProfile before tracing: applies the cap when no custom tracer is used, clamps a higher RPC Limit down, keeps a lower client Limit, and skips work when the cap is disabled. Config is wired through ReadConfig, TOML template, and DefaultConfig, with clampUint64ToInt so huge uint64 values do not wrap and disable the cap.

Tests cover clamp branches and config round-trip/default; profiledTraceTx documents that callers must clamp at the API boundary (block paths clamp before the profiled path).

Reviewed by Cursor Bugbot for commit f5246ee. Bugbot is set up for automated code reviews on this repo. Configure here.

@amir-deris amir-deris changed the title Added config for trace limit and added test feat(evmrpc): bound debug_trace struct-logger memory via max_trace_struct_log_bytes (PLT-205) Jun 30, 2026
@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJul 2, 2026, 1:19 AM

@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 73.33333% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.28%. Comparing base (1c66d87) to head (f5246ee).

Files with missing lines Patch % Lines
evmrpc/tracers.go 75.00% 3 Missing and 3 partials ⚠️
evmrpc/trace_profile.go 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3677      +/-   ##
==========================================
- Coverage   59.29%   58.28%   -1.01%     
==========================================
  Files        2272     2185      -87     
  Lines      188210   178201   -10009     
==========================================
- Hits       111594   103864    -7730     
+ Misses      66565    65105    -1460     
+ Partials    10051     9232     -819     
Flag Coverage Δ
sei-chain-pr 69.06% <73.33%> (+12.41%) ⬆️
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
evmrpc/block_trace_profiled.go 15.59% <ø> (ø)
evmrpc/config/config.go 71.42% <100.00%> (+0.59%) ⬆️
evmrpc/trace_profile.go 67.41% <33.33%> (-1.19%) ⬇️
evmrpc/tracers.go 69.32% <75.00%> (+0.43%) ⬆️

... and 127 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A clean, well-tested, well-documented change that adds an operator cap on default struct-logger trace output. However, Codex flags a likely-correctness issue that defeats the feature's purpose: geth's logger.Config.Limit may count retained struct-log entries, not bytes, which would make the byte-named MaxTraceStructLogBytes config and its 256 MiB default ineffective — this must be verified against the pinned geth version before merge.

Findings: 2 blocking | 4 non-blocking | 1 posted inline

Blockers

  • Verify the units of geth's eth/tracers/logger.Config.Limit against the pinned dependency (sei-protocol/go-ethereum v1.15.7-sei-17). Codex reports (and historically geth implemented) Limit as a count of retained StructLog entries (len(l.logs) >= cfg.Limit), not bytes. If that holds at this version, the config name MaxTraceStructLogBytes, the doc comments ("in bytes"), and the 256 MiB default are misleading and provide essentially no memory protection (256M entries × hundreds of bytes each = tens of GB), so the PR would not accomplish its stated goal. If verified that this geth version bounds by output bytes (resultSize/logs.Size()), the PR is correct and this resolves to a no-op. I could not verify this in-sandbox (module cache, gh api, and web access were all blocked), so this needs author confirmation. If it is entry-count, fix by either renaming the knob to reflect entries and choosing a sane entry-count default, or by enforcing a true byte cap.
  • 1 blocking issue(s) flagged inline on specific lines.

Non-blocking

  • Behavior change: the default debug_trace* path was previously unlimited and now truncates at the cap. When the struct logger hits its Limit it silently stops appending entries — the trace result is returned incomplete with no error/flag. Tools consuming full opcode traces may now get partial results without indication. Worth a CHANGELOG/release note and possibly surfacing the truncation in the response.
  • Consider an integration-level test that actually exercises a trace exceeding the cap and asserts the output is bounded/truncated end-to-end; current tests only unit-test the clampDefaultStructLogLimit branch logic (which is good but doesn't prove the limit takes effect in a real trace).
  • Cursor's review file (cursor-review.md) was empty — that pass produced no output. Codex contributed the single (High) finding noted above.
  • Nit: int(debugCfg.MaxTraceStructLogBytes) narrows a uint64 to int. On 64-bit nodes (the only supported target) this is fine and the //nolint:gosec is reasonable, but an operator-set value above 2^31 on a 32-bit build would overflow to a negative int and disable the cap. A clamp to math.MaxInt (or documenting 64-bit-only) would be more defensive.

Comment thread evmrpc/config/config.go Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small operational guardrail on debug trace memory with conservative defaults and clear test coverage.

Extended reasoning...

Overview

Adds MaxTraceStructLogBytes (default 256 MiB, 0=unlimited) to evmrpc/config and wires it into DebugAPI via a new clampDefaultStructLogLimit helper invoked by TraceTransaction, TraceBlockByNumber, TraceBlockByHash, and TraceCall. The clamp only affects the default geth struct logger; named/custom tracers are skipped, and the cap only ever tightens — never loosens — a caller-supplied Limit.

Security risks

None of note. This is a memory bound that tightens an existing behavior; it does not touch auth, crypto, consensus, or any external surface. Worst-case it returns a smaller trace than a caller requested, which is the documented intent. Defaults preserve upstream geth behavior when set to 0.

Level of scrutiny

Low. This is operator-facing tunable behind a config flag, with the default selected to be conservative (256 MiB). Verified Config.Limit semantics in the vendored geth (eth/tracers/logger/logger.go:49: "maximum size of output, but zero means unlimited") match the PR — so the "honor smaller caller limit, override 0/larger" logic is correct.

Other factors

Test coverage in tracers_structlog_limit_test.go hits the edges that matter (nil config, nil nested Config, larger caller limit, smaller caller limit, custom-tracer no-op, disabled-cap no-op, nil-safe). Config plumbing follows the established pattern in the file (flag constant, default value, ReadConfig branch, app.toml template snippet, bad-type test). No outstanding reviewer comments to address.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config plumbing, clamp logic, and tests are clean and well-structured, but the feature does not achieve its stated goal: go-ethereum's logger.Config.Limit bounds the number of retained struct-log entries (opcodes), not bytes, so treating it as a byte count means the advertised 256 MiB memory cap is not actually enforced and the OOM this PR targets can still occur.

Findings: 2 blocking | 3 non-blocking | 2 posted inline

Blockers

  • None at the file/PR level.
  • 2 blocking issue(s) flagged inline on specific lines.

Non-blocking

  • int(debugCfg.MaxTraceStructLogBytes) at tracers.go:208 (with //nolint:gosec): a very large operator-configured uint64 (> math.MaxInt, or > MaxInt32 on 32-bit builds) wraps to a negative int, which clampDefaultStructLogLimit then treats as <= 0 and silently disables the cap — the opposite of the operator's intent. Consider clamping to math.MaxInt instead of a raw conversion.
  • Cursor second-opinion review file (cursor-review.md) was empty — no findings were contributed from that pass.
  • Codex's single finding (Limit is an entry count, not bytes) is corroborated here and captured as the primary inline blocker.

Comment thread evmrpc/tracers.go Outdated
config.Config = &traceLogger.Config{}
}
if config.Config.Limit <= 0 || config.Config.Limit > api.maxStructLogBytes {
config.Config.Limit = api.maxStructLogBytes

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[blocker] This assigns a byte budget to logger.Config.Limit, but in go-ethereum that field caps the number of retained struct-log entries (one per executed opcode), not bytes — the struct logger compares len(l.logs) against Limit. With the 256 MiB default this becomes Limit = 268,435,456 entries. Since each struct-log entry (PC, op, gas, and optionally stack/memory/storage snapshots) is far larger than one byte, a node would OOM long before accumulating 268M entries, so the intended memory bound is effectively never enforced and the OOM this PR targets can still happen. The cap needs to be expressed in entries with a realistic default (order 10^4–10^6, tuned to the desired memory ceiling), or memory must be bounded some other way — the byte framing does not map onto Limit.

Comment thread evmrpc/config/config.go Outdated
Comment thread evmrpc/tracers.go

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds an operator-configurable cap on default struct-logger output for debug_trace* calls, wired cleanly with good test coverage. However, the value is applied to geth's logger.Config.Limit, which bounds the NUMBER of struct-log entries rather than bytes, so the byte-denominated config name/docs and the 256 MiB default do not deliver the memory bound they advertise.

Findings: 1 blocking | 5 non-blocking | 3 posted inline

Blockers

  • None at the file/PR level.
  • 1 blocking issue(s) flagged inline on specific lines.

Non-blocking

  • cursor-review.md was empty — the Cursor second-opinion pass produced no output. REVIEW_GUIDELINES.md (from the base branch) was also empty, so no repo-specific review standards were applied.
  • Positive: TestClampDefaultStructLogLimit covers all clamp/no-op branches (no limit, nil nested config, larger limit clamped, smaller limit honored, custom tracer, disabled cap, nil config), and the clamp is correctly applied before the profiled-vs-standard block-trace branch so both paths are covered.
  • Consider a test that actually runs a default struct-logger trace end-to-end and asserts the effective cap is enforced during execution — the current unit tests only exercise config mutation, not the geth logger's runtime behavior, which is exactly where the entries-vs-bytes mismatch hides.
  • 2 suggestion(s)/nit(s) flagged inline on specific lines.

Comment thread evmrpc/tracers.go
Comment thread evmrpc/config/config.go
Comment thread evmrpc/tracers.go Outdated
traceCallSemaphore: sem,
maxBlockLookback: debugCfg.MaxTraceLookbackBlocks,
traceTimeout: debugCfg.TraceTimeout,
maxStructLogBytes: int(debugCfg.MaxTraceStructLogBytes), //nolint:gosec // bounded operator config

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] int(debugCfg.MaxTraceStructLogBytes) on a uint64: a configured value above math.MaxInt64 wraps to a negative int, which clampDefaultStructLogLimit then treats as <= 0 ("disabled"), silently turning off the cap instead of rejecting the invalid config. Consider validating/clamping the range at parse time in ReadConfig rather than relying on the //nolint:gosec cast. Low likelihood in practice, but it's a silent failure mode.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a1c065d. Configure here.

Comment thread evmrpc/tracers.go
Comment thread evmrpc/tracers.go

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR wires a new operator config into geth's struct-logger Limit, but that field bounds the count of retained struct-log entries, not bytes — so the config named/documented as max_trace_struct_log_bytes ("256 MiB") is semantically incorrect and its default (~268M entries) does not enforce the memory cap it advertises. Plumbing, tests, and clamp logic are otherwise sound.

Findings: 4 blocking | 3 non-blocking | 2 posted inline

Blockers

  • Feature does not achieve its stated goal. geth's logger.Config.Limit gates retention by entry count (l.cfg.Limit <= len(l.logs)), not bytes. Mapping max_trace_struct_log_bytes onto Limit means the default of 25610241024 = 268,435,456 is interpreted as ~268M retained struct-log entries — effectively unlimited for the OOM scenario described. The quadratic memory growth (each entry re-copying the accumulated storage map) blows up long before that entry count, so the default provides no meaningful protection. Either bound by actual retained bytes, or rename the config/field/docs (e.g. max_trace_struct_log_entries) and choose an entry-count default that reflects real memory limits.
  • The docstrings, TOML template comment, PR description, and default value all state the cap is in bytes / '256 MiB'. This misdocumentation can cause operators to badly misconfigure: setting max_trace_struct_log_bytes = 1048576 expecting a 1 MiB cap would actually retain up to 1,048,576 entries (potentially many GiB), i.e. under-protection while believing they are protected.
  • 2 blocking issue(s) flagged inline on specific lines.

Non-blocking

  • cursor-review.md was empty — the Cursor second-opinion pass produced no output.
  • Codex's single finding (tracers.go bytes-vs-count) matches the primary blocker here and is corroborated.
  • int(debugCfg.MaxTraceStructLogBytes) is silenced with //nolint:gosec; on 32-bit platforms a configured value > 2^31-1 would overflow/negative-wrap. Low risk given operator-controlled config and the default fitting in int32, but worth clamping to math.MaxInt defensively.

Comment thread evmrpc/tracers.go
Comment thread evmrpc/config/config.go
@amir-deris

Copy link
Copy Markdown
Contributor Author

Re: AI review findings — verified against the pinned dependency

I checked the flagged issues against the actual vendored source (sei-protocol/go-ethereum v1.15.7-sei-17).

False positives (the two blocking findings):

  1. "Config.Limit counts entries, not bytes" (Codex / seidroid, ×several threads) — not true at this version. The struct logger bounds on accumulated output bytes: logger.go:49 (Limit int // maximum size of output), logger.go:267 (l.resultSize > l.cfg.Limit), logger.go:318-319 (resultSize += len(entry)). The len(l.logs) >= Limit entry-count behavior is historical upstream geth and no longer applies here. The 256 MiB default and the Bytes naming are accurate.

  2. "Clamp writes the wrong field" (Cursor Bugbot) — tracers.TraceConfig embeds *logger.Config (api.go:163-164), so config.Limit is the promoted config.Config.Limit. geth reads that same field via NewStructLogger(config.Config) (api.go:1090) and logConfig = config.Config (api.go:814).

Legitimate, will address:

  • "Profile trace skips struct log cap" (Cursor) — correct. debug_traceTransactionProfileprofiledTraceTx uses the default struct logger without going through clampDefaultStructLogLimit. I'll wire the clamp into that path.
  • uint64→int overflow nit — will clamp to math.MaxInt at parse time.
  • CHANGELOG note for the default-path truncation behavior change.

Comment thread evmrpc/tracers.go
Comment thread evmrpc/config/config.go Outdated

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The automated review did not complete; see the failing AI Review check for details.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds an operator-configurable cap intended to bound debug_trace struct-logger memory "in bytes", but it assigns that byte value to geth's StructLogger Config.Limit, which counts struct-log entries (opcodes), not bytes — so the advertised byte cap is a misnomer and the OOM protection is largely ineffective at its stated scale. There is also a default-value discrepancy (code sets 32 MiB, PR/tests reference 256 MiB).

Findings: 2 blocking | 3 non-blocking | 2 posted inline

Blockers

  • geth's logger.Config.Limit is a cap on the number of retained struct-log entries (len(l.logs)), not a byte size (checked as if l.cfg.Limit != 0 && len(l.logs) >= l.cfg.Limit). The entire feature — field MaxTraceStructLogBytes, maxStructLogBytes, config max_trace_struct_log_bytes, and all doc comments — is framed as a byte bound but actually enforces an entry-count bound. A default of 32 MiB therefore permits ~33.5M retained entries per tx (each entry carrying stack/storage/return-data), which is many GB, so the intended byte-scale OOM protection is not achieved. Either rename/redocument this as an entry-count limit and pick a realistic default, or convert bytes→approx-entries before assigning to Limit (and note that Limit truncation happens post-collection in the default StructLogger). (Confirmed against upstream geth struct-logger semantics; worth double-checking the sei fork tag v1.15.7-sei-17 hasn't redefined Limit.)
  • 1 blocking issue(s) flagged inline on specific lines.

Non-blocking

  • The Cursor second-opinion review file (cursor-review.md) was empty — that pass produced no output.
  • Consider a test that exercises an actual trace end-to-end to prove the cap bounds memory as intended, rather than only unit-testing the clamp arithmetic — that would have surfaced the entry-vs-byte semantics gap.
  • 1 suggestion(s)/nit(s) flagged inline on specific lines.

Comment thread evmrpc/tracers.go
Comment thread evmrpc/config/config.go

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds a node-side cap for default-struct-logger traces and wires it cleanly through config, TOML, and all five trace entry points with good test coverage. However, the cap is applied to geth's logger.Config.Limit, which bounds the number of struct-log entries (opcode steps), not bytes — so the _bytes name, the 32 MiB default, and the documented memory guarantee don't hold, and the quadratic-storage OOM the PR targets may not be prevented.

Findings: 2 blocking | 4 non-blocking | 2 posted inline

Blockers

  • Semantic mismatch between the config's stated units (bytes) and geth's Limit (entry count). Either rename/re-document the setting as a struct-log entry cap, or implement real byte accounting (or convert the byte budget to a conservative entry limit) so the operator-facing max_trace_struct_log_bytes and its 32 MiB default actually bound retained memory. As written, the default permits ~33.5M retained entries — orders of magnitude more than 32 MiB — and won't trip in time to stop a quadratic-storage OOM. (Confirm against the sei fork: if sei-protocol/go-ethereum has redefined StructLogger.Limit to be byte-based, this concern is moot — but upstream geth treats it as len(l.logs).)
  • 1 blocking issue(s) flagged inline on specific lines.

Non-blocking

  • Cursor's second-opinion review (cursor-review.md) was empty — that pass produced no output.
  • Coverage check: all five default-struct-logger entry points (TraceTransaction, TraceTransactionProfile, TraceBlockByNumber, TraceBlockByHash, TraceCall) correctly invoke clampDefaultStructLogLimit; TraceStateAccess takes no config and is unaffected. No unclamped entry point was found — good.
  • If the entry-count semantics are kept, consider adding a test that ties the cap to observed memory/entry count end-to-end (the current unit tests validate the clamp arithmetic but bake in the bytes==entries assumption).
  • 1 suggestion(s)/nit(s) flagged inline on specific lines.

Comment thread evmrpc/tracers.go
config.Config = &traceLogger.Config{}
}
if config.Limit <= 0 || config.Limit > api.maxStructLogBytes {
config.Limit = api.maxStructLogBytes

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[blocker] config.Limit here is geth's logger.Config.Limit, which bounds the number of struct-log entries (OnOpcode returns once len(l.logs) >= Limit), not a byte count. Assigning a byte budget (api.maxStructLogBytes, default 3210241024) to it means the value is interpreted as ~33.5M retained entries, not 32 MiB. Since each entry can carry stack/memory/storage state (and with per-entry storage copies the growth is quadratic), actual retained memory can be far larger than the configured "bytes" and the cap may not trip before an OOM — which is the very scenario this PR targets. Please either implement byte accounting, convert the byte budget into a conservative entry limit, or rename the setting to reflect entry-count semantics. (If the sei fork redefined Limit to be byte-based, disregard — but upstream geth does not.)

Comment thread evmrpc/config/config.go
MaxConcurrentSimulationCalls: runtime.NumCPU(),
MaxTraceLookbackBlocks: 10000,
TraceTimeout: 30 * time.Second,
MaxTraceStructLogBytes: 32 * 1024 * 1024, // 32 MiB

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] The 32 MiB default is assigned to an entry-count Limit (see tracers.go). If the units stay entry-based, this default allows ~33.5M struct-log entries retained per tx — likely gigabytes, not 32 MiB. Pick a default that reflects the real cap once the units question above is resolved, and align the field name/docs accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants