Skip to content

Reserve buffer capacity before every encode_into write#3103

Merged
dpkp merged 1 commit into
masterfrom
dpkp/protocol-encode-buf-ensure
Jun 24, 2026
Merged

Reserve buffer capacity before every encode_into write#3103
dpkp merged 1 commit into
masterfrom
dpkp/protocol-encode-buf-ensure

Conversation

@dpkp

@dpkp dpkp commented Jun 24, 2026

Copy link
Copy Markdown
Owner

The optimized encode path writes primitives directly into a growable
EncodeBuffer (initial 64KB). Only Bytes reserved space via out.ensure();
every other writer assumed room existed. When a payload forces ensure()
to grow the buffer to exactly fit (zero trailing slack, i.e. when
pos + payload > 2 * capacity), the next write ran off the end:

  • UnsignedVarInt32 / tagged-field count -> IndexError (the reported bug)
  • fixed fields (e.g. the next partition's int32 index) -> struct.error
  • arrays of fixed primitives ([]int32) overflow at 64KB with no Bytes
    or tagged fields involved at all
  • codegen kept a stale buf local after a delegated tagged-field
    encode reallocated out.buf (silent data loss / crash); adding
    ensure() to the varint made this fire routinely

Make capacity reservation uniform:

  • runtime codecs (types.py, tagged_fields.py) call out.ensure(N) before
    writing N bytes (fixed/uuid/bitfield/varint, String prefix+payload,
    Bytes length prefix, tagged size byte)
  • codegen gains CodegenContext.emit_reserve(): an inline
    if pos + N > _cap check that only syncs/grows/re-reads on overflow,
    keeping buf/pos/_cap in sync
  • ArrayField bulk-reserves fixed-element arrays once, then writes the
    run with a tight pack_into loop (keeps the hot path fast)
  • struct.py / struct_array.py re-bind buf/_cap after every delegated
    encode_into that can reallocate the buffer

Document the EncodeBuffer capacity contract and the reallocation hazard
(callers must re-read out.buf after a grow).

Adds test/protocol/schemas/test_encode_large.py: boundary unit tests for
every codec plus byte-for-byte parity and round-trip for large records
(Produce v9/v3, single and multi partition), large []int32 arrays, the
64KB/128KB grow boundaries, and pooled-buffer reuse. 22 of these fail on
the prior tree.

Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com

@dpkp

dpkp commented Jun 24, 2026

Copy link
Copy Markdown
Owner Author

Based on original report / PR #3101 (Fixes #3102)

The optimized encode path writes primitives directly into a growable
EncodeBuffer (initial 64KB). Only Bytes reserved space via out.ensure();
every other writer assumed room existed. When a payload forces ensure()
to grow the buffer to exactly fit (zero trailing slack, i.e. when
pos + payload > 2 * capacity), the next write ran off the end:

  - UnsignedVarInt32 / tagged-field count -> IndexError (the reported bug)
  - fixed fields (e.g. the next partition's int32 index) -> struct.error
  - arrays of fixed primitives ([]int32) overflow at 64KB with no Bytes
    or tagged fields involved at all
  - codegen kept a stale `buf` local after a delegated tagged-field
    encode reallocated out.buf (silent data loss / crash); adding
    ensure() to the varint made this fire routinely

Make capacity reservation uniform:

  - runtime codecs (types.py, tagged_fields.py) call out.ensure(N) before
    writing N bytes (fixed/uuid/bitfield/varint, String prefix+payload,
    Bytes length prefix, tagged size byte)
  - codegen gains CodegenContext.emit_reserve(): an inline
    `if pos + N > _cap` check that only syncs/grows/re-reads on overflow,
    keeping buf/pos/_cap in sync
  - ArrayField bulk-reserves fixed-element arrays once, then writes the
    run with a tight pack_into loop (keeps the hot path fast)
  - struct.py / struct_array.py re-bind buf/_cap after every delegated
    encode_into that can reallocate the buffer

Document the EncodeBuffer capacity contract and the reallocation hazard
(callers must re-read out.buf after a grow).

Adds test/protocol/schemas/test_encode_large.py: boundary unit tests for
every codec plus byte-for-byte parity and round-trip for large records
(Produce v9/v3, single and multi partition), large []int32 arrays, the
64KB/128KB grow boundaries, and pooled-buffer reuse. 22 of these fail on
the prior tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dpkp dpkp force-pushed the dpkp/protocol-encode-buf-ensure branch from bdd31f4 to 3b7462f Compare June 24, 2026 14:19
@dpkp dpkp merged commit c4697f4 into master Jun 24, 2026
19 checks passed
@dpkp dpkp deleted the dpkp/protocol-encode-buf-ensure branch June 24, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant