Use ls-hpack's fast Huffman decoder for HPACK/QPACK strings #13259

Open
phongn wants to merge 3 commits into apache:master from phongn:hpack-fast-huffman

phongn commented Jun 11, 2026

Collaborator

Summary

When the home-grown Huffman codec was replaced with vendored LiteSpeed ls-hpack code (#12357), only the conservative 4-bit FSM decoder (lshpack_dec_huff_decode_full) was ported — although huff-tables.h has carried the 64K-entry table for upstream's fast decoder all along. This PR ports the fast decoder (lshpack_dec_huff_decode, from ls-hpack v2.3.5) and switches huffman_decode() to it.

The fast decoder consumes 16 bits of input per table lookup and emits up to 3 bytes, falling back to the FSM decoder for the rare codes longer than 16 bits. HPACK and QPACK share the wrapper through xpack_decode_string(), so both HTTP/2 and HTTP/3 header decoding benefit. No new memory footprint: the hdecs table has been compiled into the binary since the original vendoring.
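
To illustrate the windowed-lookup idea described above, here is a minimal sketch using a toy prefix code and a 4-bit window. Everything in it is an assumption made for illustration: the code table (`a` = 0, `b` = 10, `c` = 110), the `Entry` layout, and the names `build_table` and `decode` are hypothetical and are not the ls-hpack `hdecs` table or API. The ported decoder uses a 16-bit window over the real HPACK code, and because some HPACK codes exceed 16 bits it also needs the FSM fallback, which this toy does not.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy prefix code, NOT the HPACK table: a = 0, b = 10, c = 110.
// A run of 1-bits plays the role of the EOS-prefix padding.
constexpr unsigned kWindow = 4; // the ported decoder uses 16

struct Entry {
  std::string out;       // symbols that fit entirely inside the window
  std::uint8_t used = 0; // bits consumed by those symbols (0 = none decoded)
};

// Precompute, for every possible window value, what it decodes to.
std::array<Entry, 1u << kWindow> build_table() {
  std::array<Entry, 1u << kWindow> table;
  for (unsigned w = 0; w < table.size(); ++w) {
    unsigned pos = 0;
    for (;;) {
      unsigned ones = 0;
      while (pos + ones < kWindow && ones < 3 &&
             ((w >> (kWindow - 1 - pos - ones)) & 1u)) {
        ++ones;
      }
      if (ones == 3 || pos + ones >= kWindow) break; // padding or cut-off code
      table[w].out.push_back("abc"[ones]);
      pos += ones + 1;
      table[w].used = static_cast<std::uint8_t>(pos);
    }
  }
  return table;
}

// Decode src; returns nullopt on invalid or overlong padding.
std::optional<std::string> decode(const std::vector<std::uint8_t>& src) {
  static const auto table = build_table();
  const unsigned mask = (1u << kWindow) - 1u;
  std::string dst;
  std::uint64_t buf = 0; // bit accumulator; the low `avail` bits are valid
  unsigned avail = 0;
  std::size_t i = 0;
  for (;;) {
    while (avail < kWindow && i < src.size()) {
      buf = (buf << 8) | src[i++];
      avail += 8;
    }
    if (avail == 0) return dst; // input ended exactly on a symbol boundary
    unsigned w;
    if (avail >= kWindow) {
      w = static_cast<unsigned>(buf >> (avail - kWindow)) & mask;
    } else {
      // Tail: fill the missing low bits of the window with 1s.
      w = static_cast<unsigned>((buf << (kWindow - avail)) |
                                ((1u << (kWindow - avail)) - 1u)) & mask;
    }
    const Entry& e = table[w];
    if (e.used == 0) break; // no complete symbol in this window
    assert(e.used <= avail); // toy codes end in 0, so fill bits never complete one
    dst += e.out;
    avail -= e.used;
  }
  // Whatever is left must be padding: no unread bytes, at most 7 bits, all ones.
  if (i != src.size() || avail > 7) return std::nullopt;
  const std::uint64_t rest_mask = (std::uint64_t{1} << avail) - 1;
  if ((buf & rest_mask) != rest_mask) return std::nullopt;
  return dst;
}
```

The point of the design is that one table lookup can yield several output symbols, so short, common codes cost a fraction of a lookup each instead of several FSM steps.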

Performance

tools/benchmark/benchmark_HuffmanDecode.cc (new; build with -DENABLE_BENCHMARKS=ON), release build, Ice Lake:

| Input | FSM decoder | fast decoder | speedup |
|---|---|---|---|
| 8B value (text/css) | 103 ns | 73 ns | 1.4x |
| 86B Accept value | 1003 ns | 505 ns | 2.0x |
| 113B User-Agent | 1296 ns | 659 ns | 2.0x |
| 10-value mixed corpus (459B) | 5.32 µs | 2.80 µs | 1.9x |

RFC 7541 strictness (deliberate divergence from upstream ls-hpack)

Differential testing revealed that upstream's fast decoder accepts padding of 8–10 bits when it follows the final symbol near the end of the input. RFC 7541 §5.2 requires that "a padding strictly longer than 7 bits MUST be treated as a decoding error", and the FSM decoder has always enforced this. The ported decoder adds a guard in the tail check to keep the strict behavior, so this PR does not loosen what ATS accepts. The divergence is documented in lib/ls-hpack/README.md and pinned by the deterministic decode_overlong_padding test, which fails if a future re-sync drops the guard.
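
The padding rule itself is small enough to state as a predicate. The sketch below is not the guard added to lshpack.cc; `padding_is_valid` and its parameters are hypothetical names, and it assumes the caller has already isolated the bits that follow the last complete symbol. It encodes the two RFC 7541 §5.2 conditions: padding is at most 7 bits, and it consists only of 1-bits (the most significant bits of the EOS code).

```cpp
#include <cassert>
#include <cstdint>

// trailing: the bits left after the last complete symbol, right-aligned.
// nbits:    how many such bits there are.
bool padding_is_valid(std::uint32_t trailing, unsigned nbits) {
  assert(nbits < 32);            // keeps the shift below well defined
  if (nbits > 7) return false;   // strictly longer than 7 bits: decoding error
  const std::uint32_t mask = (1u << nbits) - 1u;
  return (trailing & mask) == mask; // must be a prefix of EOS, i.e. all ones
}
```

Under this predicate, 8 trailing 1-bits are rejected even though they are all ones, which is the case the upstream fast decoder lets through.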

Semantics and compatibility

  • Validity and output are otherwise identical to the FSM decoder. This was verified by exhaustive parity over all 1- and 2-byte inputs, 100k seeded differential fuzz iterations (with out-of-bounds-write sentinel checks), 20k encode/decode roundtrips, and the RFC 7541 vectors. Offline, the port was additionally validated in a four-way comparison with upstream's own two decoders over exhaustive 1–3-byte inputs (16.8M cases) plus a 134M-case destination-size sweep; it is bit-identical to upstream's fast decoder apart from the strictness guard above.
  • The one observable boundary is an exactly-sized destination (dst_len == decoded length), where either decoder may report LSHPACK_ERR_MORE_BUF depending on how trailing padding falls on nibble boundaries. One byte of headroom guarantees success; ATS sizes destinations at 2× the encoded length (strictly larger than any decoded result, since Huffman expansion is at most 8/5), so callers are unaffected. The sizing contract is now documented on huffman_decode().
  • Error returns remain negative-on-failure as callers expect (xpack_decode_string() checks len < 0).
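
The sizing argument in the second bullet can be checked with a little arithmetic. The sketch below is illustrative only: `max_decoded_len`, `dst_capacity`, and `has_headroom` are hypothetical helpers, not ATS functions. It relies on the shortest HPACK Huffman code being 5 bits, so every decoded byte costs at least 5 input bits, and on the 2× destination sizing stated above.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on decoded bytes: each output byte costs at least 5 input bits.
constexpr std::size_t max_decoded_len(std::size_t encoded_len) {
  return encoded_len * 8 / 5;
}

// Destination size as described in the PR text: twice the encoded length.
constexpr std::size_t dst_capacity(std::size_t encoded_len) {
  return encoded_len * 2;
}

// True when the destination is strictly larger than any possible result,
// which is the headroom the exactly-sized-destination caveat asks for.
bool has_headroom(std::size_t encoded_len) {
  assert(encoded_len > 0); // an empty input decodes to nothing
  return dst_capacity(encoded_len) > max_decoded_len(encoded_len);
}
```

For a 5-byte input the bound is 8 decoded bytes against a 10-byte destination, so the one byte of headroom is always available for non-empty inputs.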

phongn and others added 3 commits June 11, 2026 16:38
The vendored ls-hpack code only ported the conservative 4-bit FSM
decoder (lshpack_dec_huff_decode_full), although the 16-bit decode
table it needs (hdecs) has been in huff-tables.h all along. Port the
fast decoder, which emits up to 3 bytes per table lookup and falls
back to the full decoder for codes longer than 16 bits. Decoding
typical header values is about 2x faster (see the new
tools/benchmark/benchmark_HuffmanDecode.cc).

One deliberate divergence from upstream ls-hpack: the tail check also
rejects padding of 8 or more bits, per RFC 7541 section 5.2, keeping
the strictness of the FSM decoder.

New differential tests assert the two decoders agree on validity and
content for exhaustive short inputs and seeded fuzz. The only
permitted difference is an exactly-sized destination, where the FSM
decoder can spuriously report LSHPACK_ERR_MORE_BUF; ATS callers
always allocate twice the encoded length, so they are unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Document the RFC 7541 padding divergence in the vendored README and
pin it with a deterministic test (the fuzz coverage of that branch
was seed dependent). Document huffman_decode's buffer sizing contract
in the header. Export LSHPACK_ERR_MORE_BUF as upstream does instead
of a test literal. Scope the lib/ include path to the consumers that
need it rather than exporting it from the lshpack target. Consolidate
the per-byte sentinel assertions, cutting the parity tests' assertion
count from 10.7M to 0.4M.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Contributor

Pull request overview

This PR ports LiteSpeed ls-hpack’s 16-bit table (“fast”) Huffman decoder into ATS’s vendored lib/ls-hpack and switches ATS’s huffman_decode() wrapper (used by both HPACK and QPACK via xpack_decode_string()) to use it, aiming to reduce header decompression CPU cost while preserving ATS’s stricter RFC 7541 padding validation.

Changes:

  • Add lshpack_dec_huff_decode() (fast decoder) to the vendored ls-hpack code and wire huffman_decode() to use it instead of the 4-bit FSM decoder.
  • Add extensive parity / fuzz / strict-padding regression tests to ensure the fast decoder matches the existing behavior (including ATS’s deliberate stricter padding rule).
  • Add a Catch2 benchmark target to compare full vs fast decoder performance.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| tools/benchmark/CMakeLists.txt | Adds the new Huffman decode micro-benchmark executable target. |
| tools/benchmark/benchmark_HuffmanDecode.cc | New Catch2 benchmark comparing lshpack_dec_huff_decode_full vs lshpack_dec_huff_decode on representative inputs. |
| src/proxy/hdrs/unit_tests/test_Huffmancode.cc | Adds known-vector decode coverage, exhaustive/parity tests, fuzzing, and strict padding regression coverage for the fast decoder. |
| src/proxy/hdrs/HuffmanCodec.cc | Switches ATS huffman_decode() wrapper to call the fast ls-hpack decoder. |
| src/proxy/hdrs/CMakeLists.txt | Ensures unit tests can include vendored ls-hpack headers (lib/). |
| lib/ls-hpack/README.md | Updates referenced upstream version and documents ATS’s deliberate strict-padding divergence. |
| lib/ls-hpack/lshpack.h | Exposes LSHPACK_ERR_MORE_BUF and declares lshpack_dec_huff_decode(). |
| lib/ls-hpack/lshpack.cc | Implements the fast decoder and adds ATS-specific guard rejecting padding ≥ 8 bits. |
| include/proxy/hdrs/HuffmanCodec.h | Documents huffman_decode() sizing expectations (dst must be strictly larger than decoded output). |
