do not merge: onpair dfa #8361

Draft
joseph-isaacs wants to merge 5 commits into develop from claude/relaxed-goodall-e3s5pr

@joseph-isaacs

Contributor

Summary

Closes: #000

Testing

claude added 5 commits June 9, 2026 14:25
Evaluate `prefix%` and `%needle%` LIKE patterns directly on OnPair
compressed code streams, mirroring the FSST DFA pushdown. Each u16 code
is lifted to a byte-level DFA transition (KMP for contains, linear for
prefix) by feeding its dictionary token's bytes through the byte table;
scanning a row's codes is then one table lookup per code and is exactly
equivalent to byte-level matching over the decompressed row.
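
The lifting step described above can be sketched as follows for the `%needle%` case. This is a minimal illustration, not the PR's code: the function names, the row-major table layout, and the `Vec<Vec<u8>>` dictionary are assumptions made for the example.

```rust
/// Byte-level KMP automaton for `%needle%`. State `s` means the longest
/// needle prefix ending at the current byte has length `s`; state
/// `needle.len()` is the accepting state.
fn build_byte_dfa(needle: &[u8]) -> Vec<[u8; 256]> {
    let m = needle.len();
    assert!(m > 0 && m <= u8::MAX as usize, "needle must fit the u8 state space");
    let mut table = vec![[0u8; 256]; m + 1];
    table[m] = [m as u8; 256]; // once accepted, stay accepted
    table[0][needle[0] as usize] = 1;
    let mut fallback = 0usize; // state reached by the longest proper border
    for s in 1..m {
        table[s] = table[fallback]; // mismatches behave like the fallback state
        table[s][needle[s] as usize] = (s + 1) as u8; // match advances
        fallback = table[fallback][needle[s] as usize] as usize;
    }
    table
}

/// Lift the byte table to codes: entry `[code * n_states + start]` is the
/// state reached by feeding the token's bytes through the byte table.
/// (Assumed layout; `dict[code]` holds the token bytes for `code`.)
fn lift_to_codes(byte_dfa: &[[u8; 256]], dict: &[Vec<u8>]) -> Vec<u8> {
    let n_states = byte_dfa.len();
    let mut table = vec![0u8; n_states * dict.len()];
    for (code, token) in dict.iter().enumerate() {
        for start in 0..n_states {
            let mut s = start as u8;
            for &b in token {
                s = byte_dfa[s as usize][b as usize];
            }
            table[code * n_states + start] = s;
        }
    }
    table
}

/// Scan one row's codes: one table lookup per code, early exit on accept.
fn contains_compressed(codes: &[u16], code_table: &[u8], n_states: usize) -> bool {
    let accept = (n_states - 1) as u8;
    let mut s = 0u8;
    for &c in codes {
        s = code_table[c as usize * n_states + s as usize];
        if s == accept {
            return true;
        }
    }
    false
}
```

Because each code-level entry is just the composition of the byte-level transitions for that token, the scan reaches the accepting state exactly when a byte-level scan of the decompressed row would.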

OnPair has no escape code (the trainer always emits all 256 single-byte
tokens), so the DFA is strictly simpler than FSST's: no escape sentinel
and no escape table. Unsupported pattern shapes (`_`, suffix, ILIKE,
needles beyond the u8 state space) return None and fall back to
decompression.
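
A rough sketch of the "return None and fall back" shape check is below. The enum, the function name, the `MAX_NEEDLE_LEN` limit, and the conservative rejection of backslash escapes are all assumptions for illustration; the PR's actual limits and pattern parsing may differ.

```rust
/// Pattern shapes the compressed-domain matcher is assumed to support.
#[derive(Debug, PartialEq)]
enum LikeShape<'a> {
    Prefix(&'a [u8]),
    Contains(&'a [u8]),
}

/// Illustrative cap so that every DFA state fits in a `u8` (assumption).
const MAX_NEEDLE_LEN: usize = 254;

/// Returns `None` for anything the DFA cannot handle, signalling the caller
/// to fall back to decompress-and-match.
fn classify(pattern: &str, case_insensitive: bool) -> Option<LikeShape<'_>> {
    if case_insensitive {
        return None; // ILIKE is not supported
    }
    let p = pattern.as_bytes();
    // `_` wildcards are unsupported; escapes are rejected conservatively.
    if p.contains(&b'_') || p.contains(&b'\\') {
        return None;
    }
    let shape = if let Some(inner) = p.strip_prefix(b"%").and_then(|r| r.strip_suffix(b"%")) {
        LikeShape::Contains(inner)
    } else if let Some(prefix) = p.strip_suffix(b"%") {
        LikeShape::Prefix(prefix)
    } else {
        return None; // suffix (`%x`) and exact patterns fall back
    };
    let literal = match &shape {
        LikeShape::Contains(l) | LikeShape::Prefix(l) => *l,
    };
    // Reject empty literals, interior `%`, and needles beyond the state space.
    if literal.is_empty() || literal.contains(&b'%') || literal.len() > MAX_NEEDLE_LEN {
        return None;
    }
    Some(shape)
}
```

Returning `Option` keeps the fallback decision at the call site: `None` simply means "decline the pushdown", never an error.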

Wires `LikeExecuteAdaptor(OnPair)` into the parent kernel set. Adds unit
tests plus a randomised cross-check against ground-truth starts_with /
contains over 600 rows and 14 needles.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Add a divan microbenchmark comparing the compressed-domain LIKE pushdown
against the decompress-and-match fallback on a 200k-row OnPair-encoded
URL column. On this corpus the pushdown is ~1.9-2.2x faster for prefix
and ~2.4-3.3x for contains.

Two benchmark-enablement knobs:
- `VORTEX_ONPAIR_LIKE_PUSHDOWN=0` forces the OnPair LikeKernel to decline
  (fall back to decompression), so the same binary can A/B the pushdown
  end-to-end without a rebuild. Read once.
- `CLICKBENCH_PARTITIONS=N` caps how many ClickBench shards are fetched
  and queried, for local/iterative runs (the full suite still defaults to
  100).
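
One way the two knobs above could be read is sketched here. Only the environment variable names and the default of 100 come from the commit message; the helper names and the exact parsing rules (only a literal `0` disables, non-numeric or zero caps fall back to the default) are assumptions.

```rust
use std::sync::OnceLock;

/// Pushdown stays enabled unless the variable is set to exactly "0".
fn parse_pushdown_flag(value: Option<&str>) -> bool {
    value.map(str::trim) != Some("0")
}

/// Reads `VORTEX_ONPAIR_LIKE_PUSHDOWN` once and caches the answer, so the
/// hot path never touches the environment again.
fn like_pushdown_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        parse_pushdown_flag(std::env::var("VORTEX_ONPAIR_LIKE_PUSHDOWN").ok().as_deref())
    })
}

/// Parses a `CLICKBENCH_PARTITIONS`-style cap, keeping `default` when the
/// value is missing, unparsable, or zero.
fn partition_cap(value: Option<&str>, default: usize) -> usize {
    value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}
```

Keeping the parsing in pure functions separate from the cached read makes the behaviour testable without mutating the process environment.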

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Select the DFA variant once in `OnPairMatcher::scan_to_bitbuf` instead of
re-matching the matcher enum per row through a closure, mark the concrete
`FlatContainsDfa`/`FlatPrefixDfa::matches` `#[inline]`, and walk row
offsets with a running cursor. This lets the row scan monomorphise and
inline the DFA step. Controlled microbench (same machine, back-to-back):
contains pushdown ~1.16-1.26x faster (e.g. %bonprix% 1.84ms -> 1.46ms),
prefix marginally faster.
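
The dispatch change can be illustrated with a small sketch. The toy matchers, the `RowDfa` trait, and the `Vec<bool>` output are stand-ins invented for this example (the PR uses `FlatContainsDfa`/`FlatPrefixDfa` and a bit buffer); only the structure, one `match` outside the loop, a generic row scan, and a running offset cursor, is the point.

```rust
/// Minimal interface a per-row matcher needs (hypothetical trait).
trait RowDfa {
    fn matches(&self, codes: &[u16]) -> bool;
}

/// Toy stand-ins for the real DFAs: they match on a single code.
struct ToyContains { code: u16 }
struct ToyPrefix { code: u16 }

impl RowDfa for ToyContains {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool { codes.contains(&self.code) }
}

impl RowDfa for ToyPrefix {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool { codes.first() == Some(&self.code) }
}

enum Matcher {
    Contains(ToyContains),
    Prefix(ToyPrefix),
}

/// Generic over the concrete DFA, so each instantiation is monomorphised
/// and `matches` can be inlined into the loop body.
fn scan_rows<D: RowDfa>(dfa: &D, codes: &[u16], offsets: &[u32]) -> Vec<bool> {
    let mut out = Vec::with_capacity(offsets.len().saturating_sub(1));
    let Some((&first, rest)) = offsets.split_first() else { return out; };
    let mut start = first as usize; // running cursor: one offset load per row
    for &end in rest {
        let end = end as usize;
        out.push(dfa.matches(&codes[start..end]));
        start = end;
    }
    out
}

impl Matcher {
    /// The variant is selected once here, not once per row.
    fn scan(&self, codes: &[u16], offsets: &[u32]) -> Vec<bool> {
        match self {
            Matcher::Contains(d) => scan_rows(d, codes, offsets),
            Matcher::Prefix(d) => scan_rows(d, codes, offsets),
        }
    }
}
```

With the enum matched per row through a closure, the compiler sees an opaque call in the inner loop; hoisting the `match` gives it two specialised loops instead.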

Also add an instrumented characterization test proving where the pushdown
actually fires through the execution engine: bare OnPair and Dict(OnPair)
both route the predicate to the kernel, but Dict(Shared(OnPair)) -- the
shape a dict-encoded column takes when read back from a multi-chunk file
-- does not, because `Shared` has no parent-reduce forwarding and
canonicalizes (decompresses) instead. This is why the compressed-domain
LIKE pushdown does not move end-to-end ClickBench/TPC-H numbers, and it
affects FSST identically.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

A dict-encoded string column reads back as `Dict(codes, Shared(values))`.
`Shared` (which dedups the decoded dictionary across row splits) has no
parent-reduce forwarding, so a predicate pushed to the values --
`like(Shared(onpair))` -- canonicalizes (decompresses) the source instead
of reaching the OnPair/FSST LIKE kernel. Because the filter path's
`values_array_uncanonical` reused the projection's `Shared`-wrapped cache,
any query that both projects and filters the same column (e.g. ClickBench
Q22's `MIN(URL)` + `WHERE URL LIKE`) silently lost the pushdown.

Give the predicate path its own bare (non-`Shared`) values cache, built on
the same underlying read as the `Shared` projection cache (values are read
once). Projection keeps `Shared` for cross-split decode reuse; predicates
get bare values so the optimizer can push them into the values encoding.
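
The two-cache arrangement might look roughly like the sketch below. Every type and method name here is hypothetical (the real reader, `Shared` wrapper, and `values_array_uncanonical` are not reproduced); the example only shows two cached views built on a single underlying read.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the compressed dictionary values (e.g. OnPair).
struct EncodedValues(Vec<u8>);

/// Hypothetical stand-in for the `Shared` wrapper used by projections.
struct SharedValues {
    inner: Arc<EncodedValues>,
}

struct DictValuesReader {
    source: Vec<u8>,
    reads: AtomicUsize, // counts underlying reads, for the demonstration
    bare: OnceLock<Arc<EncodedValues>>,
    shared: OnceLock<Arc<SharedValues>>,
}

impl DictValuesReader {
    fn new(source: Vec<u8>) -> Self {
        Self {
            source,
            reads: AtomicUsize::new(0),
            bare: OnceLock::new(),
            shared: OnceLock::new(),
        }
    }

    /// Predicate path: bare values, so a LIKE can be pushed into the encoding.
    fn predicate_values(&self) -> Arc<EncodedValues> {
        self.bare
            .get_or_init(|| {
                self.reads.fetch_add(1, Ordering::Relaxed);
                Arc::new(EncodedValues(self.source.clone()))
            })
            .clone()
    }

    /// Projection path: `Shared`-wrapped values, built on the same read.
    fn projection_values(&self) -> Arc<SharedValues> {
        self.shared
            .get_or_init(|| Arc::new(SharedValues { inner: self.predicate_values() }))
            .clone()
    }
}
```

Whichever path runs first triggers the read; the other reuses it, so a query that both projects and filters the column still reads the values once.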

Verified end-to-end on a ClickBench shard (OnPair-encoded `URL`):
- Q22-shape (filter + project URL): kernel firings 0 -> 44, query faster.
- count(*) filter: still 44 firings, result unchanged.
- Q34 (GROUP BY URL, pure decode): unchanged (no decode-cache regression).

Also retarget the OnPair characterization test's comment at this layout
fix (the array-level `Shared`-blocks-pushdown behavior it pins is what
motivates applying predicates to bare values).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

The per-call DFA table was the dominant cost of the LIKE pushdown on
dict-encoded columns (~17% of ClickBench Q21 in a samply profile): it
computed an `n_states`-entry transition column for every one of the (up to
4096) dictionary tokens, an `n_states x n_codes` table in total, even though
the needle/prefix can only interact with the tokens that contain one of its
bytes.

A token whose bytes are all absent from the pattern drives the byte table
to the same reset state from every *live* state (a non-needle byte falls
back to 0 via KMP from any non-accept state; a non-prefix byte fails), and
the accept/fail rows are never read because the scan returns the instant it
reaches them. So such a token's whole column is just the skip value.

Pre-fill the table with the skip value and only compute columns for codes
containing a pattern byte; for those, read the token once while advancing
all `n_states` start states in lockstep (a per-byte gather). Build-heavy
microbench (build + 4k-row scan): ~1.3-1.6x faster, more for rare-byte
needles (most tokens skipped), less for common-byte needles like
`%google%` on URLs. Randomized ground-truth fuzz test still passes.
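
The pre-fill and lockstep build can be sketched for the contains case as follows. The function names and table layout are assumptions; `byte_dfa` is taken to be any byte-level KMP table whose last state accepts and whose non-needle bytes send every non-accept state to 0.

```rust
/// State every live state falls back to on a token with no needle bytes.
const RESET: u8 = 0;

/// Build the code-level table, computing columns only for tokens that
/// contain at least one needle byte. Layout: `[code * n_states + start]`.
fn build_code_table_sparse(
    byte_dfa: &[[u8; 256]],
    needle: &[u8],
    dict: &[Vec<u8>],
) -> Vec<u8> {
    let n_states = byte_dfa.len();
    let mut in_needle = [false; 256];
    for &b in needle {
        in_needle[b as usize] = true;
    }
    // Pre-fill with the skip value. The accept row is also RESET for skipped
    // tokens, which is safe only because the scan returns on reaching accept.
    let mut table = vec![RESET; n_states * dict.len()];
    let mut states: Vec<u8> = Vec::with_capacity(n_states);
    for (code, token) in dict.iter().enumerate() {
        // Empty tokens (if any) are the identity map, so they are computed too.
        let touches = token.is_empty() || token.iter().any(|&b| in_needle[b as usize]);
        if !touches {
            continue;
        }
        // Advance all start states in lockstep: the token is read once.
        states.clear();
        states.extend((0..n_states).map(|s| s as u8));
        for &b in token {
            for s in states.iter_mut() {
                *s = byte_dfa[*s as usize][b as usize];
            }
        }
        table[code * n_states..(code + 1) * n_states].copy_from_slice(&states);
    }
    table
}

/// Row scan with early exit; it never reads the accept row of the table.
fn scan_contains(codes: &[u16], table: &[u8], n_states: usize) -> bool {
    let accept = (n_states - 1) as u8;
    let mut s = 0u8;
    for &c in codes {
        s = table[c as usize * n_states + s as usize];
        if s == accept {
            return true;
        }
    }
    false
}
```

The saving scales with how many tokens are skipped, which is why rare-byte needles benefit most: for them nearly every column keeps its pre-filled value.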

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

@joseph-isaacs joseph-isaacs changed the title from "Claude/relaxed goodall e3s5pr" to "do not merge: onpair dfa" Jun 11, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1530 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | `bitwise_not_vortex_buffer_mut[128]` | 215.3 ns | 244.4 ns | -11.93% |
| WallTime | `cuda/bitpacked_u8/unpack/3bw[100M]` | 352.4 µs | 299.7 µs | +17.58% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/relaxed-goodall-e3s5pr (5257888) with develop (0dd6db7)

@joseph-isaacs joseph-isaacs added the `action/benchmark` label (Trigger full benchmarks to run on this PR) Jun 12, 2026
@github-actions github-actions Bot removed the `action/benchmark` label (Trigger full benchmarks to run on this PR) Jun 12, 2026
@github-actions

github-actions Bot commented Jun 12, 2026

Contributor

Polar Signals Profiling Results

Latest Run

| Status | Commit | Job Attempt | Link |
|---|---|---|---|
| 🟢 Done | 5257888 | 1 | Explore Profiling Data |

Powered by Polar Signals Cloud

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark PolarSignals Profiling failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark FineWeb NVMe failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=1 on NVME failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-DS SF=1 on NVME failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark FineWeb S3 failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Statistical and Population Genetics failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=10 on NVME failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Clickbench on NVME failed! Check the workflow run for details.

@github-actions

Contributor

BENCHMARK FAILED

Benchmark Random Access failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=1 on S3 failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=10 on S3 failed! Check the workflow run for details.

@github-actions

Contributor

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Appian on NVME failed! Check the workflow run for details.

2 participants