do not merge: onpair dfa by joseph-isaacs · Pull Request #8361 · vortex-data/vortex

joseph-isaacs · 2026-06-11T14:19:00Z

Summary

Closes: #000

Testing

Evaluate `prefix%` and `%needle%` LIKE patterns directly on OnPair compressed code streams, mirroring the FSST DFA pushdown.

Each u16 code is lifted to a byte-level DFA transition (KMP for contains, linear for prefix) by feeding its dictionary token's bytes through the byte table; scanning a row's codes is then one table lookup per code and is exactly equivalent to byte-level matching over the decompressed row. OnPair has no escape code (the trainer always emits all 256 single-byte tokens), so the DFA is strictly simpler than FSST's: no escape sentinel and no escape table.

Unsupported pattern shapes (`_`, suffix, ILIKE, needles beyond the u8 state space) return None and fall back to decompression.

Wires `LikeExecuteAdaptor(OnPair)` into the parent kernel set. Adds unit tests plus a randomised cross-check against ground-truth starts_with / contains over 600 rows and 14 needles.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
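The code-lifting step described in this commit can be illustrated with a small standalone sketch. It is not the PR's implementation: the function names, the flat `code * n_states + state` table layout, and the 254-byte needle limit are assumptions made for illustration, and only the `%needle%` (contains) case is shown.

```rust
/// Byte-level KMP automaton for `%needle%`.
/// State `s` means "the last `s` bytes matched the first `s` needle bytes";
/// state `needle.len()` is the absorbing accept state.
/// Returns `None` for needles that do not fit the u8 state space
/// (the 254-byte limit is an assumption of this sketch).
fn build_contains_byte_table(needle: &[u8]) -> Option<Vec<[u8; 256]>> {
    let n = needle.len();
    if n == 0 || n > 254 {
        return None;
    }
    let mut table = vec![[0u8; 256]; n + 1];
    table[0][needle[0] as usize] = 1;
    let mut fallback = 0usize; // state of the longest proper border so far
    for s in 1..n {
        table[s] = table[fallback]; // mismatches behave like the fallback state
        table[s][needle[s] as usize] = (s + 1) as u8; // a match advances
        fallback = table[fallback][needle[s] as usize] as usize;
    }
    table[n] = [n as u8; 256]; // accept is absorbing
    Some(table)
}

/// Lift the byte table to one transition per (code, start state) by feeding
/// each dictionary token's bytes through the byte table.
fn lift_codes(byte_table: &[[u8; 256]], tokens: &[&[u8]]) -> Vec<u8> {
    let n_states = byte_table.len();
    let mut code_table = vec![0u8; n_states * tokens.len()];
    for (code, token) in tokens.iter().enumerate() {
        for start in 0..n_states {
            let mut state = start as u8;
            for &byte in token.iter() {
                state = byte_table[state as usize][byte as usize];
            }
            code_table[code * n_states + start] = state;
        }
    }
    code_table
}

/// Scan one row of codes: a single table lookup per code.
fn contains_codes(code_table: &[u8], n_states: usize, codes: &[u16]) -> bool {
    let accept = (n_states - 1) as u8;
    let mut state = 0u8;
    for &code in codes {
        state = code_table[code as usize * n_states + state as usize];
        if state == accept {
            return true;
        }
    }
    false
}
```

With a toy dictionary `["a", "b", "ab", "ba", "c"]` and the needle `abb`, the code row `[2, 1]` (decoding to `abb`) matches and `[3, 3]` (decoding to `baba`) does not, which agrees with byte-level matching over the decoded rows.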

Add a divan microbenchmark comparing the compressed-domain LIKE pushdown against the decompress-and-match fallback on a 200k-row OnPair-encoded URL column. On this corpus the pushdown is ~1.9-2.2x faster for prefix and ~2.4-3.3x for contains.

Two benchmark-enablement knobs:

- `VORTEX_ONPAIR_LIKE_PUSHDOWN=0` forces the OnPair LikeKernel to decline (fall back to decompression), so the same binary can A/B the pushdown end-to-end without a rebuild. Read once.
- `CLICKBENCH_PARTITIONS=N` caps how many ClickBench shards are fetched and queried, for local/iterative runs (the full suite still defaults to 100).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
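One way to wire up knobs like these is sketched below. Only the two environment variable names come from the commit message; the helper names and the exact parsing rules (anything other than `0` leaves the pushdown enabled, an unparsable partition count falls back to 100) are assumptions of this sketch, not the PR's behaviour.

```rust
use std::sync::OnceLock;

/// Assumed rule: the pushdown stays enabled unless the variable is exactly "0".
fn parse_pushdown_flag(value: Option<&str>) -> bool {
    value != Some("0")
}

/// Assumed rule: a missing or unparsable value falls back to the full 100 shards.
fn parse_partitions(value: Option<&str>) -> usize {
    value.and_then(|v| v.parse().ok()).unwrap_or(100)
}

/// Read `VORTEX_ONPAIR_LIKE_PUSHDOWN` once and cache the answer, so the
/// per-call check in a kernel is a cheap load rather than an env lookup.
fn like_pushdown_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        parse_pushdown_flag(std::env::var("VORTEX_ONPAIR_LIKE_PUSHDOWN").ok().as_deref())
    })
}

/// Read `CLICKBENCH_PARTITIONS` when the benchmark is set up.
fn clickbench_partitions() -> usize {
    parse_partitions(std::env::var("CLICKBENCH_PARTITIONS").ok().as_deref())
}
```

Keeping the parsing in pure functions makes the rules testable without mutating the process environment; a kernel would consult `like_pushdown_enabled()` and decline when it returns `false`.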

Select the DFA variant once in `OnPairMatcher::scan_to_bitbuf` instead of re-matching the matcher enum per row through a closure, mark the concrete `FlatContainsDfa`/`FlatPrefixDfa::matches` `#[inline]`, and walk row offsets with a running cursor. This lets the row scan monomorphise and inline the DFA step.

Controlled microbench (same machine, back-to-back): contains pushdown ~1.16-1.26x faster (e.g. %bonprix% 1.84ms -> 1.46ms), prefix marginally faster.

Also add an instrumented characterization test proving where the pushdown actually fires through the execution engine: bare OnPair and Dict(OnPair) both route the predicate to the kernel, but Dict(Shared(OnPair)) -- the shape a dict-encoded column takes when read back from a multi-chunk file -- does not, because `Shared` has no parent-reduce forwarding and canonicalizes (decompresses) instead. This is why the compressed-domain LIKE pushdown does not move end-to-end ClickBench/TPC-H numbers, and it affects FSST identically.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
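The dispatch change in this commit (match on the enum once, then run a generic row loop) can be sketched as follows. The types here are deliberately trivial stand-ins, not the PR's `FlatContainsDfa`/`FlatPrefixDfa`, and the result is a `Vec<bool>` rather than a bit buffer; all names are hypothetical.

```rust
/// Hypothetical per-row matcher interface.
trait RowMatcher {
    fn matches(&self, codes: &[u16]) -> bool;
}

/// Toy "prefix" matcher: true if the row's first code equals the target.
struct FirstCodeIs(u16);
/// Toy "contains" matcher: true if any code in the row equals the target.
struct AnyCodeIs(u16);

impl RowMatcher for FirstCodeIs {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool {
        codes.first() == Some(&self.0)
    }
}

impl RowMatcher for AnyCodeIs {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool {
        codes.contains(&self.0)
    }
}

enum Matcher {
    Prefix(FirstCodeIs),
    Contains(AnyCodeIs),
}

/// Generic row loop: compiled once per concrete matcher type, so the call to
/// `matches` is statically dispatched and can be inlined.
/// `offsets` holds row boundaries into `codes` (row i is offsets[i]..offsets[i + 1]).
fn scan_rows<M: RowMatcher>(matcher: &M, codes: &[u16], offsets: &[usize]) -> Vec<bool> {
    let mut out = Vec::with_capacity(offsets.len().saturating_sub(1));
    let Some((&first, rest)) = offsets.split_first() else {
        return out;
    };
    let mut start = first; // running cursor: each end becomes the next start
    for &end in rest {
        out.push(matcher.matches(&codes[start..end]));
        start = end;
    }
    out
}

impl Matcher {
    /// The enum is matched once per scan rather than once per row.
    fn scan(&self, codes: &[u16], offsets: &[usize]) -> Vec<bool> {
        match self {
            Matcher::Prefix(m) => scan_rows(m, codes, offsets),
            Matcher::Contains(m) => scan_rows(m, codes, offsets),
        }
    }
}
```

The design point is that the `match` sits outside the loop, so each arm instantiates its own copy of `scan_rows` with a concrete matcher type, which is what allows the compiler to inline the per-row step.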

A dict-encoded string column reads back as `Dict(codes, Shared(values))`. `Shared` (which dedups the decoded dictionary across row splits) has no parent-reduce forwarding, so a predicate pushed to the values -- `like(Shared(onpair))` -- canonicalizes (decompresses) the source instead of reaching the OnPair/FSST LIKE kernel. Because the filter path's `values_array_uncanonical` reused the projection's `Shared`-wrapped cache, any query that both projects and filters the same column (e.g. ClickBench Q22's `MIN(URL)` + `WHERE URL LIKE`) silently lost the pushdown.

Give the predicate path its own bare (non-`Shared`) values cache, built on the same underlying read as the `Shared` projection cache (values are read once). Projection keeps `Shared` for cross-split decode reuse; predicates get bare values so the optimizer can push them into the values encoding.

Verified end-to-end on a ClickBench shard (OnPair-encoded `URL`):

- Q22-shape (filter + project URL): kernel firings 0 -> 44, query faster.
- count(*) filter: still 44 firings, result unchanged.
- Q34 (GROUP BY URL, pure decode): unchanged (no decode-cache regression).

Also retarget the OnPair characterization test's comment at this layout fix (the array-level `Shared`-blocks-pushdown behavior it pins is what motivates applying predicates to bare values).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
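The shape of this fix (two views over one underlying read) can be modelled with a toy reader. Nothing below is Vortex API: `DictValuesReader`, `Values`, and `kernel_can_fire` are hypothetical names, and the "kernel" is reduced to a check on whether the values are wrapped.

```rust
use std::cell::{Cell, OnceCell};
use std::rc::Rc;

/// Toy stand-in for the dictionary values handed to a consumer.
#[derive(Debug)]
enum Values {
    /// Bare encoded values: a predicate can be pushed into the encoding.
    Bare(Rc<Vec<String>>),
    /// Values behind a decode-sharing wrapper: predicates are not forwarded.
    Shared(Rc<Vec<String>>),
}

/// Hypothetical reader that performs the underlying read at most once.
struct DictValuesReader {
    source: Vec<String>,
    reads: Cell<usize>,
    raw: OnceCell<Rc<Vec<String>>>,
}

impl DictValuesReader {
    fn new(source: Vec<String>) -> Self {
        Self { source, reads: Cell::new(0), raw: OnceCell::new() }
    }

    /// The single underlying read, cached for both consumers.
    fn read_once(&self) -> Rc<Vec<String>> {
        Rc::clone(self.raw.get_or_init(|| {
            self.reads.set(self.reads.get() + 1);
            Rc::new(self.source.clone())
        }))
    }

    /// Projection path: keeps the sharing wrapper for decode reuse.
    fn projection_values(&self) -> Values {
        Values::Shared(self.read_once())
    }

    /// Predicate path: hands out bare values so a kernel can be applied.
    fn predicate_values(&self) -> Values {
        Values::Bare(self.read_once())
    }
}

/// In this toy model a compressed-domain kernel only fires on bare values.
fn kernel_can_fire(values: &Values) -> bool {
    matches!(values, Values::Bare(_))
}
```

Requesting both views from one reader increments `reads` only once, which mirrors the "values are read once" property the commit message calls out.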

The per-call DFA table was the dominant cost of the LIKE pushdown on dict-encoded columns (~17% of ClickBench Q21 in a samply profile): it built an `n_states x n_codes` transition for every one of the (up to 4096) dictionary tokens, even though the needle/prefix can only interact with the tokens that contain one of its bytes.

A token whose bytes are all absent from the pattern drives the byte table to the same reset state from every *live* state (a non-needle byte falls back to 0 via KMP from any non-accept state; a non-prefix byte fails), and the accept/fail rows are never read because the scan returns the instant it reaches them. So such a token's whole column is just the skip value.

Pre-fill the table with the skip value and only compute columns for codes containing a pattern byte; for those, read the token once while advancing all `n_states` start states in lockstep (a per-byte gather).

Build-heavy microbench (build + 4k-row scan): ~1.3-1.6x faster, more for rare-byte needles (most tokens skipped), less for common-byte needles like `%google%` on URLs. Randomized ground-truth fuzz test still passes.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
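The skip-value optimisation can be sketched for the contains case as below. This is an illustration under assumptions rather than the PR's code: the byte table is built by brute force from its definition (fine for a demo, far too slow for real use), the code table uses a flat `code * n_states + state` layout, and the function names are hypothetical.

```rust
/// Brute-force byte table for `%needle%`: from state `s` on byte `b`, the next
/// state is the longest needle prefix that is a suffix of `needle[..s] + b`.
/// State `needle.len()` is the absorbing accept state. Assumes 1..=254 bytes.
fn brute_force_byte_table(needle: &[u8]) -> Vec<[u8; 256]> {
    let n = needle.len();
    assert!(n >= 1 && n <= 254, "needle must fit the u8 state space");
    let mut table = vec![[0u8; 256]; n + 1];
    for s in 0..n {
        for b in 0..=255u8 {
            let mut seen = needle[..s].to_vec();
            seen.push(b);
            // k = 0 always qualifies (the empty prefix), so `find` succeeds.
            let k = (0..=seen.len()).rev().find(|&k| seen.ends_with(&needle[..k])).unwrap_or(0);
            table[s][b as usize] = k as u8;
        }
    }
    table[n] = [n as u8; 256];
    table
}

/// Build the code-level table, skipping tokens that share no byte with the needle.
fn build_sparse_code_table(byte_table: &[[u8; 256]], needle: &[u8], tokens: &[&[u8]]) -> Vec<u8> {
    let n_states = byte_table.len();
    let mut in_needle = [false; 256];
    for &b in needle {
        in_needle[b as usize] = true;
    }
    // Pre-fill with the skip value: state 0 for a contains automaton. The
    // accept row is also left at 0, which is safe only because the scan
    // returns as soon as it reaches accept and never reads that row.
    let mut table = vec![0u8; n_states * tokens.len()];
    let mut states = vec![0u8; n_states];
    for (code, token) in tokens.iter().enumerate() {
        if !token.iter().any(|&b| in_needle[b as usize]) {
            continue; // whole column is the skip value
        }
        // Read the token once, advancing every start state in lockstep.
        for (start, slot) in states.iter_mut().enumerate() {
            *slot = start as u8;
        }
        for &byte in token.iter() {
            for slot in states.iter_mut() {
                *slot = byte_table[*slot as usize][byte as usize];
            }
        }
        table[code * n_states..(code + 1) * n_states].copy_from_slice(&states);
    }
    table
}
```

For every live (non-accept) start state, the sparse table agrees with the table obtained by feeding each token through the byte table individually; only the never-read accept row of skipped tokens differs.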

codspeed-hq · 2026-06-11T14:26:10Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1530 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 215.3 ns | 244.4 ns | -11.93% |
| ⚡ | WallTime | `cuda/bitpacked_u8/unpack/3bw[100M]` | 352.4 µs | 299.7 µs | +17.58% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing `claude/relaxed-goodall-e3s5pr` (5257888) with `develop` (0dd6db7)

github-actions · 2026-06-12T10:28:48Z

Polar Signals Profiling Results

Latest Run

| Status | Commit | Job | Attempt | Link |
|---|---|---|---|---|
| 🟢 Done | `5257888` | | 1 | Explore Profiling Data |

Powered by Polar Signals Cloud

github-actions · 2026-06-12T10:30:57Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark PolarSignals Profiling failed! Check the workflow run for details.

github-actions · 2026-06-12T10:31:46Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark FineWeb NVMe failed! Check the workflow run for details.

github-actions · 2026-06-12T10:32:11Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=1 on NVME failed! Check the workflow run for details.

github-actions · 2026-06-12T10:34:26Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-DS SF=1 on NVME failed! Check the workflow run for details.

github-actions · 2026-06-12T10:38:17Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark FineWeb S3 failed! Check the workflow run for details.

github-actions · 2026-06-12T10:38:22Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Statistical and Population Genetics failed! Check the workflow run for details.

github-actions · 2026-06-12T10:41:03Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=10 on NVME failed! Check the workflow run for details.

github-actions · 2026-06-12T10:41:58Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Clickbench on NVME failed! Check the workflow run for details.

github-actions · 2026-06-12T10:44:34Z

BENCHMARK FAILED

Benchmark Random Access failed! Check the workflow run for details.

github-actions · 2026-06-12T10:45:15Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=1 on S3 failed! Check the workflow run for details.

github-actions · 2026-06-12T10:47:19Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=10 on S3 failed! Check the workflow run for details.

github-actions · 2026-06-12T10:51:09Z

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Appian on NVME failed! Check the workflow run for details.

claude added 5 commits June 9, 2026 14:25

joseph-isaacs changed the title ~~Claude/relaxed goodall e3s5pr~~ do not merge: onpair dfa Jun 11, 2026

joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label Jun 12, 2026

github-actions Bot removed the action/benchmark Trigger full benchmarks to run on this PR label Jun 12, 2026
