do not merge: onpair dfa #8361
Evaluate `prefix%` and `%needle%` LIKE patterns directly on OnPair compressed code streams, mirroring the FSST DFA pushdown.

Each u16 code is lifted to a byte-level DFA transition (KMP for contains, linear for prefix) by feeding its dictionary token's bytes through the byte table; scanning a row's codes is then one table lookup per code and is exactly equivalent to byte-level matching over the decompressed row.

OnPair has no escape code (the trainer always emits all 256 single-byte tokens), so the DFA is strictly simpler than FSST's: no escape sentinel and no escape table. Unsupported pattern shapes (`_`, suffix, ILIKE, needles beyond the u8 state space) return `None` and fall back to decompression.

Wires `LikeExecuteAdaptor(OnPair)` into the parent kernel set. Adds unit tests plus a randomised cross-check against ground-truth `starts_with` / `contains` over 600 rows and 14 needles.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
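The lifting step described above can be sketched roughly as follows. This is a minimal illustration for the `%needle%` case only, not the code in this PR: the function names, the flat `state * n_codes + code` layout, and the toy dictionary are all assumptions made for the example.

```rust
/// Byte-level KMP automaton for `%needle%`: `table[state][byte]` is the next
/// state, and state `needle.len()` is a sticky accept state.
fn build_byte_table(needle: &[u8]) -> Vec<[u8; 256]> {
    let m = needle.len();
    assert!(m > 0 && m <= u8::MAX as usize, "needle must fit the u8 state space");
    let mut table = vec![[0u8; 256]; m + 1];
    table[0][needle[0] as usize] = 1;
    let mut fallback = 0usize; // state to restart from after a mismatch
    for state in 1..m {
        table[state] = table[fallback];
        table[state][needle[state] as usize] = (state + 1) as u8;
        fallback = table[fallback][needle[state] as usize] as usize;
    }
    table[m] = [m as u8; 256]; // once matched, stay matched
    table
}

/// Lift the byte table to whole tokens (hypothetical layout):
/// `code_table[state * n_codes + code]` is the state reached after feeding
/// every byte of `tokens[code]` starting from `state`.
fn lift_to_codes(byte_table: &[[u8; 256]], tokens: &[Vec<u8>]) -> Vec<u8> {
    let n_codes = tokens.len();
    let mut code_table = vec![0u8; byte_table.len() * n_codes];
    for start in 0..byte_table.len() {
        for (code, token) in tokens.iter().enumerate() {
            let end = token
                .iter()
                .fold(start as u8, |s, &b| byte_table[s as usize][b as usize]);
            code_table[start * n_codes + code] = end;
        }
    }
    code_table
}

/// Scan one row: a single table lookup per code, no decompression.
fn contains_on_codes(code_table: &[u8], n_codes: usize, accept: u8, codes: &[u16]) -> bool {
    let mut state = 0u8;
    for &code in codes {
        state = code_table[state as usize * n_codes + code as usize];
        if state == accept {
            return true;
        }
    }
    false
}
```

With a toy dictionary of the 256 single-byte tokens plus `goo` (code 256) and `gle.` (code 257), the row `[256, 257]` decodes to `google.` and matches `%google%`, while `[256, 256]` (`googoo`) does not. Because the accept state is sticky in the byte table, a match that completes in the middle of a token is still visible after the whole token has been consumed.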
Add a divan microbenchmark comparing the compressed-domain LIKE pushdown against the decompress-and-match fallback on a 200k-row OnPair-encoded URL column. On this corpus the pushdown is ~1.9-2.2x faster for prefix and ~2.4-3.3x for contains.

Two benchmark-enablement knobs:

- `VORTEX_ONPAIR_LIKE_PUSHDOWN=0` forces the OnPair LikeKernel to decline (fall back to decompression), so the same binary can A/B the pushdown end-to-end without a rebuild. Read once.
- `CLICKBENCH_PARTITIONS=N` caps how many ClickBench shards are fetched and queried, for local/iterative runs (the full suite still defaults to 100).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
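A read-once environment knob of this kind could look something like the sketch below. Only the variable name `VORTEX_ONPAIR_LIKE_PUSHDOWN` and the "`0` disables" behaviour come from the commit message; the helper names and the treatment of every other value (including unset) as "enabled" are assumptions for illustration.

```rust
/// Hypothetical parsing rule: only the literal value "0" disables the
/// pushdown; unset or any other value leaves it enabled.
fn parse_pushdown_flag(raw: Option<&str>) -> bool {
    raw.map(|v| v.trim() != "0").unwrap_or(true)
}

/// Reads the environment variable on first use and caches the answer, so
/// later calls never touch the environment again.
fn like_pushdown_enabled() -> bool {
    static ENABLED: std::sync::OnceLock<bool> = std::sync::OnceLock::new();
    *ENABLED.get_or_init(|| {
        let raw = std::env::var("VORTEX_ONPAIR_LIKE_PUSHDOWN").ok();
        parse_pushdown_flag(raw.as_deref())
    })
}
```

A kernel following this pattern would check `like_pushdown_enabled()` up front and return `None` (decline) when it is false, which is what makes an A/B run possible without rebuilding.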
Select the DFA variant once in `OnPairMatcher::scan_to_bitbuf` instead of re-matching the matcher enum per row through a closure, mark the concrete `FlatContainsDfa`/`FlatPrefixDfa::matches` `#[inline]`, and walk row offsets with a running cursor. This lets the row scan monomorphise and inline the DFA step. Controlled microbench (same machine, back-to-back): contains pushdown ~1.16-1.26x faster (e.g. `%bonprix%` 1.84ms -> 1.46ms), prefix marginally faster.

Also add an instrumented characterization test proving where the pushdown actually fires through the execution engine: bare OnPair and Dict(OnPair) both route the predicate to the kernel, but Dict(Shared(OnPair)) -- the shape a dict-encoded column takes when read back from a multi-chunk file -- does not, because `Shared` has no parent-reduce forwarding and canonicalizes (decompresses) instead. This is why the compressed-domain LIKE pushdown does not move end-to-end ClickBench/TPC-H numbers, and it affects FSST identically.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
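The dispatch change in the first paragraph follows a common Rust pattern: match on the enum once, then hand the concrete type to a generic row loop so the compiler can monomorphise and inline the per-row step. The sketch below shows the shape only. `RowMatcher`, `ToyPrefix`, `ToyContains`, and `ToyMatcher` are hypothetical stand-ins with trivial matching logic, not the PR's `FlatPrefixDfa`/`FlatContainsDfa`, and a `Vec<bool>` stands in for the bit buffer.

```rust
trait RowMatcher {
    fn matches(&self, codes: &[u16]) -> bool;
}

/// Toy stand-in: a row matches if its first code equals `first`.
struct ToyPrefix {
    first: u16,
}

/// Toy stand-in: a row matches if any of its codes equals `wanted`.
struct ToyContains {
    wanted: u16,
}

impl RowMatcher for ToyPrefix {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool {
        codes.first() == Some(&self.first)
    }
}

impl RowMatcher for ToyContains {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool {
        codes.contains(&self.wanted)
    }
}

enum ToyMatcher {
    Prefix(ToyPrefix),
    Contains(ToyContains),
}

/// Generic over the concrete matcher, so each variant gets its own
/// specialised copy of the loop with `matches` inlined.
fn scan_rows<M: RowMatcher>(matcher: &M, codes: &[u16], offsets: &[usize]) -> Vec<bool> {
    let mut out = Vec::with_capacity(offsets.len().saturating_sub(1));
    if offsets.is_empty() {
        return out;
    }
    // Running cursor: each row starts where the previous one ended.
    let mut start = offsets[0];
    for &end in &offsets[1..] {
        out.push(matcher.matches(&codes[start..end]));
        start = end;
    }
    out
}

impl ToyMatcher {
    /// The enum is matched once per scan, not once per row.
    fn scan(&self, codes: &[u16], offsets: &[usize]) -> Vec<bool> {
        match self {
            ToyMatcher::Prefix(m) => scan_rows(m, codes, offsets),
            ToyMatcher::Contains(m) => scan_rows(m, codes, offsets),
        }
    }
}
```

For codes `[1, 2, 3, 4, 2, 2]` with offsets `[0, 3, 4, 6]` (rows `[1, 2, 3]`, `[4]`, `[2, 2]`), `ToyMatcher::Contains(ToyContains { wanted: 2 })` yields `[true, false, true]`.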
A dict-encoded string column reads back as `Dict(codes, Shared(values))`. `Shared` (which dedups the decoded dictionary across row splits) has no parent-reduce forwarding, so a predicate pushed to the values -- `like(Shared(onpair))` -- canonicalizes (decompresses) the source instead of reaching the OnPair/FSST LIKE kernel. Because the filter path's `values_array_uncanonical` reused the projection's `Shared`-wrapped cache, any query that both projects and filters the same column (e.g. ClickBench Q22's `MIN(URL)` + `WHERE URL LIKE`) silently lost the pushdown.

Give the predicate path its own bare (non-`Shared`) values cache, built on the same underlying read as the `Shared` projection cache (values are read once). Projection keeps `Shared` for cross-split decode reuse; predicates get bare values so the optimizer can push them into the values encoding.

Verified end-to-end on a ClickBench shard (OnPair-encoded `URL`):

- Q22-shape (filter + project URL): kernel firings 0 -> 44, query faster.
- count(*) filter: still 44 firings, result unchanged.
- Q34 (GROUP BY URL, pure decode): unchanged (no decode-cache regression).

Also retarget the OnPair characterization test's comment at this layout fix (the array-level `Shared`-blocks-pushdown behavior it pins is what motivates applying predicates to bare values).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
The per-call DFA table was the dominant cost of the LIKE pushdown on dict-encoded columns (~17% of ClickBench Q21 in a samply profile): it built an `n_states x n_codes` transition for every one of the (up to 4096) dictionary tokens, even though the needle/prefix can only interact with the tokens that contain one of its bytes.

A token whose bytes are all absent from the pattern drives the byte table to the same reset state from every *live* state (a non-needle byte falls back to 0 via KMP from any non-accept state; a non-prefix byte fails), and the accept/fail rows are never read because the scan returns the instant it reaches them. So such a token's whole column is just the skip value.

Pre-fill the table with the skip value and only compute columns for codes containing a pattern byte; for those, read the token once while advancing all `n_states` start states in lockstep (a per-byte gather).

Build-heavy microbench (build + 4k-row scan): ~1.3-1.6x faster, more for rare-byte needles (most tokens skipped), less for common-byte needles like `%google%` on URLs. Randomized ground-truth fuzz test still passes.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
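A rough sketch of the pre-fill idea for the contains case, where the skip value is state 0. This is an illustration under assumptions rather than the PR's implementation: the function names and the flat `state * n_codes + code` layout are made up for the example, and the prefix variant (whose skip value is the fail state) is not shown.

```rust
/// Byte-level KMP automaton for `%needle%` with a sticky accept state.
fn build_byte_table(needle: &[u8]) -> Vec<[u8; 256]> {
    let m = needle.len();
    assert!(m > 0 && m <= u8::MAX as usize, "needle must fit the u8 state space");
    let mut table = vec![[0u8; 256]; m + 1];
    table[0][needle[0] as usize] = 1;
    let mut fallback = 0usize;
    for state in 1..m {
        table[state] = table[fallback];
        table[state][needle[state] as usize] = (state + 1) as u8;
        fallback = table[fallback][needle[state] as usize] as usize;
    }
    table[m] = [m as u8; 256];
    table
}

/// Build the code table, computing columns only for tokens that contain at
/// least one needle byte. Every other column keeps the pre-filled skip value.
fn build_sparse_code_table(
    byte_table: &[[u8; 256]],
    needle: &[u8],
    tokens: &[Vec<u8>],
) -> Vec<u8> {
    let n_states = byte_table.len();
    let n_codes = tokens.len();
    let mut in_needle = [false; 256];
    for &b in needle {
        in_needle[b as usize] = true;
    }
    // Pre-fill with the skip value (state 0 for contains).
    let mut table = vec![0u8; n_states * n_codes];
    let mut states: Vec<u8> = Vec::with_capacity(n_states);
    for (code, token) in tokens.iter().enumerate() {
        if !token.iter().any(|&b| in_needle[b as usize]) {
            continue; // column is entirely the skip value
        }
        // Read the token once, advancing every start state in lockstep.
        states.clear();
        states.extend((0..n_states).map(|s| s as u8));
        for &b in token {
            for s in states.iter_mut() {
                *s = byte_table[*s as usize][b as usize];
            }
        }
        for (start, &end) in states.iter().enumerate() {
            table[start * n_codes + code] = end;
        }
    }
    table
}

/// Returns as soon as the accept state is reached, so the accept row of a
/// skipped column is never read.
fn scan_contains(table: &[u8], n_codes: usize, accept: u8, codes: &[u16]) -> bool {
    let mut state = 0u8;
    for &code in codes {
        state = table[state as usize * n_codes + code as usize];
        if state == accept {
            return true;
        }
    }
    false
}
```

The early return in `scan_contains` is what makes the shortcut safe: a skipped column holds 0 even in its accept row, which would be wrong for a sticky accept state, but the scan never looks up a transition from the accept state.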
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 244.4 ns | -11.93% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352.4 µs | 299.7 µs | +17.58% |
Comparing claude/relaxed-goodall-e3s5pr (5257888) with develop (0dd6db7)
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨
Summary
Closes: #000
Testing