Support out of order grouped aggregate accumulation#8379
Conversation
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Merging this PR will degrade performance by 18.04%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | sum_i32_nullable_all_valid |
73.1 µs | 105 µs | -30.39% |
| ❌ | Simulation | compare[48] |
213 µs | 300.6 µs | -29.15% |
| ❌ | Simulation | compare[50] |
227.7 µs | 319.2 µs | -28.66% |
| ❌ | Simulation | compare[49] |
228.2 µs | 317.8 µs | -28.21% |
| ❌ | Simulation | compare[44] |
207.5 µs | 287.9 µs | -27.94% |
| ❌ | Simulation | compare[46] |
218.5 µs | 302.6 µs | -27.78% |
| ❌ | Simulation | compare[47] |
223.5 µs | 309.3 µs | -27.74% |
| ❌ | Simulation | compare[40] |
190.7 µs | 263.7 µs | -27.71% |
| ❌ | Simulation | compare[44] |
212.2 µs | 292.4 µs | -27.43% |
| ❌ | Simulation | compare[43] |
209.2 µs | 287.8 µs | -27.29% |
| ❌ | Simulation | compare[45] |
218.9 µs | 301.1 µs | -27.29% |
| ❌ | Simulation | compare[42] |
204.6 µs | 281.3 µs | -27.28% |
| ❌ | Simulation | compare[40] |
195.6 µs | 268.4 µs | -27.11% |
| ❌ | Simulation | compare[41] |
204.5 µs | 279.5 µs | -26.82% |
| ❌ | Simulation | compare[43] |
214.2 µs | 292.6 µs | -26.8% |
| ❌ | Simulation | compare[42] |
209.4 µs | 286 µs | -26.77% |
| ❌ | Simulation | compare[41] |
209.3 µs | 284.1 µs | -26.31% |
| ❌ | Simulation | compare[39] |
199.9 µs | 271 µs | -26.25% |
| ❌ | Simulation | compare[38] |
195.5 µs | 264.8 µs | -26.16% |
| ❌ | Simulation | sum_f64_all_valid |
38.1 µs | 51.4 µs | -25.91% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/grouped-aggregate (2ef64b2) with develop (d0013ff)
Footnotes
-
10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
I, Nicholas Gates <nick@nickgates.com>, hereby add my Signed-off-by to this commit: 6bdcd32 I, Nicholas Gates <nick@nickgates.com>, hereby add my Signed-off-by to this commit: 9bd157f I, Nicholas Gates <nick@nickgates.com>, hereby add my Signed-off-by to this commit: 50701b2 Signed-off-by: Nicholas Gates <nick@nickgates.com>
|
We wait that vortex was a linear scan engine. This breaks it correct? |
onursatici
left a comment
There was a problem hiding this comment.
I prefer this API, not only that it supports out of order groups but also it can keep the memory footprint lower by not having us to materialise each group before calling aggregate.
I think you are conflicting with #8314
Sum and count kernels you added make sense to me but we went with having a encoding agnostic kernel support on the registry on the merged PR instead of having the kernel in the aggregate function's vtable
No, the idea is that I can take an array, compute the group indices (i.e. using a scalar function), then update the aggregate state per group. This still doesn't require a shuffle / sort. But it does mean aggregate state may grow large. So in the future this is where we would want the ability to spill partials / reconstruct later. |
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
| num_groups: usize, | ||
| } | ||
|
|
||
| fn dense_grouped_input(values: ArrayRef, group_sizes: &[usize]) -> DenseGroupedInput { |
There was a problem hiding this comment.
nit - this is basically the constructor for DenseGroupedInput
Signed-off-by: Nicholas Gates <nick@nickgates.com>
| fn grouped_count( | ||
| elements: &ArrayRef, | ||
| group_ranges: &GroupRanges, | ||
| pub(super) fn try_accumulate_grouped( |
There was a problem hiding this comment.
nit - this seems short enough to inline into the vtable
Our original API for grouped aggregate functions only supported groups that we pre-sorted into complete lists. That means large groups would need to be globally sorted before we could start accumulating their internal state. This is not how most engines define their aggregate APIs.
This PR allows us to support grouped accumulation in the more traditional way by using group IDs.