Skip to content

aggregate_fn: Return NaN instead of null for mean of all-null input#8365

Merged
robert3005 merged 1 commit into
developfrom
mitko/aggregate/mean-all-null-nan
Jun 11, 2026
Merged

aggregate_fn: Return NaN instead of null for mean of all-null input#8365
robert3005 merged 1 commit into
developfrom
mitko/aggregate/mean-all-null-nan

Conversation

@dimitarvdimitrov

Copy link
Copy Markdown
Contributor

Mean::finalize_scalar returned null when the count was zero, while the array finalize path computes sum/count = 0/0 = NaN for the same input. A mean over an all-null group therefore gave different results depending on which accumulator we're using.

fn finalize(&self, sum: ArrayRef, count: ArrayRef) -> VortexResult<ArrayRef> {
let target = match sum.dtype() {
DType::Decimal(..) => sum.dtype().with_nullability(Nullability::Nullable),
_ => DType::Primitive(PType::F64, Nullability::Nullable),
};
let sum_cast = sum.cast(target.clone())?;
let count_cast = count.cast(target)?;
sum_cast.binary(count_cast, Operator::Div)
}

Since nulls are skipped during accumulation (as in standard SQL aggregation), an all-null input is an empty mean. Both paths now let the division produce NaN.

Note this only matters for the count = 0 case: sum overflow still returns null.

Mean::finalize_scalar returned null when count was zero, while the array
finalize path computes sum/count = 0/0 = NaN for the same input. The
result of a mean over an all-null group therefore depended on which
reduce path it took. Nulls are skipped during accumulation, so an
all-null input is an empty mean: let the division produce NaN in both
paths.

Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
@dimitarvdimitrov dimitarvdimitrov requested review from a team and blaginin June 11, 2026 15:20
@dimitarvdimitrov dimitarvdimitrov changed the title Return NaN instead of null for mean of all-null input aggregate_fn: Return NaN instead of null for mean of all-null input Jun 11, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Copy link
Copy Markdown

Merging this PR will improve performance by 32.96%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
✅ 1529 untouched benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_bool_canonical_into[(1000, 10)] 35.4 µs 20.5 µs +73.06%
Simulation chunked_varbinview_into_canonical[(1000, 10)] 213.2 µs 177.1 µs +20.42%
Simulation chunked_varbinview_canonical_into[(100, 100)] 308.5 µs 273.5 µs +12.8%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing mitko/aggregate/mean-all-null-nan (90d7433) with develop (eda4dd0)

Open in CodSpeed

@robert3005 robert3005 merged commit 05871c3 into develop Jun 11, 2026
73 of 74 checks passed
@robert3005 robert3005 deleted the mitko/aggregate/mean-all-null-nan branch June 11, 2026 17:34
@joseph-isaacs

Copy link
Copy Markdown
Contributor

This looks incorrect to me. I think this should return Null

@robert3005

Copy link
Copy Markdown
Contributor

I thought that NaN made more semantic sense but you can argue either way. Just have to document what is mean of an empty array

@robert3005

Copy link
Copy Markdown
Contributor

Basically I think NaN as a result composes from partials more naturally without any additional checks

@dimitarvdimitrov

Copy link
Copy Markdown
Contributor Author

if we combine NaNs, then the associativity of combining partials breaks. mean([a, b, c]) != combine_partial(mean_partial([a, b]), mean_partial([c])) because NaN always turns anything else into a NaN, but we can ignore nulls.

both duckdb and datafusion return nulls instead of nans for all-null input.

opened #8389 to revert this commit and fix finalize

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants