fix: fall back for decimal SUM/AVG over sliding window frames (window audit)#4732
Merged
Merged
Conversation
Sliding window frames (lower bound other than UNBOUNDED PRECEDING) route decimal SUM/AVG to DataFusion's built-in accumulators, which wrap on overflow instead of returning Spark's NULL. The wrapped value can fall outside the declared decimal precision and even break Spark result decoding. Overflow can't be detected at plan time, so fall back to Spark for the whole sliding decimal SUM/AVG case, mirroring the existing RANGE-frame fallbacks. Ever-expanding frames keep using Comet's overflow-aware SumDecimal/AvgDecimal UDAFs and stay native. Add regression tests in window_functions.sql (ever-expanding overflow stays native and returns NULL, sliding decimal SUM falls back) and record the findings in the window_funcs expression audit notes. This audit also surfaced that decimal AVG over a window always falls back to Spark on Spark 4.x because CometWindowExec does not recognize the Cast(Divide(...)) average shape, leaving the AvgDecimal native window branch dead there. That is a coverage gap (results stay correct via fallback) and is tracked separately in apache#4731. Closes apache#4729
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #4729
Rationale for this change
This PR is the result of auditing the native window function support added in #4209 (covering ranking, value-shift, and standard aggregate window functions). The audit walked the Scala serde (
CometWindowExec), the native planner (create_window_expr/process_agg_func), and the DataFusion 54 window/aggregate accumulators, and verified behavior empirically through the differential SQL-file test harness.The audit found one correctness divergence.
SUM(<decimal>)over a sliding window frame (a frame whose lower bound is notUNBOUNDED PRECEDING) routes to DataFusion's built-insum, whose accumulator wraps on overflow (add_wrapping) instead of returning Spark's NULL. On overflow the native result is a wrapped value that can fall outside the declared decimal precision, which even breaks Spark's result decoding (EXPRESSION_DECODING_FAILED/NUMERIC_VALUE_OUT_OF_RANGE). The divergence is decimal-specific: bigint slidingSUMoverflow matches Spark (both wrap), and ever-expanding decimalSUMoverflow already matches Spark (Comet'sSumDecimalUDAF returns NULL).What changes are included in this PR?
SUM/AVGover a sliding window frame inCometWindowExec. Overflow cannot be detected at plan time, so the whole sliding decimal case falls back, mirroring the existing RANGE-frame DATE/DECIMAL fallbacks. Ever-expanding frames keep using Comet's overflow-awareSumDecimal/AvgDecimalUDAFs and continue to run natively.windows/window_functions.sql(Section 8): ever-expanding decimalSUMoverflow stays native and returns NULL, and sliding decimalSUMfalls back.window_funcsexpression-audit notes.The audit also surfaced a coverage gap that is not a correctness divergence and is tracked separately in #4731:
AVG(<decimal>)over a window always falls back to Spark on Spark 4.x becauseCometWindowExecdoes not recognize theCast(Divide(...))average shape, leaving theAvgDecimalnative window branch dead there. Results stay correct via fallback. The fallback guard added here also coversAVGso the overflow class of bug cannot reappear once that shape is recognized.This work was scaffolded by the
audit-comet-expressionproject skill.How are these changes tested?
windows/window_functions.sqlrun throughCometSqlFileTestSuite, which compares Comet against Spark.CometWindowExecSuite(49 tests) passes unchanged.