Skip to content

fix: fall back for decimal SUM/AVG over sliding window frames (window audit)#4732

Merged
andygrove merged 1 commit into
apache:mainfrom
andygrove:audit-window-functions
Jun 25, 2026
Merged

fix: fall back for decimal SUM/AVG over sliding window frames (window audit)#4732
andygrove merged 1 commit into
apache:mainfrom
andygrove:audit-window-functions

Conversation

@andygrove

Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #4729

Rationale for this change

This PR is the result of auditing the native window function support added in #4209 (covering ranking, value-shift, and standard aggregate window functions). The audit walked the Scala serde (CometWindowExec), the native planner (create_window_expr / process_agg_func), and the DataFusion 54 window/aggregate accumulators, and verified behavior empirically through the differential SQL-file test harness.

The audit found one correctness divergence. SUM(<decimal>) over a sliding window frame (a frame whose lower bound is not UNBOUNDED PRECEDING) routes to DataFusion's built-in sum, whose accumulator wraps on overflow (add_wrapping) instead of returning Spark's NULL. On overflow the native result is a wrapped value that can fall outside the declared decimal precision, which even breaks Spark's result decoding (EXPRESSION_DECODING_FAILED / NUMERIC_VALUE_OUT_OF_RANGE). The divergence is decimal-specific: bigint sliding SUM overflow matches Spark (both wrap), and ever-expanding decimal SUM overflow already matches Spark (Comet's SumDecimal UDAF returns NULL).

What changes are included in this PR?

  • Fall back to Spark for decimal SUM / AVG over a sliding window frame in CometWindowExec. Overflow cannot be detected at plan time, so the whole sliding decimal case falls back, mirroring the existing RANGE-frame DATE/DECIMAL fallbacks. Ever-expanding frames keep using Comet's overflow-aware SumDecimal / AvgDecimal UDAFs and continue to run natively.
  • Add regression tests in windows/window_functions.sql (Section 8): ever-expanding decimal SUM overflow stays native and returns NULL, and sliding decimal SUM falls back.
  • Record the findings in the window_funcs expression-audit notes.

The audit also surfaced a coverage gap that is not a correctness divergence and is tracked separately in #4731: AVG(<decimal>) over a window always falls back to Spark on Spark 4.x because CometWindowExec does not recognize the Cast(Divide(...)) average shape, leaving the AvgDecimal native window branch dead there. Results stay correct via fallback. The fallback guard added here also covers AVG so the overflow class of bug cannot reappear once that shape is recognized.

This work was scaffolded by the audit-comet-expression project skill.

How are these changes tested?

  • New and existing cases in windows/window_functions.sql run through CometSqlFileTestSuite, which compares Comet against Spark.
  • CometWindowExecSuite (49 tests) passes unchanged.

Sliding window frames (lower bound other than UNBOUNDED PRECEDING) route
decimal SUM/AVG to DataFusion's built-in accumulators, which wrap on
overflow instead of returning Spark's NULL. The wrapped value can fall
outside the declared decimal precision and even break Spark result
decoding. Overflow can't be detected at plan time, so fall back to Spark
for the whole sliding decimal SUM/AVG case, mirroring the existing
RANGE-frame fallbacks. Ever-expanding frames keep using Comet's
overflow-aware SumDecimal/AvgDecimal UDAFs and stay native.

Add regression tests in window_functions.sql (ever-expanding overflow
stays native and returns NULL, sliding decimal SUM falls back) and record
the findings in the window_funcs expression audit notes.

This audit also surfaced that decimal AVG over a window always falls back
to Spark on Spark 4.x because CometWindowExec does not recognize the
Cast(Divide(...)) average shape, leaving the AvgDecimal native window
branch dead there. That is a coverage gap (results stay correct via
fallback) and is tracked separately in apache#4731.

Closes apache#4729
@andygrove andygrove requested a review from comphead June 25, 2026 18:41

@comphead comphead left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove

@andygrove andygrove merged commit be9a965 into apache:main Jun 25, 2026
70 checks passed
@andygrove andygrove deleted the audit-window-functions branch June 25, 2026 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] SUM(decimal) over a sliding window frame returns wrapped out-of-range value on overflow instead of NULL

2 participants