feat: route Unsupported through codegen dispatch for opt-in serdes #4728

Open: andygrove wants to merge 3 commits into apache:main from andygrove:unsupported-codegen-dispatch

@andygrove

Member

Summary

  • CodegenDispatchFallback already keeps Incompatible cases inside the Comet pipeline by routing them through the JVM codegen dispatcher (Spark's own doGenCode invoked inside the kernel). This PR extends the same routing to Unsupported cases on those serdes: when getSupportLevel returns Unsupported and the serde mixes in CodegenDispatchFallback, run Spark's generated code via the dispatcher before falling back to Spark.
  • Affects Concat (non-string children), SortArray (nested arrays with Struct or Null children), ArrayIntersect (collated strings), TruncDate and TruncTimestamp (formats outside the native set). All five conditions are things Spark itself supports, so the projection now stays in Comet's pipeline instead of bouncing out to Spark.
  • Refreshes the expression compatibility docs: drops the "faster native" wording (no measured speed claim), clarifies that the default path for NativeOptInAvailable serdes runs in the JVM via Spark codegen, and renders Unsupported reasons differently for CodegenDispatchFallback serdes (always handled via JVM dispatch) vs everything else (still falls back to Spark).
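
The routing described in the bullets above can be modeled as a small decision function. This is a minimal, language-neutral sketch in Python, not Comet's Scala code: `Serde`, `route`, and the `native_opt_in` flag are hypothetical names, and the handling of Compatible and opted-in Incompatible cases is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SupportLevel(Enum):
    COMPATIBLE = auto()
    INCOMPATIBLE = auto()
    UNSUPPORTED = auto()


@dataclass(frozen=True)
class Serde:
    """Hypothetical stand-in for an expression serde."""
    name: str
    # Models whether the serde mixes in CodegenDispatchFallback.
    codegen_dispatch_fallback: bool


def route(serde: Serde, level: SupportLevel, native_opt_in: bool = False) -> str:
    """Return where the expression runs: 'native', 'jvm-dispatch', or 'spark'."""
    if level is SupportLevel.COMPATIBLE:
        return "native"
    # Assumption: only Incompatible cases have a native opt-in to flip.
    if level is SupportLevel.INCOMPATIBLE and native_opt_in:
        return "native"
    # Incompatible (without opt-in) and, with this PR, Unsupported cases
    # go through the JVM codegen dispatcher when the mix-in is present.
    if serde.codegen_dispatch_fallback:
        return "jvm-dispatch"
    return "spark"


concat = Serde("Concat", codegen_dispatch_fallback=True)
print(route(concat, SupportLevel.UNSUPPORTED))
```

The model leaves out the case where the dispatcher itself declines the expression, which the PR says still ends in a Spark fallback.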

Behavior change

For the five CodegenDispatchFallback serdes listed above, inputs that previously caused the entire projection to fall back to Spark now stay inside Comet via dispatch. Dispatch already returns None cleanly when its global flag (spark.comet.exec.scalaUDF.codegen.enabled) is disabled or CometBatchKernelCodegen.canHandle refuses the tree, so the safety net to Spark fallback is preserved.
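
The safety net described above comes down to a dispatch step that can decline, with Spark fallback as the default. Here is a minimal sketch of that control flow in Python, under stated assumptions: `try_dispatch` and `convert_unsupported` are hypothetical names, `codegen_enabled` stands in for `spark.comet.exec.scalaUDF.codegen.enabled`, and `can_handle` stands in for `CometBatchKernelCodegen.canHandle`. The returned strings are placeholders, not real plan nodes.

```python
from typing import Callable, Optional


def try_dispatch(
    expr: str,
    codegen_enabled: bool,
    can_handle: Callable[[str], bool],
) -> Optional[str]:
    """Return a dispatched expression, or None if dispatch declines."""
    if not codegen_enabled:
        # Global flag is off: dispatch declines.
        return None
    if not can_handle(expr):
        # The codegen kernel refuses this expression tree.
        return None
    return f"jvm_dispatch({expr})"


def convert_unsupported(
    expr: str,
    codegen_enabled: bool,
    can_handle: Callable[[str], bool],
) -> str:
    """Try JVM dispatch first; fall back to Spark only if it declines."""
    dispatched = try_dispatch(expr, codegen_enabled, can_handle)
    if dispatched is not None:
        return dispatched
    return f"spark_fallback({expr})"


print(convert_unsupported("concat(a, b)", True, lambda e: True))
print(convert_unsupported("concat(a, b)", False, lambda e: True))
```

Because a declined dispatch is an ordinary `None`, disabling the flag restores the previous Spark-fallback behavior in this model.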

No [COMET-INFO] plan hint is emitted in the Unsupported arm — unlike Incompatible, there is no native opt-in for the user to flip.

Test plan

  • ./mvnw test -pl spark -Pspark-3.5 -Dsuites="org.apache.comet.CometExpressionSuite" (concat, sort_array, array_intersect, trunc coverage)
  • Spot-check extended explain on a Concat with non-string children — confirm no [COMET: …fall back…] and the projection remains a CometProject
  • Regenerate docs and review string.md, array.md, datetime.md for the new "no native implementation, runs in the JVM" wording

CodegenDispatchFallback already routes Incompatible cases through the
JVM codegen dispatcher so the projection stays inside the Comet pipeline
instead of falling back to Spark. Apply the same routing to Unsupported
cases: when getSupportLevel returns Unsupported and the serde mixes in
CodegenDispatchFallback, run Spark's own doGenCode via the dispatcher
before resorting to Spark fallback. Affects Concat (non-string children),
SortArray (nested Struct/Null children), ArrayIntersect (collated
strings), TruncDate and TruncTimestamp (formats outside the native set).

Also refresh the expression compatibility docs: drop "faster native"
wording (no measured claim), clarify that the default path runs in the
JVM via Spark codegen, and render Unsupported reasons differently for
CodegenDispatchFallback serdes (always JVM dispatch) vs everything else
(always Spark fallback).

The Unsupported and Incompatible arms both pattern-match on
CodegenDispatchFallback and call emitJvmCodegenDispatch. Lift that
pattern into a single helper that returns the matched handler alongside
the dispatched expression, so the Incompatible arm can reach
nativeOptInConfigKeyOverride for its [COMET-INFO] hint without
re-matching the same value.
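
The shape of that refactor can be illustrated with a small model. This is a sketch in Python rather than the PR's Scala: `DispatchHandler`, `dispatch_via_fallback`, and `convert` are hypothetical names, the `isinstance` check stands in for the pattern match on `CodegenDispatchFallback`, and the hint wording is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DispatchHandler:
    """Hypothetical model of a serde that mixes in CodegenDispatchFallback."""
    name: str
    native_opt_in_config_key_override: Optional[str] = None


def emit_jvm_codegen_dispatch(expr: str) -> Optional[str]:
    # Stand-in for the real dispatcher; in this sketch it always succeeds.
    return f"jvm_dispatch({expr})"


def dispatch_via_fallback(
    serde: object, expr: str
) -> Optional[Tuple[DispatchHandler, str]]:
    """Match once and return the handler together with the dispatched expr."""
    if not isinstance(serde, DispatchHandler):
        return None
    dispatched = emit_jvm_codegen_dispatch(expr)
    if dispatched is None:
        return None
    return serde, dispatched


def convert(serde: object, expr: str, level: str, hints: List[str]) -> str:
    result = dispatch_via_fallback(serde, expr)
    if result is None:
        return f"spark_fallback({expr})"
    handler, dispatched = result
    if level == "incompatible":
        # Only the Incompatible arm emits a hint; it reuses the handler
        # returned by the helper instead of matching on the serde again.
        key = handler.native_opt_in_config_key_override
        suffix = f" via {key}" if key else ""
        hints.append(f"[COMET-INFO] native opt-in available{suffix}")
    return dispatched


hints: List[str] = []
handler = DispatchHandler("TruncDate", "hypothetical.optin.key")
print(convert(handler, "trunc(d, fmt)", "incompatible", hints), hints)
```

Returning the handler with the result keeps both arms on one code path while letting only the Incompatible arm read the override key.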
andygrove requested a review from mbutrovich June 25, 2026 18:04
andygrove marked this pull request as ready for review June 25, 2026 18:04
andygrove added this to the 1.0.0 milestone Jun 25, 2026
andygrove self-assigned this Jun 25, 2026
…h codegen dispatch

Concat, SortArray, TruncDate, and TruncTimestamp now route their previously Unsupported cases through the JVM codegen dispatcher and stay inside the Comet pipeline instead of falling back to Spark. Update the test expectations to assert Comet execution and a matching answer rather than a fallback reason.