feat: route Unsupported through codegen dispatch for opt-in serdes#4728
Open
andygrove wants to merge 3 commits into
Open
feat: route Unsupported through codegen dispatch for opt-in serdes#4728andygrove wants to merge 3 commits into
andygrove wants to merge 3 commits into
Conversation
CodegenDispatchFallback already routes Incompatible cases through the JVM codegen dispatcher so the projection stays inside the Comet pipeline instead of falling back to Spark. Apply the same routing to Unsupported cases: when getSupportLevel returns Unsupported and the serde mixes in CodegenDispatchFallback, run Spark's own doGenCode via the dispatcher before resorting to Spark fallback. Affects Concat (non-string children), SortArray (nested Struct/Null children), ArrayIntersect (collated strings), TruncDate and TruncTimestamp (formats outside the native set). Also refresh the expression compatibility docs: drop "faster native" wording (no measured claim), clarify that the default path runs in the JVM via Spark codegen, and render Unsupported reasons differently for CodegenDispatchFallback serdes (always JVM dispatch) vs everything else (always Spark fallback).
The Unsupported and Incompatible arms both pattern-match on CodegenDispatchFallback and call emitJvmCodegenDispatch. Lift that pattern into a single helper that returns the matched handler alongside the dispatched expression, so the Incompatible arm can reach nativeOptInConfigKeyOverride for its [COMET-INFO] hint without re-matching the same value.
…h codegen dispatch Concat, SortArray, TruncDate, and TruncTimestamp now route their previously Unsupported cases through the JVM codegen dispatcher and stay native instead of falling back to Spark. Update the test expectations to assert native execution and a matching answer rather than a fallback reason.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
CodegenDispatchFallbackalready keepsIncompatiblecases inside the Comet pipeline by routing them through the JVM codegen dispatcher (Spark's owndoGenCodeinvoked inside the kernel). This PR extends the same routing toUnsupportedcases on those serdes: whengetSupportLevelreturnsUnsupportedand the serde mixes inCodegenDispatchFallback, run Spark's generated code via the dispatcher before falling back to Spark.Concat(non-string children),SortArray(nested arrays with Struct or Null children),ArrayIntersect(collated strings),TruncDateandTruncTimestamp(formats outside the native set). All five conditions are things Spark itself supports, so the projection now stays in Comet's pipeline instead of bouncing out to Spark.NativeOptInAvailableserdes runs in the JVM via Spark codegen, and rendersUnsupportedreasons differently forCodegenDispatchFallbackserdes (always handled via JVM dispatch) vs everything else (still falls back to Spark).Behavior change
For the five
CodegenDispatchFallbackserdes listed above, inputs that previously caused the entire projection to fall back to Spark now stay inside Comet via dispatch. Dispatch already returnsNonecleanly when its global flag (spark.comet.exec.scalaUDF.codegen.enabled) is disabled orCometBatchKernelCodegen.canHandlerefuses the tree, so the safety net to Spark fallback is preserved.No
[COMET-INFO]plan hint is emitted in theUnsupportedarm — unlikeIncompatible, there is no native opt-in for the user to flip.Test plan
./mvnw test -pl spark -Pspark-3.5 -Dsuites="org.apache.comet.CometExpressionSuite"(concat, sort_array, array_intersect, trunc coverage)Concatwith non-string children — confirm no[COMET: …fall back…]and the projection remains aCometProjectstring.md,array.md,datetime.mdfor the new "no native implementation, runs in the JVM" wording