perf(executor): per-ExecutionCtx ArrayKernels snapshot by lukekim · Pull Request #8401 · vortex-data/vortex

lukekim · 2026-06-12T21:21:57Z

Summary

Resolves ArrayKernels once at ExecutionCtx construction into a crate-private KernelSnapshot (a single ArcSwapMap::load_full of the execute-parent kernel map), instead of cloning the session and probing the sharded session-variable DashMap on every array node in the execute_until / single-step execution paths.

Why

Removes a per-array-node session clone + DashMap shard RwLock probe from the hot execution loop.
Stops holding the session-variable read guard across kernel invocation (previously a plugin kernel touching the session registry could contend/deadlock on the same shard).
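
The second point can be illustrated with a minimal, std-only sketch. Everything here is hypothetical: `Registry`, `Kernel`, `invoke_holding_guard`, and `invoke_after_release` are stand-ins rather than vortex-array types, and a plain `RwLock<HashMap>` stands in for one DashMap shard. The probe kernel uses `try_write` so the contended case reports failure instead of blocking.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for a session-scoped kernel registry (one "shard").
struct Registry {
    kernels: RwLock<HashMap<&'static str, Kernel>>,
}

// A kernel that is handed the registry, as a plugin touching session state would be.
type Kernel = Arc<dyn Fn(&Registry) -> bool + Send + Sync>;

/// Old shape: the read guard is still alive while the kernel runs.
fn invoke_holding_guard(reg: &Registry, name: &'static str) -> bool {
    let guard = reg.kernels.read().unwrap();
    let kernel = guard.get(name).expect("kernel registered");
    kernel(reg) // guard is held across this call
}

/// New shape: clone the Arc out, release the guard, then run the kernel.
fn invoke_after_release(reg: &Registry, name: &'static str) -> bool {
    let guard = reg.kernels.read().unwrap();
    let kernel = Arc::clone(guard.get(name).expect("kernel registered"));
    drop(guard); // released before the kernel is invoked
    kernel(reg)
}

fn demo() -> (bool, bool) {
    let reg = Registry { kernels: RwLock::new(HashMap::new()) };
    // The probe reports whether it could take the write lock from inside a kernel.
    let probe: Kernel = Arc::new(|r: &Registry| r.kernels.try_write().is_ok());
    reg.kernels.write().unwrap().insert("probe", probe);
    (
        invoke_holding_guard(&reg, "probe"),
        invoke_after_release(&reg, "probe"),
    )
}

fn main() {
    // Write access fails while the guard is held and succeeds once it is released.
    assert_eq!(demo(), (false, true));
}
```

With a blocking `write()` in place of `try_write()`, the first variant would be at risk of blocking indefinitely on the same thread, which is the contention the change avoids.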

Semantics

The registry is session-scoped and mutable via its public register_* methods. An ExecutionCtx sees a point-in-time snapshot taken at construction; later registrations are picked up by the next context (contexts are created per evaluation). Kernel lookup order is unchanged: registered plugin kernels are tried before static execute_parent kernels, with the same (parent, child) hashing.
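
A minimal sketch of these semantics, assuming a copy-on-write map behind a lock as a stand-in for ArcSwapMap. The names `Registry`, `ExecutionCtx::new`, `execute`, `static_kernel`, and `plugin_kernel` are illustrative only and do not reflect the actual vortex-array signatures.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins; none of these are the real vortex-array types.
type KernelFn = fn(i64) -> i64;
type KernelMap = HashMap<(&'static str, &'static str), KernelFn>;

/// Session-scoped registry: each registration publishes a fresh immutable map.
#[derive(Default)]
struct Registry {
    plugins: RwLock<Arc<KernelMap>>,
}

impl Registry {
    fn register(&self, parent: &'static str, child: &'static str, kernel: KernelFn) {
        let mut slot = self.plugins.write().unwrap();
        let mut next: KernelMap = (**slot).clone(); // copy-on-write
        next.insert((parent, child), kernel);
        *slot = Arc::new(next);
    }

    /// Returns an owned handle that outlives the lock guard.
    fn snapshot(&self) -> Arc<KernelMap> {
        Arc::clone(&self.plugins.read().unwrap())
    }
}

/// The context resolves the kernel map once, at construction.
struct ExecutionCtx {
    kernels: Arc<KernelMap>,
}

impl ExecutionCtx {
    fn new(registry: &Registry) -> Self {
        Self { kernels: registry.snapshot() }
    }

    /// Plugin kernels keyed by (parent, child) are tried before the static kernel.
    fn execute(&self, parent: &'static str, child: &'static str, input: i64) -> i64 {
        match self.kernels.get(&(parent, child)) {
            Some(plugin) => plugin(input),
            None => static_kernel(input),
        }
    }
}

fn static_kernel(x: i64) -> i64 { x + 1 }
fn plugin_kernel(x: i64) -> i64 { x * 10 }

fn demo() -> (i64, i64, i64) {
    let registry = Registry::default();
    let early = ExecutionCtx::new(&registry); // snapshot taken before registration
    registry.register("cast", "struct", plugin_kernel);
    let late = ExecutionCtx::new(&registry); // next context sees the plugin
    (
        early.execute("cast", "struct", 4), // static kernel
        late.execute("cast", "struct", 4),  // plugin kernel
        late.execute("cast", "bool", 4),    // no plugin for this pair, static kernel
    )
}

fn main() {
    assert_eq!(demo(), (5, 40, 5));
}
```

The `early` context keeps using the static kernel even after the registration, because its map was fixed when it was built. Only `late`, constructed afterwards, dispatches to the plugin.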

Adds a pub(crate) ArcSwapMap::load_full accessor (snapshot-that-outlives-the-call, complementing read). KernelSnapshot and ArrayKernels::snapshot() are pub(crate) — no new public API.

Testing

cargo nextest run -p vortex-array — 2893 passed (includes struct_cast_execute_parent_uses_session_plugin, covering the register → snapshot → execute path).
cargo clippy --all-targets -p vortex-array — clean on the spiceai-53 variant of this change; develop port re-verified via the full test build.

Resolve ArrayKernels once at ExecutionCtx construction into a KernelSnapshot (an ArcSwapMap::load_full of the execute-parent kernel map) instead of cloning the session and probing the sharded session-variable DashMap on every array node in the execute_until / single-step paths. This also stops holding the session-variable read guard across kernel invocation.

The registry is session-scoped and mutable via its public register_* methods: an ExecutionCtx sees a point-in-time snapshot taken at construction, and later registrations are picked up by the next context (contexts are created per evaluation).

Adds a pub(crate) ArcSwapMap::load_full accessor; the KernelSnapshot type and ArrayKernels::snapshot() are pub(crate), so no new public API is added.

Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>


codspeed-hq · 2026-06-12T21:32:54Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 52 improved benchmarks
❌ 62 regressed benchmarks
✅ 1423 untouched benchmarks
⏩ 10 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 20.7 µs | 36.3 µs | -43.08% |
| ❌ | Simulation | `compare[48]` | 213 µs | 300.4 µs | -29.11% |
| ❌ | Simulation | `compare[50]` | 227.7 µs | 318.9 µs | -28.58% |
| ❌ | Simulation | `compare[49]` | 228.2 µs | 317.5 µs | -28.13% |
| ❌ | Simulation | `compare[46]` | 218.5 µs | 302.2 µs | -27.69% |
| ❌ | Simulation | `compare[47]` | 223.5 µs | 309.1 µs | -27.68% |
| ❌ | Simulation | `compare[44]` | 212.2 µs | 292.1 µs | -27.37% |
| ❌ | Simulation | `compare[45]` | 218.9 µs | 300.7 µs | -27.21% |
| ❌ | Simulation | `compare[40]` | 195.6 µs | 267.3 µs | -26.82% |
| ❌ | Simulation | `compare[43]` | 214.2 µs | 292.3 µs | -26.71% |
| ❌ | Simulation | `compare[42]` | 209.4 µs | 285.6 µs | -26.68% |
| ❌ | Simulation | `compare[41]` | 209.3 µs | 283.8 µs | -26.22% |
| ❌ | Simulation | `compare[39]` | 204.7 µs | 274.5 µs | -25.43% |
| ❌ | Simulation | `compare[38]` | 200.3 µs | 268.3 µs | -25.33% |
| ❌ | Simulation | `compare[32]` | 173.4 µs | 231 µs | -24.96% |
| ❌ | Simulation | `compare[36]` | 194 µs | 258.2 µs | -24.86% |
| ❌ | Simulation | `compare[37]` | 200.1 µs | 266.2 µs | -24.83% |
| ❌ | Simulation | `compare[35]` | 195.4 µs | 257.7 µs | -24.19% |
| ❌ | Simulation | `compare[34]` | 191.1 µs | 251.6 µs | -24.03% |
| ❌ | Simulation | `compare[33]` | 190.6 µs | 250.1 µs | -23.78% |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing spiceai:lukim/exec-kernel-snapshot-develop (2820b17) with develop (d0013ff)

¹ 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

lukekim requested a review from a team June 12, 2026 21:21

gatesn added the action/benchmark label (Trigger full benchmarks to run on this PR) Jun 12, 2026

Merge remote-tracking branch 'origin/develop' into lukim/exec-kernel-snapshot-develop (2820b17)

lukekim mentioned this pull request Jun 12, 2026

Merge upstream Vortex 0.75.0 into spiceai-54 (DataFusion 53 → 54) spiceai/vortex#65

Open

perf(executor): per-ExecutionCtx ArrayKernels snapshot #8401
lukekim wants to merge 2 commits into vortex-data:develop from spiceai:lukim/exec-kernel-snapshot-develop
