build(kernel): add optional [kernel] extra for use_kernel=True#839
Merged
databricks-sql-kernel is now published to PyPI, so the kernel backend can ship as an optional dependency instead of a local-dev-only build.

- pyproject: declare databricks-sql-kernel as an optional dependency gated to python>=3.10 (the wheel is cp310-abi3, Requires-Python >=3.10), and add the `[kernel]` extra. The extra also lists pyarrow: the kernel result path (backend/kernel/result_set.py) imports it unconditionally to wrap the Arrow batches the kernel returns. pyarrow is already pulled transitively via the kernel wheel's pyarrow>=23.0.1,<24, but naming it makes the connector-side requirement explicit and lets pip co-resolve both constraints at install time.
- backend/kernel/_errors.py: update the use_kernel=True ImportError to point at `pip install "databricks-sql-connector[kernel]"` and note the python>=3.10 requirement (was the obsolete "not yet published, build locally" hint).
- README: document the [kernel] extra, use_kernel=True usage, and the python>=3.10 / pyarrow notes.

On python<3.10 the `[kernel]` extra resolves to nothing and use_kernel=True raises the friendly ImportError at runtime; the connector's own python floor (3.8) is unchanged.

Verified locally (kernel served from a locally-built cp310-abi3 wheel, since the published package isn't yet mirrored on the dev proxy):
- pip install "databricks-sql-connector[kernel]" -> connector + kernel + pyarrow all install; use_kernel=True runs a live query end-to-end (backend KernelDatabricksClient).
- plain install -> use_kernel=True raises the friendly ImportError.

NOTE: `poetry lock` still needs to be run to refresh poetry.lock with the databricks-sql-kernel entry; it is intentionally NOT included here because it requires the kernel to be resolvable on the index poetry/CI use (the JFrog db-pypi proxy). Confirm the package resolves there before merging.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
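The ImportError behaviour described in this commit is the usual soft-dependency pattern. A minimal sketch, assuming a hypothetical helper `load_kernel_module` and illustrative hint text (the connector's real code in backend/kernel/_errors.py may be structured and worded differently):

```python
import importlib

# Illustrative wording only; not copied from the connector.
INSTALL_HINT = (
    "use_kernel=True requires the optional kernel backend. Install it with "
    '`pip install "databricks-sql-connector[kernel]"` (requires Python >= 3.10).'
)


def load_kernel_module(module_name: str = "databricks_sql_kernel"):
    """Import the optional kernel module, or raise ImportError with an install hint.

    `load_kernel_module` is a hypothetical name used for this sketch.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(INSTALL_HINT) from exc
```

Because the import happens only when the kernel backend is requested, a default install never needs the Rust wheel to be present.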
Listing bare `pyarrow` in the [kernel] extra forced poetry to co-resolve an unconstrained pyarrow against the kernel's transitive `pyarrow>=23.0.1,<24` across the connector's full 3.8–3.14 matrix. pyarrow 23.x requires Python >=3.10, so the constraint is unsatisfiable on 3.8/3.9, and `poetry lock` failed every CI job with "version solving failed ... pyarrow is forbidden".

The kernel wheel already declares `pyarrow>=23.0.1,<24` as a hard runtime dependency, so `pip install databricks-sql-connector[kernel]` still pulls a compatible pyarrow transitively. The databricks-sql-kernel dep stays gated to python>=3.10, which now correctly excludes the whole kernel+pyarrow subtree from the 3.8/3.9 resolution. The kernel's own metadata is the single source of truth for the pyarrow floor.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…lves
The kernel's transitive pyarrow>=23.0.1,<24 conflicts with the
connector's own pyarrow>=14.0.1 (declared across 3.8–3.13) during
`poetry lock`: pyarrow>=23 dropped Python 3.9, so for the 3.8–3.10
slice poetry can't find a pyarrow satisfying both and version solving
fails ("pyarrow is forbidden" -> "databricks-sql-kernel is forbidden").
The kernel's python>=3.10 marker doesn't help because poetry unifies
the pyarrow constraint across the connector's declared pyarrow band,
not the kernel's.
Split the connector's pyarrow entry at 3.10 and cap the <3.10 band at
<23. This removes no installable version — the newest pyarrow with a
Python 3.9 wheel is 21.x — it just makes that physical fact explicit to
the solver, so the <3.10 band (capped, kernel absent) and the >=3.10
band (where the kernel can pull pyarrow up to <24) no longer overlap.
Verified `poetry lock` resolves the full dependency set with this
change.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
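The band split can be checked with simple range arithmetic. The sketch below is a simplified model, not how poetry's solver works: it treats each constraint as a half-open version range and intersects it with the kernel's pyarrow range. The helper names are hypothetical, and leaving the >=3.10 connector band unbounded above is an assumption made for illustration.

```python
def parse(version: str) -> tuple:
    """Turn '23.0.1' into (23, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def intersect(a, b):
    """Intersect two half-open ranges (lo, hi); hi=None means no upper bound.

    Returns None when no version can satisfy both ranges.
    """
    lo = max(parse(a[0]), parse(b[0]))
    uppers = [parse(r[1]) for r in (a, b) if r[1] is not None]
    hi = min(uppers) if uppers else None
    if hi is not None and lo >= hi:
        return None
    return (lo, hi)


KERNEL_PYARROW = ("23.0.1", "24.0.0")        # pyarrow>=23.0.1,<24 from the kernel wheel
CONNECTOR_BELOW_310 = ("14.0.1", "23.0.0")   # capped band for Python < 3.10
CONNECTOR_FROM_310 = ("14.0.1", None)        # assumption: no upper cap on Python >= 3.10

print(intersect(CONNECTOR_BELOW_310, KERNEL_PYARROW))  # None
print(intersect(CONNECTOR_FROM_310, KERNEL_PYARROW))   # ((23, 0, 1), (24, 0, 0))
```

The empty intersection for the capped band is consistent with the kernel being absent below Python 3.10, while the >=3.10 band still admits the pyarrow versions the kernel needs.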
Mirrors the "Unit Tests + PyArrow" matrix but for the [kernel] extra. Until now no CI job exercised the published kernel wheel: the base unit-test matrix installs no extras, and the kernel unit tests use a fake databricks_sql_kernel module injected into sys.modules, so the real wheel was never loaded in CI.

The new job (Python 3.10–3.14; the wheel is cp310-abi3 so 3.9 is omitted) installs the [kernel] extra via --all-extras, then, before running the unit suite:
- asserts databricks_sql_kernel imports and has a real __file__ (i.e. the published wheel actually installed, not the test fake), and
- imports the use_kernel backend path (KernelDatabricksClient / KernelResultSet) against the real wheel.

This is the only CI signal that the published [kernel] extra installs and loads end to end on every PR (the live use_kernel=True e2e remains in kernel-e2e.yml, merge-queue gated).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…ent skips

Ensure every CI job that's meant to cover the kernel actually drives the use_kernel=True path through the REAL databricks-sql-kernel wheel, and fails loudly if it can't (rather than silently skipping / passing on the Thrift path).

Problem this fixes:
- The kernel unit tests inject a fake databricks_sql_kernel into sys.modules. In a shared `pytest tests/unit tests/e2e` session (the coverage job, which installs --all-extras so the real wheel IS present) that fake shadowed the real wheel, so the kernel e2e tests silently skipped; the coverage job looked like it exercised the kernel but didn't.

Changes:
- tests/e2e/test_kernel_backend.py + test_kernel_tls.py: replace the silent `__file__`-based skip with a three-state guard keyed on importlib.metadata (the on-disk dist DB, which a sys.modules stub can't fake): skip only when the wheel is genuinely absent; FAIL LOUDLY when it's installed-but-shadowed. The `conn` fixture now also asserts conn.session.backend is KernelDatabricksClient, so a use_kernel=True connection that fell back to Thrift fails the test.
- tests/unit/test_session.py: add TestUseKernelRoutesThroughRealWheel (marked `realkernel`), a no-network proof that sql.connect(use_kernel=True) instantiates the REAL KernelDatabricksClient (mocks only open_session; does not fake the wheel). Skips if the wheel is absent; fails if it's shadowed.
- pyproject.toml: register the `realkernel` marker. Tests so marked need an unpolluted sys.modules and must run in a separate pytest invocation from the fake-injecting unit tests.
- tests/unit/test_kernel_client.py: document that its session-global fake mandates the separate-invocation rule for real-wheel tests.
- code-quality-checks.yml: the Unit Tests + Kernel matrix now asserts the real wheel, runs `tests/unit -m "not realkernel"`, then runs the real-wheel routing test as its own invocation (`pytest tests/unit/test_session.py -m realkernel`). All three unit matrices gained `-m "not realkernel"`.
- code-coverage.yml: --ignore the kernel e2e files and add `-m "not realkernel"` so the shared --all-extras session doesn't trip the new loud guards; the real live kernel e2e stays in kernel-e2e.yml (isolated session, real wheel, live warehouse).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
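The three-state guard can be sketched roughly as follows. This is an illustration of the idea only: the function name, the returned state labels, and the "loaded module has no `__file__`" test for a stub are assumptions, not the code in tests/e2e.

```python
import sys
from importlib import metadata


def kernel_wheel_state(
    dist_name: str = "databricks-sql-kernel",
    module_name: str = "databricks_sql_kernel",
) -> str:
    """Classify the kernel wheel as 'absent', 'shadowed', or 'real'.

    Hypothetical helper. The on-disk distribution metadata is consulted first,
    because a stub placed in sys.modules cannot fake it.
    """
    try:
        metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return "absent"  # wheel genuinely not installed: a skip is legitimate

    loaded = sys.modules.get(module_name)
    # Assumption: an injected fake is a bare module object without __file__.
    if loaded is not None and getattr(loaded, "__file__", None) is None:
        return "shadowed"  # installed but hidden by a stub: should fail loudly
    return "real"
```

In a pytest module, "absent" would map to `pytest.skip(...)` and "shadowed" to `pytest.fail(...)`, so a polluted session can no longer pass by quietly skipping.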
The "Unit Tests + PyArrow" job used --all-extras, which predates the [kernel] extra. Now that [kernel] exists, --all-extras silently also installs the kernel wheel, so that tier no longer isolated the "pyarrow present, kernel absent" configuration and overlapped the new "Unit Tests + Kernel" job.

- Unit Tests + PyArrow: --extras pyarrow (pyarrow only; no kernel).
- Unit Tests + Kernel: --extras kernel (resolves the published databricks-sql-kernel wheel via the [kernel] extra, the exact edge `pip install databricks-sql-connector[kernel]` uses, which transitively brings pyarrow).

Each tier now targets its configuration precisely. The kernel install path here (published wheel via the extra) is intentionally distinct from kernel-e2e.yml, which maturin-builds tip-of-tree at KERNEL_REV.

Verified against the proxy: --extras pyarrow installs pyarrow and NOT the kernel; --extras kernel installs databricks-sql-kernel 0.1.2.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
gopalldb
approved these changes
Jun 10, 2026
Bump the [kernel] extra's floor from ^0.1.0 to ^0.2.0 (>=0.2.0,<0.3.0) now that 0.2.0 is published. The <0.3.0 cap is deliberate: the kernel is pre-1.0, so each 0.x minor may be breaking. We bump this when the kernel ships 0.3.0 rather than auto-adopting a potentially-breaking minor.

0.2.0 keeps the same Requires-Python (>=3.10) and pyarrow (>=23.0.1,<24) pin as 0.1.x, so the python>=3.10 marker and the pyarrow <23 sub-3.10 cap are unchanged.

Verified `poetry lock` resolves and locks databricks-sql-kernel 0.2.0.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
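The caret expansion quoted above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming full MAJOR.MINOR.PATCH inputs only; `caret_range` is a hypothetical helper, not part of poetry:

```python
def caret_range(version: str) -> str:
    """Expand a caret constraint such as ^0.2.0 into an explicit range.

    The caret allows changes that keep the left-most non-zero component fixed,
    so for 0.x versions the minor version acts as the compatibility boundary.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    lower_text = f"{major}.{minor}.{patch}"
    upper_text = ".".join(str(part) for part in upper)
    return f">={lower_text},<{upper_text}"


print(caret_range("0.1.0"))  # >=0.1.0,<0.2.0
print(caret_range("0.2.0"))  # >=0.2.0,<0.3.0
```

This is why ^0.1.0 could never pick up 0.2.0 on its own, and why the same manual bump will be needed again at 0.3.0.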
What
`databricks-sql-kernel` is now published to PyPI (and resolvable via the `db-pypi` JFrog proxy CI uses), so the Rust kernel backend (`use_kernel=True`) can ship as an optional dependency instead of a local-dev-only build. This adds the `[kernel]` extra:
    pip install "databricks-sql-connector[kernel]"
Changes
- `pyproject.toml`
  - `databricks-sql-kernel = {version = "^0.1.0", optional = true, python = ">=3.10"}`. Gated to Python ≥ 3.10 because the wheel is `cp310-abi3` (`Requires-Python: >=3.10`). The connector's own floor (3.8) is unchanged; on 3.8/3.9 the extra resolves to nothing.
  - `kernel = ["databricks-sql-kernel", "pyarrow"]`. pyarrow is listed explicitly: the kernel result path (`backend/kernel/result_set.py`) imports it unconditionally to wrap the Arrow batches the kernel returns. It's already pulled transitively via the kernel wheel's `pyarrow>=23.0.1,<24`, but naming it documents the connector-side requirement and lets pip co-resolve both constraints at install time.
- `backend/kernel/_errors.py`: the `use_kernel=True` `ImportError` now points at `pip install "databricks-sql-connector[kernel]"` and notes Python ≥ 3.10 (was the obsolete "not yet published, build locally" hint).
- `README.md`: document the extra, usage, and the Python ≥ 3.10 / pyarrow notes.
How it works
`use_kernel=True` imports `databricks_sql_kernel` at backend load. With the extra → import succeeds, kernel path is live. Without it → friendly `ImportError` telling the user to install `[kernel]`. The kernel stays a soft dependency: default installs don't pull the Rust wheel.
Verification (all green, against the published wheel)
- Installed `databricks-sql-connector[kernel]` from this source with the kernel resolved purely from the proxy (no local wheel): `databricks-sql-kernel 0.1.2` + `pyarrow 23.0.1`; `use_kernel=True` runs a live query end-to-end (backend `KernelDatabricksClient`, results 42 and 99 in two test venvs).
- `pip install databricks-sql-connector` → `use_kernel=True` raises the friendly `ImportError` (mentions `[kernel]` + Python ≥ 3.10).
- `scripts/dependency_manager.py` (CI's unit-test dep generator) correctly excludes the optional kernel dep from the generated requirements: verified that default and `--include-optional` runs don't emit it, so no install attempt on the 3.8/3.9 test matrix.
poetry.lock
Not regenerated in this commit: my environment's poetry can only reach public PyPI (DNS-blocked here), so I can't produce a correct lock locally. CI's `setup-poetry` action runs `poetry lock` itself against the `db-pypi` JFrog source on every run (and `poetry install --all-extras`), so the lockfile is regenerated in CI with the kernel entry. If a committed-and-consistent `poetry.lock` is required by repo policy, please run `poetry lock` on a network-capable machine and add it to this branch.
Follow-up (optional, not blocking)
Consider a CI job that installs the `[kernel]` extra and smoke-tests `import databricks_sql_kernel` (or extend `kernel-e2e.yml`, which currently source-builds at `KERNEL_REV`) so the published-wheel path is exercised in CI. Note: if such a job uses `dependency_manager.py --include-optional`, the script's dict-constraint branch ignores the `python` marker; handle the ≥3.10 gate there to avoid a 3.8/3.9 failure.
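One way the ≥3.10 gate mentioned in the note could be honoured is sketched below. Both function names are hypothetical and this is not the actual code of `scripts/dependency_manager.py`; it only handles markers of the form `>=X.Y`, which is all the kernel entry uses.

```python
import sys


def python_marker_allows(marker: str, version_info=sys.version_info) -> bool:
    """Evaluate a simple '>=X.Y' python marker against an interpreter version.

    Other operators are out of scope for this sketch and raise ValueError.
    """
    marker = marker.strip()
    if not marker.startswith(">="):
        raise ValueError(f"unsupported python marker: {marker!r}")
    required = tuple(int(part) for part in marker[2:].strip().split("."))
    # Compare as integer tuples so that 3.10 sorts after 3.9.
    return tuple(version_info[: len(required)]) >= required


def requirement_name(name: str, constraint, version_info=sys.version_info):
    """Return the dependency name, or None when its python marker excludes it."""
    if isinstance(constraint, dict):
        marker = constraint.get("python")
        if marker and not python_marker_allows(marker, version_info):
            return None
    return name


kernel = {"version": "^0.1.0", "optional": True, "python": ">=3.10"}
print(requirement_name("databricks-sql-kernel", kernel, (3, 9, 18)))   # None
print(requirement_name("databricks-sql-kernel", kernel, (3, 10, 0)))   # databricks-sql-kernel
```

Comparing integer tuples rather than strings matters here, since a string comparison would rank "3.9" above "3.10".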