Design: cuda.core Texture/Surface API surface #2188

## Purpose

Design discussion for the **texture / surface API** in `cuda.core` — to settle the API shape and
naming *before* code review of the implementation. Reviewers asked for design sign-off in an issue
before we commit to a ~9k-line feature.

- Implementation PR: #2095
- Feature request: #467

cc @leofang @mdboom @Andy-Jost @kkraus14 — you asked for a design pass; this is the home for it.

## Proposed public surface (from #2095)

- `Array` + `ArrayFormat` — opaque, hardware-laid-out GPU allocations backing textures/surfaces.
- `MipmappedArray` — wraps `CUmipmappedArray`; `get_level` returns a non-owning `Array` kept alive
  by a strong ref to the parent.
- `TextureObject` + `TextureDescriptor` — bindless texture handle + sampling state.
- `SurfaceObject` — bindless surface handle; requires `Array(surface_load_store=True)`.
- `ResourceDescriptor` — factories `from_array`, `from_mipmapped_array`, `from_linear`, `from_pitch2d`.
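
The lifetime rule for `MipmappedArray.get_level` can be illustrated without a GPU. The sketch below uses pure-Python stand-ins (`FakeMipmappedArray` and `FakeLevelArray` are hypothetical names, not the code in #2095) and only demonstrates the ownership pattern: a non-owning level view holds a strong reference to its parent, so the parent outlives the caller's own handle.

```python
import gc
import weakref


class FakeLevelArray:
    """Stand-in for a non-owning level view (hypothetical, not cuda.core code)."""

    def __init__(self, parent, level):
        # Strong reference: the parent cannot be collected while this view lives.
        self._parent = parent
        self.level = level


class FakeMipmappedArray:
    """Stand-in for the owning mipmapped allocation (hypothetical)."""

    def __init__(self, num_levels):
        self.num_levels = num_levels

    def get_level(self, level):
        if not 0 <= level < self.num_levels:
            raise ValueError(f"level must be in [0, {self.num_levels})")
        return FakeLevelArray(self, level)


mip = FakeMipmappedArray(num_levels=4)
parent_ref = weakref.ref(mip)  # observe the parent without keeping it alive
level0 = mip.get_level(0)

del mip  # drop the caller's own reference
gc.collect()
parent_alive_with_view = parent_ref() is not None  # the view still pins the parent

del level0  # drop the last view
gc.collect()
parent_alive_after_view = parent_ref() is not None
```

In the real type, the parent's destructor would be what releases the `CUmipmappedArray`, so pinning the parent is what keeps each level handle valid.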

## Decisions to make

1. **Name of `Array`. ✅ Decided — rename `Array` → `CUDAArray`.**

   This type wraps an opaque CUDA array (`CUarray` in the driver API, `cudaArray_t` in the runtime
   API). The GPU stores it in a hardware-defined, non-linear layout with no linear pointer, so it
   **cannot** expose `__cuda_array_interface__` / DLPack and cannot share memory zero-copy with
   CuPy / numba-cuda / torch. The name `Array` implies an n-dimensional array that participates in
   that ecosystem, which this type cannot do. CuPy names the identical type `CUDAarray`, and its
   `cupy.cuda.texture` module already matches this PR's surface 1:1.

   **Resolution: use `CUDAArray`** — the PEP 8 CapWords form (deliberately differing from CuPy's exact
   `CUDAarray` casing to follow Python's class-naming standard). The name signals "CUDA texture/surface
   backing store," not "n-dimensional array."

   **Open detail resolved: keep `ArrayFormat` (do *not* rename to `CUDAArrayFormat`).** The sibling
   enums in these modules — `AddressMode`, `FilterMode`, `ReadMode` — are all unprefixed, so
   `ArrayFormat` matches the established enum-naming pattern; and the "`Array` implies an
   ndarray/DLPack participant" concern that motivated `CUDAArray` does not apply to a format enum
   (nobody mistakes `ArrayFormat` for an n-dimensional array).

2. **Interop path. ✅ Decided — ship only `copy_from` / `copy_to`; no allocation helper.**

   Zero-copy is impossible (opaque layout, no linear pointer), so copying is the only option —
   this was purely about how polished the path is. The copy path to/from linear `cuda.core`
   `Buffer`s already exists: `copy_from` / `copy_to` accept a device `Buffer` or a host
   buffer-protocol object, in both directions. The only thing an extra helper would add is
   allocating the linear `Buffer` for the caller — folding `mr.allocate(arr.size_bytes, stream=s)`
   + `arr.copy_to(buf, stream=s)` into a one-liner, i.e. ~2 lines of convenience.

   **Resolution: ship `copy_from` / `copy_to` only, and document the copy-only contract.** We will
   not add an allocating convenience helper now. It is purely additive and non-breaking, so we can
   add one later if users request it.
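
To make the size of the tradeoff concrete, here is a minimal sketch of the two-step path next to the helper that is not being added. Everything in it is a host-only stand-in: `FakeBuffer`, `FakeMemoryResource`, `FakeCUDAArray`, and `to_linear_buffer` are hypothetical names, and the real `copy_from` / `copy_to` signatures may differ beyond the `stream=` keyword quoted above.

```python
class FakeBuffer:
    """Stand-in for a linear Buffer (a host bytearray here, no GPU involved)."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)


class FakeMemoryResource:
    """Stand-in for a memory resource exposing allocate(size, stream=...)."""

    def allocate(self, size, stream=None):
        return FakeBuffer(size)


class FakeCUDAArray:
    """Stand-in for the opaque array: contents are reachable only by copying."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self._opaque = bytearray(size_bytes)  # pretend hardware-defined layout

    def copy_from(self, src, stream=None):
        data = bytes(src.data if isinstance(src, FakeBuffer) else src)
        if len(data) != self.size_bytes:
            raise ValueError("source size does not match array size")
        self._opaque[:] = data

    def copy_to(self, dst, stream=None):
        target = dst.data if isinstance(dst, FakeBuffer) else dst
        if len(target) != self.size_bytes:
            raise ValueError("destination size does not match array size")
        memoryview(target)[:] = self._opaque


def to_linear_buffer(arr, mr, stream=None):
    """Hypothetical convenience helper (not shipped): allocate, then copy."""
    buf = mr.allocate(arr.size_bytes, stream=stream)
    arr.copy_to(buf, stream=stream)
    return buf


arr = FakeCUDAArray(size_bytes=8)
arr.copy_from(bytes(range(8)))  # host buffer-protocol object -> array
mr = FakeMemoryResource()

# The two-step path that ships:
buf = mr.allocate(arr.size_bytes, stream=None)
arr.copy_to(buf, stream=None)

# The one-liner a helper would offer:
buf2 = to_linear_buffer(arr, mr)
```

The helper body is exactly the two calls it replaces, which is why deferring it costs callers very little.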

3. **Factory set. ✅ Not a real decision — driver-mandated, all four required.**

   A texture can be backed by four kinds of memory — the PR exposes one factory per kind:
   - `from_array` — texture over a `CUDAArray` *(the headline feature)*
   - `from_mipmapped_array` — texture over a `MipmappedArray` *(the headline feature)*
   - `from_linear` — texture over a plain 1D device buffer *(ordinary linear memory, no `CUDAArray`)*
   - `from_pitch2d` — texture over a plain 2D pitched buffer *(ordinary linear memory, no `CUDAArray`)*

   `ResourceDescriptor` binds `CUDA_RESOURCE_DESC`, a driver union whose `resType` is exactly one of
   `ARRAY` / `MIPMAPPED_ARRAY` / `LINEAR` / `PITCH2D` — one factory per union arm. A faithful binding
   of that type *must* cover all four; shipping only two would be an incomplete binding of a mandatory
   driver struct, not a smaller-but-valid surface. So there was no real optionality here — the CTK
   driver API dictates the set. (Listed only because it sat next to the genuine decisions.)

   **Resolution: ship all four factories — required by the driver API, not a tradeoff.**
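
The "one factory per union arm" argument can be checked mechanically. The sketch below is a pure-Python model, not the binding itself: `ResourceType` and `ResourceDescriptorSketch` are hypothetical stand-ins, and the keyword names on `from_linear` / `from_pitch2d` are assumptions chosen for illustration.

```python
import enum


class ResourceType(enum.Enum):
    """Stand-in for the four resType arms named above (values are illustrative)."""

    ARRAY = "array"
    MIPMAPPED_ARRAY = "mipmapped_array"
    LINEAR = "linear"
    PITCH2D = "pitch2d"


class ResourceDescriptorSketch:
    """Tagged union: a kind tag, the backing resource, and kind-specific fields."""

    __slots__ = ("kind", "resource", "fields")

    def __init__(self, kind, resource, **fields):
        self.kind = kind
        self.resource = resource
        self.fields = fields

    @classmethod
    def from_array(cls, array):
        return cls(ResourceType.ARRAY, array)

    @classmethod
    def from_mipmapped_array(cls, mipmapped_array):
        return cls(ResourceType.MIPMAPPED_ARRAY, mipmapped_array)

    @classmethod
    def from_linear(cls, buffer, *, size_bytes):
        # Keyword name is an assumption for this sketch.
        return cls(ResourceType.LINEAR, buffer, size_bytes=size_bytes)

    @classmethod
    def from_pitch2d(cls, buffer, *, width, height, pitch_bytes):
        # Keyword names are assumptions for this sketch.
        return cls(
            ResourceType.PITCH2D,
            buffer,
            width=width,
            height=height,
            pitch_bytes=pitch_bytes,
        )


descriptors = [
    ResourceDescriptorSketch.from_array(object()),
    ResourceDescriptorSketch.from_mipmapped_array(object()),
    ResourceDescriptorSketch.from_linear(object(), size_bytes=1024),
    ResourceDescriptorSketch.from_pitch2d(object(), width=64, height=16, pitch_bytes=256),
]

# Every arm of the union is reachable through exactly one factory.
covered = {d.kind for d in descriptors}
missing = set(ResourceType) - covered
```

Dropping any factory would leave `missing` non-empty, which is the "incomplete binding" case described above.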

4. **Channel format. ✅ Decided — keep the folded `format` + `num_channels` parameters.**

   Each array element has a component type (e.g. 8-bit uint, 32-bit float) and a channel count
   (1 = grayscale … 4 = RGBA). Two ways to surface that:
   - **Folded (this PR):** `CUDAArray.from_descriptor(shape=..., format=ArrayFormat.FLOAT32, num_channels=4)`
   - **Separate (CuPy):** one `ChannelFormatDescriptor(...)` object passed as a unit

   The driver descriptor `cuda.core` actually fills in (`CUDA_ARRAY3D_DESCRIPTOR`) already stores
   these as two separate fields — `Format` (a `CUarray_format`, mirrored 1:1 by `ArrayFormat`) and
   `NumChannels`. So the folded form maps straight onto the driver struct with no translation, and
   read-back is already exposed as two properties (`.format`, `.num_channels`). The bundled
   `ChannelFormatDescriptor` is the *runtime* API's (`cudaChannelFormatDesc`) modeling — the form
   CuPy wraps because its texture module sits on the runtime API. Adopting it in a driver-based
   library would mean a translation wrapper the underlying API doesn't use (and the shapes don't even
   map cleanly: the driver uses one uniform component format × channel count, while
   `cudaChannelFormatDesc` allows per-channel bit widths).

   **Resolution: keep folded `format` + `num_channels`.** It's the driver-faithful surface
   (consistent with decision 1 favoring correctness over CuPy parity and decision 3 following the
   driver API). The bundled form's only wins are CuPy look-alike and a single read-back object,
   neither worth a runtime-style wrapper here.
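
The shape mismatch between the two models can be sketched in a few lines. This is an illustration only: `ArrayFormatSketch`, `RuntimeStyleDesc`, `fold`, and `element_size_bytes` are hypothetical names, the enum lists just three members, and whether the runtime accepts any particular mix of widths is outside the sketch.

```python
import enum
from typing import NamedTuple


class ArrayFormatSketch(enum.Enum):
    """Stand-in enum; each value is (kind, bits per component)."""

    UINT8 = ("unsigned", 8)
    INT16 = ("signed", 16)
    FLOAT32 = ("float", 32)


class RuntimeStyleDesc(NamedTuple):
    """Runtime-style descriptor: per-channel bit widths plus a kind."""

    x: int
    y: int
    z: int
    w: int
    kind: str


def fold(desc):
    """Translate to (format, num_channels), or raise if there is no uniform form."""
    widths = [bits for bits in (desc.x, desc.y, desc.z, desc.w) if bits > 0]
    if not widths or len(set(widths)) != 1:
        raise ValueError("per-channel widths are not uniform")
    if len(widths) not in (1, 2, 4):
        # The driver descriptor accepts 1, 2, or 4 channels.
        raise ValueError("channel count must be 1, 2, or 4")
    for fmt in ArrayFormatSketch:
        if fmt.value == (desc.kind, widths[0]):
            return fmt, len(widths)
    raise ValueError("no matching component format in this sketch")


def element_size_bytes(fmt, num_channels):
    """Size of one element: uniform component width times channel count."""
    _, bits = fmt.value
    return bits * num_channels // 8


# Uniform widths fold cleanly onto the two driver-style fields.
rgba_float = fold(RuntimeStyleDesc(32, 32, 32, 32, "float"))
rgba_float_size = element_size_bytes(*rgba_float)

# Unequal widths are expressible in the bundled form but have no folded equivalent.
try:
    fold(RuntimeStyleDesc(16, 16, 8, 8, "unsigned"))
    mixed_representable = True
except ValueError:
    mixed_representable = False
```

With the folded parameters no such translation step exists, because the two values are passed straight through to `Format` and `NumChannels`.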

5. **Descriptor type consistency. ✅ Not a real decision — divergence is intentional and harmless.**

   *Note: the original framing here was factually wrong.* `ResourceDescriptor` is **not** a `cdef
   class` and holds **no** native C struct — it is a plain Python class with `__slots__`, storing a
   reference to the backing resource plus a few Python fields. The `CUDA_RESOURCE_DESC` struct is
   assembled later, in `TextureObject.from_descriptor`. So this was never a `@dataclass`-vs-`cdef
   class` / performance question. Both descriptors are pure Python.

   The genuine difference is only *how you construct each*, and it reflects what each type is:
   - `TextureDescriptor` — a flat bag of independent sampling settings, built directly with keyword
     args (`@dataclass` fits perfectly).
   - `ResourceDescriptor` — a "pick exactly one of four backings" union (array / mipmap / linear /
     pitch2d), built via `from_*` factories because each kind carries different fields. A single
     `__init__` would be a pile of mutually-exclusive optional args plus a kind tag.

   Consistency is not a goal in itself — it only matters when inconsistency makes the API harder to
   learn or use, and here it doesn't: a user learns each type once and never has to reconcile them.
   The only behavioral gap is equality (`TextureDescriptor` compares by value; `ResourceDescriptor`
   by identity), which is essentially never exercised on these objects and is arguably correct since
   `ResourceDescriptor` wraps a live device resource. Forcing both to the same *kind* of type would
   be uniformity for its own sake and would make `ResourceDescriptor`'s constructor worse.

   **Resolution: keep the split; the divergence is intentional and does not hurt usability.** Like
   decision 3, this resolves to a non-issue once examined (and it rested on a mistaken premise to
   begin with).
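
The equality gap described above follows directly from the two construction styles. The sketch below models it with stand-ins: the field names on `TextureDescriptorSketch` are hypothetical, the enums are modeled as plain strings, and `SlotsResourceDescriptor` is not the class from #2095.

```python
from dataclasses import dataclass


@dataclass
class TextureDescriptorSketch:
    """Flat bag of independent settings; @dataclass generates __eq__ by value."""

    address_mode: str = "clamp"
    filter_mode: str = "point"
    normalized_coords: bool = False


class SlotsResourceDescriptor:
    """Plain class with __slots__; inherits identity-based equality from object."""

    __slots__ = ("kind", "resource")

    def __init__(self, kind, resource):
        self.kind = kind
        self.resource = resource

    @classmethod
    def from_array(cls, array):
        return cls("array", array)


a = TextureDescriptorSketch(filter_mode="linear")
b = TextureDescriptorSketch(filter_mode="linear")
value_equal = a == b  # True: same field values

backing = object()  # stands in for a live device resource
r1 = SlotsResourceDescriptor.from_array(backing)
r2 = SlotsResourceDescriptor.from_array(backing)
identity_only = (r1 == r1) and (r1 != r2)  # True: distinct objects never compare equal
```

Two descriptors over the same backing resource compare unequal here, which matches the "by identity" behavior noted above.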

6. **Bool naming. ✅ Decided — adopt the `is_<something>` convention.**

   `surface_load_store` is a boolean on `Array` (renamed `CUDAArray` per decision 1). It records
   whether the array was created with the surface load/store capability (CUDA's
   `CUDA_ARRAY3D_SURFACE_LDST` flag), which a `SurfaceObject` requires. It is exposed both as a
   constructor keyword (`surface_load_store=True`) and as a read-only property
   (`arr.surface_load_store`).

   The repo convention for boolean properties is `is_<something>`, so a property named
   `surface_load_store` doesn't read as a boolean the way `arr.is_managed` does. **Resolution: rename
   the property to follow the `is_<x>` convention (e.g. `is_surface_load_store`) for consistency with
   the cuda-python codebase.**

   **Open detail resolved:** the property name is `is_surface_load_store` (already implemented), and
   the constructor keyword is renamed to match — `from_descriptor(..., is_surface_load_store=False)` —
   so one symmetric name serves both set and read-back. This follows the existing
   `StridedMemoryView(is_readonly=...)` precedent in cuda.core, where an `is_<x>` boolean is used as
   both the constructor argument and the attribute. (The keyword rename is a small implementation
   follow-up in the PR; the property is already done.)
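
A minimal sketch of the symmetric naming, assuming the keyword rename lands as described. `CUDAArraySketch` and `SurfaceObjectSketch` are hypothetical stand-ins with no GPU behind them, and the way a surface is actually constructed from an array in #2095 may differ.

```python
class CUDAArraySketch:
    """Stand-in array: one is_<x> name for both the keyword and the property."""

    def __init__(self, shape, is_surface_load_store):
        self._shape = tuple(shape)
        self._is_surface_load_store = bool(is_surface_load_store)

    @classmethod
    def from_descriptor(cls, *, shape, is_surface_load_store=False):
        return cls(shape, is_surface_load_store)

    @property
    def is_surface_load_store(self):
        # Read-only: the capability is fixed when the array is created.
        return self._is_surface_load_store


class SurfaceObjectSketch:
    """Stand-in surface: refuses arrays created without the capability."""

    def __init__(self, array):
        if not array.is_surface_load_store:
            raise ValueError(
                "a surface requires an array created with is_surface_load_store=True"
            )
        self.array = array


plain = CUDAArraySketch.from_descriptor(shape=(64, 64))
writable = CUDAArraySketch.from_descriptor(shape=(64, 64), is_surface_load_store=True)

surface = SurfaceObjectSketch(writable)  # accepted

try:
    SurfaceObjectSketch(plain)
    rejected = False
except ValueError:
    rejected = True
```

The same identifier appears at the call site and in the read-back check, so there is one name to learn.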

7. **Scope. ✅ Decided — split the examples into a follow-up PR.**

   The nine `gl_interop_*.py` examples (~5k lines, not wired into CI, and dependent on a GL context
   that CI lacks) are orthogonal to the core API. **Resolution: drop them from this PR and land them
   in a separate follow-up PR once this core texture/surface PR merges**, since the examples depend
   on the new API it introduces.








