Skip to content

Use KvikIO for CAGRA-ACE#2257

Open
julianmi wants to merge 7 commits into
NVIDIA:mainfrom
julianmi:ace-kvikio
Open

Use KvikIO for CAGRA-ACE#2257
julianmi wants to merge 7 commits into
NVIDIA:mainfrom
julianmi:ace-kvikio

Conversation

@julianmi

@julianmi julianmi commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Motivation:

  • Performance: CAGRA-ACE uses intermediate files when the dataset, graph, and intermediate objects do not fit into memory. Using direct file I/O and GPUDirect Storage (GDS) improves I/O throughput. I observed a 1.17x end-to-end speedup for BIGANN-1B on L40S.
  • Reduce memory pressure: IVF-PQ uses pinned memory in k-means through raft::sample_rows in each partition. ACE uses large files leading to high page-cache pressure and memory fragmentation. Direct file I/O skips the page cache and resolves the potentially failing pinned memory allocations.

Changes:

  • Add KvikIO-backed bulk I/O for ACE/HNSW disk paths.
  • Use KvikIO futures for concurrent reads and explicit async result handling.
  • Align ACE .npy data regions for direct I/O and GDS-friendly transfers.
  • Harden file I/O helpers with better validation, ownership, and short-read/write checks.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 58839c96-47ac-458f-8930-fe78ab0acb37

📥 Commits

Reviewing files that changed from the base of the PR and between 66f1fe1 and 46b5cf2.

📒 Files selected for processing (1)
  • c/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • c/CMakeLists.txt

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added KvikIO-backed large-file I/O with improved aligned NumPy on-disk layout and disk-oriented reads/writes.
    • Updated disk-mode CAGRA/HNSW loading to use KvikIO file handles for bulk transfers, with direct-to-device paths when possible.
  • Bug Fixes
    • Improved validation for file/NumPy header integrity, alignment, and full read/write completion.
  • Tests
    • Added end-to-end tests covering large-file round-trips and KvikIO output correctness, including failure cases for replaced/removed paths.
  • Chores
    • Updated Conda/Python/CI build dependencies and release tooling for KvikIO (including wheel auditwheel exclusions and static build wiring).

Walkthrough

Integrates KvikIO into cuVS build, runtime, file I/O, neighbor disk workflows, tests, and packaging. Large-file I/O now uses kvikio handles, CAGRA and HNSW disk paths use kvikio-based reads and writes, and dependency declarations are updated across CMake, conda, Python, and release tooling.

Changes

KvikIO GPU Direct Storage Integration

Layer / File(s) Summary
KvikIO CMake dependency and target linking
cpp/cmake/thirdparty/get_kvikio.cmake, cpp/CMakeLists.txt, c/CMakeLists.txt
Adds KvikIO fetch/configuration logic and links kvikio::kvikio into cuVS CMake targets, including the C library target when the local KvikIO target exists.
file_io.hpp alignment and stream declarations
cpp/include/cuvs/util/file_io.hpp
Adds NumPy alignment helpers, refactors fd_streambuf lifecycle handling, updates create_numpy_file layout logic, and expands the file I/O stream declarations and documentation.
kvikio-backed bulk file I/O
cpp/src/util/file_io.cpp
Replaces chunked POSIX large-file reads and writes with kvikio FileHandle calls, adds file-identity validation helpers, and implements kvikio_ofstream buffering and flush behavior.
CAGRA disk partition loading
cpp/src/neighbors/detail/cagra/cagra_build.cuh
Changes ACE partition loading to use pre-opened file descriptors and kvikio pread futures, then updates disk-mode sub-index construction to use a direct-device read path with host fallback and timing logs.
HNSW disk serialization and reads
cpp/src/neighbors/detail/hnsw.hpp
Opens kvikio FileHandle objects for HNSW disk inputs, replaces batch reads with concurrent kvikio pread calls and validation, and writes spill output through kvikio_ofstream.
file_io test coverage
cpp/tests/util/file_io_test.cpp, cpp/tests/CMakeLists.txt
Adds tests for aligned NumPy file creation, host and device round-trips, kvikio_ofstream behavior, invalid arguments, replaced-path handling, and registers the new test target in CMake.
Packaging and runtime dependency wiring
conda/environments/all_cuda-*, conda/recipes/libcuvs/recipe.yaml, dependencies.yaml, python/libcuvs/pyproject.toml, ci/release/update-version.sh, python/libcuvs/libcuvs/load.py, ci/build_wheel.sh
Adds libkvikio to conda environments, libcuvs recipe requirements, dependency matrix groups, Python build/runtime dependencies, version update scripts, wheel repair exclusions, and Python load order.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

C++

Suggested reviewers

  • cjnolet
  • robertmaynard
  • jameslamb
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: adding KvikIO support for CAGRA-ACE disk I/O paths.
Description check ✅ Passed The description is directly related to the changeset and summarizes the KvikIO, GDS, and I/O hardening work.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/neighbors/detail/hnsw.hpp`:
- Around line 840-846: The code calls get() on graph_future, dataset_future, and
label_future sequentially, but if graph_future.get() throws an exception, the
subsequent calls to dataset_future.get() and label_future.get() are never
executed, leaving those async operations unresolved in the background. To fix
this, ensure all three futures are awaited before any exception can propagate by
either using try-catch blocks to drain all futures before re-throwing, or by
restructuring the code to call all three get() methods within a context that
guarantees all are executed (such as a scoped guard or utility function that
ensures all futures are consumed).

In `@cpp/src/util/file_io.cpp`:
- Around line 26-38: The bulk I/O operations are reopening files by path using
kvikio::FileHandle instead of preserving the already-open file descriptor, which
creates a TOCTOU race condition where the path could be replaced between open
and I/O. Fix this by either using the existing file descriptor directly instead
of reopening via path, or by verifying that the inode and device have not
changed before proceeding with the I/O operation. This change breaks the API
contract for descriptor-only usage (where get_path() is unavailable), so add
proper deprecation warnings at the affected call sites (around line 26-38 and
line 49-61 in file_io.cpp) and update the migration guide to document the
required changes for users relying on descriptor-only semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 300f4901-414b-474c-924a-8a674e657be5

📥 Commits

Reviewing files that changed from the base of the PR and between 71306ec and 18d59e5.

📒 Files selected for processing (24)
  • conda/environments/all_cuda-129_arch-aarch64.yaml
  • conda/environments/all_cuda-129_arch-x86_64.yaml
  • conda/environments/all_cuda-133_arch-aarch64.yaml
  • conda/environments/all_cuda-133_arch-x86_64.yaml
  • conda/environments/bench_ann_cuda-129_arch-x86_64.yaml
  • conda/environments/bench_ann_cuda-133_arch-aarch64.yaml
  • conda/environments/bench_ann_cuda-133_arch-x86_64.yaml
  • conda/environments/go_cuda-129_arch-x86_64.yaml
  • conda/environments/go_cuda-133_arch-aarch64.yaml
  • conda/environments/go_cuda-133_arch-x86_64.yaml
  • conda/environments/rust_cuda-129_arch-x86_64.yaml
  • conda/environments/rust_cuda-133_arch-aarch64.yaml
  • conda/environments/rust_cuda-133_arch-x86_64.yaml
  • conda/recipes/libcuvs/recipe.yaml
  • cpp/CMakeLists.txt
  • cpp/cmake/thirdparty/get_kvikio.cmake
  • cpp/include/cuvs/util/file_io.hpp
  • cpp/src/neighbors/detail/cagra/cagra_build.cuh
  • cpp/src/neighbors/detail/hnsw.hpp
  • cpp/src/util/file_io.cpp
  • cpp/tests/CMakeLists.txt
  • cpp/tests/util/file_io_test.cpp
  • dependencies.yaml
  • python/libcuvs/pyproject.toml

Comment thread cpp/src/neighbors/detail/hnsw.hpp Outdated
Comment thread cpp/src/util/file_io.cpp

@jameslamb jameslamb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few small initial comments from a packaging-codeowners perspective. In general, not opposed to having kvikio be a dependency of cuvs if cuVS maintainers are ok with it.

I'd be happy to review again once cuVS maintainers have more thoroughly gone through this and given their opinions.

Comment thread conda/recipes/libcuvs/recipe.yaml Outdated
# CUDA 12 on x86_64 only.
- if: cuda_major == "13" or linux64
then:
- libcufile-dev

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to depend on libcufile unconditionally. RAPIDS moved its CUDA 12 floor to CUDA 12.2 specifically for that.

libkvikio does that:

https://github.com/rapidsai/kvikio/blob/bdb788f45ef191384a294ecef3312ea2db35a2c7/conda/recipes/libkvikio/recipe.yaml#L74

Can you please explain why this limitation was added? If it's leftover from older RAPIDS references, please remove these guards and comments throughout this PR and make this an unconditional dependency.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also may be totally unnecessary for this recipe... I don't see any direct uses of cufile headers in this PR. If this was just to satisfy libkvikio's transitive dependency on cuFile, leave it out... libkvikio conda packages already carry a runtime dependency on libcufile-dev.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've missed that this was fixed in CUDA 12.2. I've also removed the outdated libcufile-dev dependency.

requires-python = ">=3.11"
dependencies = [
"cuda-toolkit[cublas,curand,cusolver,cusparse,nvrtc]==13.*",
"libkvikio==26.8.*,>=0.0.0a0",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAPIDS wheels runtime dlopen() their RAPIDS shared-library dependency.

Add libkvikio here:

# These libraries must be loaded before libcuvs because libcuvs
# references their symbols
import libraft
import librmm
librmm.load_library()
libraft.load_library()

I don't believe rmm or raft depend on it or that it depends on either of them so it can be in any order (I think).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied. I've added it first in case rmm or raft might use kvikio in the future.

build-backend = "scikit_build_core.build"
requires = [
"cmake>=4.0",
"libkvikio==26.8.*,>=0.0.0a0",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the script we use when cutting new release branches: https://github.com/rapidsai/cuvs/blob/71306ec8e76f6c104768cfaba301ca2dcba780bd/ci/release/update-version.sh#L107

It now needs libkvikio entries.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jun 23, 2026
@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing Jun 23, 2026
@julianmi julianmi requested review from a team as code owners June 24, 2026 08:30
@julianmi julianmi requested a review from bdice June 24, 2026 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants