Materialize HNSW Layered Index to hnswlib Index by julianmi · Pull Request #2241 · NVIDIA/cuvs

julianmi · 2026-06-12T13:17:28Z

#2148 added an HNSW layered index that compacts the HNSW index by stripping out the dataset. This is especially beneficial when the index is built on an accelerated machine and then moved over a network to a search node. This proposal adds cuvs::neighbors::hnsw::materialize_to_hnswlib, a file-to-file API that converts a GPU_LAYERED_ON_DISK artifact (graph topology only, stored in ACE/build order) plus its original-ID-ordered dataset into a standard hnswlib index file.

Public API (`cpp/include/cuvs/neighbors/hnsw.hpp`)

I don't think there is a similar API in either cuVS or hnswlib. Thus, I've aligned the naming with the existing serialize_to_hnswlib and used materialize to clarify that the result is an hnswlib index file. I'm open to other naming suggestions.

struct materialize_params {
  std::string dataset_path;          // original-ID-ordered vectors (.npy or ANN *.bin)
  double      max_host_memory_gb = 0; // 0 => single in-memory reorder; >0 => bucketed temp files
  int         num_threads        = 0; // 0 => max threads
};

void materialize_to_hnswlib(raft::resources const& res,
                            const materialize_params& params,
                            const std::string& layered_artifact_path,
                            const std::string& output_path,
                            int dim,
                            cuvs::distance::DistanceType metric);

A single entry point covers all dtypes: the element type (float, half, uint8_t, int8_t) is
read from the artifact header and dispatched internally.
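
The single-entry-point idea can be illustrated with a small, self-contained sketch. The `elem_type` tag, its numeric values, and `vector_bytes` are hypothetical names made up for this illustration; the actual header encoding and the internal dispatch in cuVS are not shown in this description and may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical element-type tag (assumption: the real artifact header encoding is not shown here).
enum class elem_type : std::uint8_t { f32 = 0, f16 = 1, u8 = 2, i8 = 3 };

// Stand-in for a typed implementation; it only reports the bytes per vector.
template <typename T>
std::size_t vector_bytes_impl(int dim)
{
  return static_cast<std::size_t>(dim) * sizeof(T);
}

// Untyped entry point: inspect the tag once, then dispatch to the typed template.
inline std::size_t vector_bytes(elem_type t, int dim)
{
  switch (t) {
    case elem_type::f32: return vector_bytes_impl<float>(dim);
    case elem_type::f16: return vector_bytes_impl<std::uint16_t>(dim);  // 2-byte stand-in for half
    case elem_type::u8: return vector_bytes_impl<std::uint8_t>(dim);
    case elem_type::i8: return vector_bytes_impl<std::int8_t>(dim);
  }
  throw std::invalid_argument("unknown element type tag");
}
```

With this shape, callers never name the element type; for example, `vector_bytes(elem_type::f16, 128)` resolves to the 2-byte instantiation.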

Implementation (`cpp/src/neighbors/detail/hnsw.hpp`)

The goal is to minimize the random access needed for the ID reordering. Random access is cheap in main memory but very expensive for disk I/O. Thus, two passes are used:

Phase 0 – setup: validate header/descriptors/levels, validate dataset shape, derive the hnswlib
layout constants from a dummy single-element index, compute the output layout, posix_fallocate
the output, and write the native hnswlib header.
Phase 1/2 – base region: reorder base links from ACE order to ID order and interleave dataset
vectors, emitting [level-0 link block | vector | label] sequentially. Two strategies:
- single in-memory scatter when the base topology fits max_host_memory_gb;
- bucketed temporary files (id_record_spiller) otherwise — sequential append + sequential replay
  per bucket, peak RAM bounded near the budget.
Phase 3 – upper region: transpose the upper layers into per-element link lists and append them
sequentially, with the same in-memory / bucketed fallback.
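
The bucketed strategy can be sketched as follows. This is a minimal illustration, not the id_record_spiller implementation: `link_record` and `reorder_bucketed` are hypothetical names, in-memory vectors stand in for the temporary files, and IDs are assumed to be a permutation of 0..n-1.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One base-layer record as it appears in build order (hypothetical shape).
struct link_record {
  std::uint32_t id;                  // original node ID
  std::vector<std::uint32_t> links;  // level-0 neighbors
};

// Reorders records from build order to ID order using buckets that each cover
// at most `bucket_capacity` consecutive IDs. Assumes IDs are a permutation of 0..n-1.
inline std::vector<link_record> reorder_bucketed(const std::vector<link_record>& build_order,
                                                 std::size_t bucket_capacity)
{
  const std::size_t n = build_order.size();
  if (n == 0) { return {}; }
  bucket_capacity               = std::max<std::size_t>(bucket_capacity, 1);
  const std::size_t num_buckets = (n + bucket_capacity - 1) / bucket_capacity;

  // Pass 1: read the input once, appending each record to the bucket that owns its ID range.
  std::vector<std::vector<link_record>> buckets(num_buckets);  // stand-ins for temp files
  for (const auto& rec : build_order) {
    buckets[rec.id / bucket_capacity].push_back(rec);
  }

  // Pass 2: replay one bucket at a time; random access is confined to one bucket-sized buffer.
  std::vector<link_record> out;
  out.reserve(n);
  for (std::size_t b = 0; b < num_buckets; ++b) {
    const std::size_t first_id = b * bucket_capacity;
    const std::size_t count    = std::min(bucket_capacity, n - first_id);
    std::vector<link_record> slots(count);
    for (auto& rec : buckets[b]) {
      slots[rec.id - first_id] = std::move(rec);  // in-memory scatter within the bucket
    }
    for (auto& rec : slots) {
      out.push_back(std::move(rec));  // sequential emit in ID order
    }
  }
  return out;
}
```

When `bucket_capacity` is at least n, there is a single bucket and the sketch degenerates to the single in-memory scatter.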

The output is the exact upstream layout (header + level-0 array + per-element link lists), loadable
by hnswlib::loadIndex and by cuvs::neighbors::hnsw::deserialize with hierarchy == CPU.
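
For orientation, the level-0 record layout can be computed as in the sketch below. The PR derives these constants from a dummy single-element index rather than from formulas; the formulas here reflect my understanding of upstream hnswlib (unsigned int link entries and link counts, size_t labels, maxM0 = 2 * M), and `level0_layout` / `compute_level0_layout` are illustrative names only.

```cpp
#include <cstddef>

// Type widths used by upstream hnswlib, to the best of my knowledge.
using tableint        = unsigned int;
using linklistsizeint = unsigned int;
using labeltype       = std::size_t;

struct level0_layout {
  std::size_t size_links_level0;      // link count plus maxM0 neighbor slots
  std::size_t offset_data;            // where the vector starts within a record
  std::size_t label_offset;           // where the label starts within a record
  std::size_t size_data_per_element;  // full stride of one level-0 record
};

// M is the hnswlib M parameter; the base layer holds up to maxM0 = 2 * M links.
inline level0_layout compute_level0_layout(std::size_t M, std::size_t dim, std::size_t elem_bytes)
{
  const std::size_t maxM0 = 2 * M;
  level0_layout l{};
  l.size_links_level0     = maxM0 * sizeof(tableint) + sizeof(linklistsizeint);
  l.offset_data           = l.size_links_level0;
  l.label_offset          = l.size_links_level0 + dim * elem_bytes;
  l.size_data_per_element = l.label_offset + sizeof(labeltype);
  return l;
}
```

Because every record has the same stride, element i of the base region starts at i * size_data_per_element, which is what makes a purely sequential write of the [level-0 link block | vector | label] records possible.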

Details

Peak host RAM ≈ one reorder bucket (base_links_bytes / num_buckets) + small (~64 MiB) streaming
buffers + levels (n bytes) + the upper locator. With the in-memory pass, RAM ≈ the base topology
only; setting a budget lowers it further.
Disk IO is fully sequential: dataset read once, output written once, base topology read once
(in-memory) or read once + temp write/read once (bucketed), and upper read once + written once.
All four dtypes are tested in cpp/tests/neighbors/ann_hnsw_ace*.
The example in examples/cpp/src/hnsw_ace_layered_example.cu was updated to use the materialization path.
Adds the C, Java, and Python bindings.
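
The peak-RAM estimate from the first item above can be turned into a rough bucket-count calculation. This is only a sketch under the stated approximation; `estimate_peak_bytes` and `choose_num_buckets` are hypothetical helpers, and the actual bucket selection logic in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kStreamingBytes = 64ull << 20;  // ~64 MiB streaming buffers (from the estimate)

// Peak ≈ one bucket + streaming buffers + levels (n bytes) + upper locator.
inline std::uint64_t estimate_peak_bytes(std::uint64_t base_links_bytes,
                                         std::uint64_t num_buckets,
                                         std::uint64_t n,
                                         std::uint64_t upper_locator_bytes)
{
  assert(num_buckets > 0);
  const std::uint64_t bucket = (base_links_bytes + num_buckets - 1) / num_buckets;  // ceil division
  return bucket + kStreamingBytes + n + upper_locator_bytes;
}

// Smallest bucket count whose estimate fits the budget.
// Returns 0 when the fixed overhead alone already exceeds the budget.
inline std::uint64_t choose_num_buckets(std::uint64_t base_links_bytes,
                                        std::uint64_t budget_bytes,
                                        std::uint64_t n,
                                        std::uint64_t upper_locator_bytes)
{
  const std::uint64_t fixed = kStreamingBytes + n + upper_locator_bytes;
  if (budget_bytes <= fixed) { return 0; }
  const std::uint64_t avail   = budget_bytes - fixed;
  const std::uint64_t buckets = (base_links_bytes + avail - 1) / avail;  // ceil division
  return buckets == 0 ? 1 : buckets;
}
```

For instance, 1 GiB of base links with 256 MiB left over after the fixed overhead would need four buckets under this estimate.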

copy-pr-bot · 2026-06-12T13:17:31Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


julianmi added 12 commits June 1, 2026 10:15

Add HNSW layered hierarchy

52997da

Improve deserialization logging

a11d9ee

Use ace prefix in benchmarking consistently

a95b0e0

Validate metadata before allocating

47e2d85

Store layered base topology by original node ID

21ce339

- Add base node IDs for sequential access.
- Scattered writes happen only in the deserialization step, using host memory.

Unify the ACE logging format

4514bc8

Merge branch 'main' into hnsw-layered-index

0e9accd

Address review feedback

7a761cf

Replace JSON header with binary header

ea1a96c

Merge branch 'main' into hnsw-layered-index

153a82d

Merge branch 'main' into hnsw-layered-index

265206b

Add GPU_LAYERED_ON_DISK support for HNSW materialization

ad3848a

- `materialize_to_hnswlib` enables disk-to-disk materialization of a layered HNSW artifact to a standard hnswlib index file.
- Adds bindings.

github-project-automation Bot added this to Unstructured Data Processing Jun 12, 2026

julianmi mentioned this pull request Jun 12, 2026

Add HNSW Layered Index Support #2148

Open

julianmi wants to merge 12 commits into NVIDIA:main from julianmi:materialize-layered-index
