Add Gaussian Mixture Models (GMM) by Intron7 · Pull Request #2248 · NVIDIA/cuvs

Intron7 · 2026-06-17T17:21:03Z

This adds GMM to cuvs. This a pure fp32/fp64 cuda-core version. It already implements some of the memory advance from Flash-GMM.

copy-pr-bot · 2026-06-17T17:21:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-17T17:32:40Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: de27e12a-cfc4-4224-8b7f-c57b6eb5a64e

📥 Commits

Reviewing files that changed from the base of the PR and between 22be5a4 and e893ca8.

📒 Files selected for processing (4)

c/tests/cluster/gmm_c.cu
cpp/src/cluster/gmm_impl.cuh
fern/pages/python_api/python-api-cluster-gmm.md
python/cuvs/cuvs/cluster/gmm/gmm.pyx

✅ Files skipped from review due to trivial changes (1)

fern/pages/python_api/python-api-cluster-gmm.md

🚧 Files skipped from review as they are similar to previous changes (3)

c/tests/cluster/gmm_c.cu
python/cuvs/cuvs/cluster/gmm/gmm.pyx
cpp/src/cluster/gmm_impl.cuh

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added Gaussian Mixture Model (GMM) clustering for C, C++, and Python.
- Provides fit, predict, predict_proba, and score_samples.
- Supports covariance types: full, tied, diagonal, and spherical.
- Supports initialization methods: k-means, k-means++, random, and random-from-data, plus warm-start for continued EM runs.
Documentation
- Added guides and API reference pages for C/C++/Python GMM.
Tests
- Added C/C++/Python test coverage for correctness, convergence, and output normalization/probability behavior.

Walkthrough

This PR adds Gaussian Mixture Model support across the C++ core, C and Python bindings, tests, and Fern documentation.

Changes

GMM Clustering Feature

Layer / File(s)	Summary
C++ public API and build wiring `cpp/include/cuvs/cluster/gmm.hpp`, `cpp/CMakeLists.txt`	Defines the public GMM enums, params, and templated API declarations, and adds the GMM CUDA sources to the object library.
CUDA kernels and C++ EM implementation `cpp/src/cluster/gmm_kernels.cuh`, `cpp/src/cluster/gmm_impl.cuh`, `cpp/src/cluster/gmm.cuh`	Implements the GMM kernels, EM training and inference logic, and public validation wrappers.
C++ explicit instantiations `cpp/src/cluster/gmm_float.cu`, `cpp/src/cluster/gmm_double.cu`	Adds explicit float and double instantiations for the GMM API entry points.
C API header, implementation, and build wiring `c/include/cuvs/cluster/gmm.h`, `c/include/cuvs/core/all.h`, `c/src/cluster/gmm.cpp`, `c/CMakeLists.txt`	Adds the C GMM API, DLPack-to-raft dispatch, and C build integration.
Python/Cython bindings and package wiring `python/cuvs/cuvs/cluster/gmm/*`, `python/cuvs/cuvs/cluster/CMakeLists.txt`, `python/cuvs/cuvs/cluster/__init__.py`	Adds the Python GMM module, Cython declarations, package exports, and build wiring.
Tests `cpp/tests/cluster/gmm.cu`, `cpp/tests/CMakeLists.txt`, `c/tests/cluster/gmm_c.cu`, `c/tests/CMakeLists.txt`, `python/cuvs/cuvs/tests/test_gmm.py`	Adds C++, C, and Python tests for fit, predict, responsibilities, score samples, initialization, warm start, and covariance failures.
Fern documentation `fern/docs.yml`, `fern/pages/cluster/gmm.md`, `fern/pages/c_api/`, `fern/pages/cpp_api/`, `fern/pages/python_api/*`	Adds the GMM guide, API reference pages, and navigation entries.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

feature request, C++, cpp

Suggested reviewers

cjnolet
tarang-jain
lowener

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding Gaussian Mixture Models support.
Description check	✅ Passed	The description is related to the changeset and accurately mentions the GMM CUDA-core implementation and memory improvements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 markdownlint-cli2 (0.22.1)

fern/pages/python_api/python-api-cluster-gmm.md

markdownlint-cli2 v0.22.1 (markdownlint v0.40.0)
Error: Unable to use configuration file '/coderabbit-0.markdownlint-cli2.jsonc'; ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at throwForConfigurationFile (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:48:9)
at readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:169:5)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
[cause]: Error: ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at async open (node:internal/fs/promises:640:25)
at async Object.readFile (node:internal/fs/promises:1287:14)
at async readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:141:17)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/coderabbit-0.markdownlint-cli2.jsonc'
}
}

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@c/tests/cluster/gmm_c.cu`:
- Around line 45-64: The cuvsResources_t resource created by
cuvsResourcesCreate(&res) uses a different CUDA stream by default than the
stream used for the asynchronous raft::copy operations, creating a race
condition where cuvsGMMFit executes before the data copy completes. After
calling cuvsResourcesCreate(&res), immediately call cuvsStreamSet(res, stream)
to synchronize the resource's stream with the stream variable that is used by
raft::copy and the subsequent GPU operations. Apply this fix immediately
following the cuvsResourcesCreate call, and also apply the same fix pattern to
the other test sections mentioned in the comment (lines 98-111, 167-187,
216-229).

In `@cpp/src/cluster/gmm_impl.cuh`:
- Around line 1115-1157: The FULL-covariance second pass has inconsistent
responsibilities across its computational steps. The issue is that after
m_finalize mutates weights and means at line 1141, the subsequent estep_tile
call at line 1145 recomputes responsibilities with these updated parameters, but
then m_cov_full_pass uses these new responsibilities which differ from those
used in the first pass and in m_accumulate, breaking the mathematical
consistency of the EM algorithm. Fix this by saving the responsibilities from
the first pass before m_finalize is called, and then reusing those saved
responsibilities in the m_cov_full_pass loop instead of recomputing them with
estep_tile.

In `@python/cuvs/cuvs/cluster/gmm/gmm.pyx`:
- Around line 90-94: The `__cinit__` method in the GMM class does not check the
return value from `cuvsGMMParamsCreate`, which can leave `self.params`
uninitialized if the call fails, leading to undefined behavior. Add error
checking after the `cuvsGMMParamsCreate` call to verify it succeeded and raise
an exception if it fails. Additionally, the `__dealloc__` method currently calls
`check_cuvs()` on `cuvsGMMParamsDestroy`, which can raise exceptions in a
destructor (problematic behavior). Instead, check the return value from
`cuvsGMMParamsDestroy(self.params)` without raising exceptions - only log or
handle the error silently to avoid suppressed exceptions or ResourceWarnings.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca5bb572-978f-46af-87e8-00b27bd7a8a1

📥 Commits

Reviewing files that changed from the base of the PR and between 9ae6f93 and d4175d9.

📒 Files selected for processing (32)

c/CMakeLists.txt
c/include/cuvs/cluster/gmm.h
c/include/cuvs/core/all.h
c/src/cluster/gmm.cpp
c/tests/CMakeLists.txt
c/tests/cluster/gmm_c.cu
cpp/CMakeLists.txt
cpp/include/cuvs/cluster/gmm.hpp
cpp/src/cluster/gmm.cuh
cpp/src/cluster/gmm_double.cu
cpp/src/cluster/gmm_float.cu
cpp/src/cluster/gmm_impl.cuh
cpp/src/cluster/gmm_kernels.cuh
cpp/tests/CMakeLists.txt
cpp/tests/cluster/gmm.cu
fern/docs.yml
fern/pages/c_api/c-api-cluster-gmm.md
fern/pages/c_api/index.md
fern/pages/cluster/gmm.md
fern/pages/cluster/index.md
fern/pages/cpp_api/cpp-api-cluster-gmm.md
fern/pages/cpp_api/index.md
fern/pages/python_api/index.md
fern/pages/python_api/python-api-cluster-gmm.md
python/cuvs/cuvs/cluster/CMakeLists.txt
python/cuvs/cuvs/cluster/__init__.py
python/cuvs/cuvs/cluster/gmm/CMakeLists.txt
python/cuvs/cuvs/cluster/gmm/__init__.pxd
python/cuvs/cuvs/cluster/gmm/__init__.py
python/cuvs/cuvs/cluster/gmm/gmm.pxd
python/cuvs/cuvs/cluster/gmm/gmm.pyx
python/cuvs/cuvs/tests/test_gmm.py

coderabbitai · 2026-06-17T17:32:44Z

+    def __cinit__(self):
+        cuvsGMMParamsCreate(&self.params)
+
+    def __dealloc__(self):
+        check_cuvs(cuvsGMMParamsDestroy(self.params))


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CRITICAL: Unchecked error in __cinit__ and unsafe exception in __dealloc__

Two related issues in the lifecycle methods:

Line 91: cuvsGMMParamsCreate return value is not checked. If allocation fails, self.params remains uninitialized (undefined behavior), and subsequent operations will silently corrupt memory or crash.

Line 94: check_cuvs() can raise exceptions in __dealloc__, which is problematic:

If __cinit__ failed silently, self.params may be invalid/garbage

Raising exceptions in destructors can cause suppressed errors or ResourceWarnings

Proposed fix

def __cinit__(self): - cuvsGMMParamsCreate(&self.params) + check_cuvs(cuvsGMMParamsCreate(&self.params)) def __dealloc__(self): - check_cuvs(cuvsGMMParamsDestroy(self.params)) + if self.params is not NULL: + cuvsGMMParamsDestroy(self.params)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs/cuvs/cluster/gmm/gmm.pyx` around lines 90 - 94, The `__cinit__` method in the GMM class does not check the return value from `cuvsGMMParamsCreate`, which can leave `self.params` uninitialized if the call fails, leading to undefined behavior. Add error checking after the `cuvsGMMParamsCreate` call to verify it succeeded and raise an exception if it fails. Additionally, the `__dealloc__` method currently calls `check_cuvs()` on `cuvsGMMParamsDestroy`, which can raise exceptions in a destructor (problematic behavior). Instead, check the return value from `cuvsGMMParamsDestroy(self.params)` without raising exceptions - only log or handle the error silently to avoid suppressed exceptions or ResourceWarnings.

cjnolet · 2026-06-23T15:52:35Z

/ok to test d4175d9

coderabbitai · 2026-06-24T14:54:54Z

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details

{}

Intron7 added 4 commits June 16, 2026 14:16

add gmm

72d8c5c

update kernels and fusing

fb65700

update refs

c4a016d

make clean

d4175d9

Intron7 requested review from a team as code owners June 17, 2026 17:21

github-project-automation Bot added this to Unstructured Data Processing Jun 17, 2026

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

cjnolet assigned Intron7 Jun 23, 2026

cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jun 23, 2026

cjnolet moved this to In Progress in Unstructured Data Processing Jun 23, 2026

Intron7 and others added 2 commits June 24, 2026 14:58

Merge branch 'main' into fea-gmm

22be5a4

fix issues and adress coderabbit

cccdfc0

Merge branch 'main' into fea-gmm

e893ca8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Gaussian Mixture Models (GMM)#2248

Add Gaussian Mixture Models (GMM)#2248
Intron7 wants to merge 7 commits into
NVIDIA:mainfrom
Intron7:fea-gmm

Intron7 commented Jun 17, 2026

Uh oh!

copy-pr-bot Bot commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 17, 2026

Uh oh!

cjnolet commented Jun 23, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Intron7 commented Jun 17, 2026

Uh oh!

copy-pr-bot Bot commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

cjnolet commented Jun 23, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading