Skip to content

Add Gaussian Mixture Models (GMM)#2248

Open
Intron7 wants to merge 7 commits into
NVIDIA:mainfrom
Intron7:fea-gmm
Open

Add Gaussian Mixture Models (GMM)#2248
Intron7 wants to merge 7 commits into
NVIDIA:mainfrom
Intron7:fea-gmm

Conversation

@Intron7

@Intron7 Intron7 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

This adds GMM to cuvs. This a pure fp32/fp64 cuda-core version. It already implements some of the memory advance from Flash-GMM.

@copy-pr-bot

copy-pr-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: de27e12a-cfc4-4224-8b7f-c57b6eb5a64e

📥 Commits

Reviewing files that changed from the base of the PR and between 22be5a4 and e893ca8.

📒 Files selected for processing (4)
  • c/tests/cluster/gmm_c.cu
  • cpp/src/cluster/gmm_impl.cuh
  • fern/pages/python_api/python-api-cluster-gmm.md
  • python/cuvs/cuvs/cluster/gmm/gmm.pyx
✅ Files skipped from review due to trivial changes (1)
  • fern/pages/python_api/python-api-cluster-gmm.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • c/tests/cluster/gmm_c.cu
  • python/cuvs/cuvs/cluster/gmm/gmm.pyx
  • cpp/src/cluster/gmm_impl.cuh

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added Gaussian Mixture Model (GMM) clustering for C, C++, and Python.
    • Provides fit, predict, predict_proba, and score_samples.
    • Supports covariance types: full, tied, diagonal, and spherical.
    • Supports initialization methods: k-means, k-means++, random, and random-from-data, plus warm-start for continued EM runs.
  • Documentation
    • Added guides and API reference pages for C/C++/Python GMM.
  • Tests
    • Added C/C++/Python test coverage for correctness, convergence, and output normalization/probability behavior.

Walkthrough

This PR adds Gaussian Mixture Model support across the C++ core, C and Python bindings, tests, and Fern documentation.

Changes

GMM Clustering Feature

Layer / File(s) Summary
C++ public API and build wiring
cpp/include/cuvs/cluster/gmm.hpp, cpp/CMakeLists.txt
Defines the public GMM enums, params, and templated API declarations, and adds the GMM CUDA sources to the object library.
CUDA kernels and C++ EM implementation
cpp/src/cluster/gmm_kernels.cuh, cpp/src/cluster/gmm_impl.cuh, cpp/src/cluster/gmm.cuh
Implements the GMM kernels, EM training and inference logic, and public validation wrappers.
C++ explicit instantiations
cpp/src/cluster/gmm_float.cu, cpp/src/cluster/gmm_double.cu
Adds explicit float and double instantiations for the GMM API entry points.
C API header, implementation, and build wiring
c/include/cuvs/cluster/gmm.h, c/include/cuvs/core/all.h, c/src/cluster/gmm.cpp, c/CMakeLists.txt
Adds the C GMM API, DLPack-to-raft dispatch, and C build integration.
Python/Cython bindings and package wiring
python/cuvs/cuvs/cluster/gmm/*, python/cuvs/cuvs/cluster/CMakeLists.txt, python/cuvs/cuvs/cluster/__init__.py
Adds the Python GMM module, Cython declarations, package exports, and build wiring.
Tests
cpp/tests/cluster/gmm.cu, cpp/tests/CMakeLists.txt, c/tests/cluster/gmm_c.cu, c/tests/CMakeLists.txt, python/cuvs/cuvs/tests/test_gmm.py
Adds C++, C, and Python tests for fit, predict, responsibilities, score samples, initialization, warm start, and covariance failures.
Fern documentation
fern/docs.yml, fern/pages/cluster/gmm.md, fern/pages/c_api/*, fern/pages/cpp_api/*, fern/pages/python_api/*
Adds the GMM guide, API reference pages, and navigation entries.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

feature request, C++, cpp

Suggested reviewers

  • cjnolet
  • tarang-jain
  • lowener
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding Gaussian Mixture Models support.
Description check ✅ Passed The description is related to the changeset and accurately mentions the GMM CUDA-core implementation and memory improvements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 markdownlint-cli2 (0.22.1)
fern/pages/python_api/python-api-cluster-gmm.md

markdownlint-cli2 v0.22.1 (markdownlint v0.40.0)
Error: Unable to use configuration file '/coderabbit-0.markdownlint-cli2.jsonc'; ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at throwForConfigurationFile (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:48:9)
at readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:169:5)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
[cause]: Error: ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at async open (node:internal/fs/promises:640:25)
at async Object.readFile (node:internal/fs/promises:1287:14)
at async readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:141:17)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/coderabbit-0.markdownlint-cli2.jsonc'
}
}


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@c/tests/cluster/gmm_c.cu`:
- Around line 45-64: The cuvsResources_t resource created by
cuvsResourcesCreate(&res) uses a different CUDA stream by default than the
stream used for the asynchronous raft::copy operations, creating a race
condition where cuvsGMMFit executes before the data copy completes. After
calling cuvsResourcesCreate(&res), immediately call cuvsStreamSet(res, stream)
to synchronize the resource's stream with the stream variable that is used by
raft::copy and the subsequent GPU operations. Apply this fix immediately
following the cuvsResourcesCreate call, and also apply the same fix pattern to
the other test sections mentioned in the comment (lines 98-111, 167-187,
216-229).

In `@cpp/src/cluster/gmm_impl.cuh`:
- Around line 1115-1157: The FULL-covariance second pass has inconsistent
responsibilities across its computational steps. The issue is that after
m_finalize mutates weights and means at line 1141, the subsequent estep_tile
call at line 1145 recomputes responsibilities with these updated parameters, but
then m_cov_full_pass uses these new responsibilities which differ from those
used in the first pass and in m_accumulate, breaking the mathematical
consistency of the EM algorithm. Fix this by saving the responsibilities from
the first pass before m_finalize is called, and then reusing those saved
responsibilities in the m_cov_full_pass loop instead of recomputing them with
estep_tile.

In `@python/cuvs/cuvs/cluster/gmm/gmm.pyx`:
- Around line 90-94: The `__cinit__` method in the GMM class does not check the
return value from `cuvsGMMParamsCreate`, which can leave `self.params`
uninitialized if the call fails, leading to undefined behavior. Add error
checking after the `cuvsGMMParamsCreate` call to verify it succeeded and raise
an exception if it fails. Additionally, the `__dealloc__` method currently calls
`check_cuvs()` on `cuvsGMMParamsDestroy`, which can raise exceptions in a
destructor (problematic behavior). Instead, check the return value from
`cuvsGMMParamsDestroy(self.params)` without raising exceptions - only log or
handle the error silently to avoid suppressed exceptions or ResourceWarnings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca5bb572-978f-46af-87e8-00b27bd7a8a1

📥 Commits

Reviewing files that changed from the base of the PR and between 9ae6f93 and d4175d9.

📒 Files selected for processing (32)
  • c/CMakeLists.txt
  • c/include/cuvs/cluster/gmm.h
  • c/include/cuvs/core/all.h
  • c/src/cluster/gmm.cpp
  • c/tests/CMakeLists.txt
  • c/tests/cluster/gmm_c.cu
  • cpp/CMakeLists.txt
  • cpp/include/cuvs/cluster/gmm.hpp
  • cpp/src/cluster/gmm.cuh
  • cpp/src/cluster/gmm_double.cu
  • cpp/src/cluster/gmm_float.cu
  • cpp/src/cluster/gmm_impl.cuh
  • cpp/src/cluster/gmm_kernels.cuh
  • cpp/tests/CMakeLists.txt
  • cpp/tests/cluster/gmm.cu
  • fern/docs.yml
  • fern/pages/c_api/c-api-cluster-gmm.md
  • fern/pages/c_api/index.md
  • fern/pages/cluster/gmm.md
  • fern/pages/cluster/index.md
  • fern/pages/cpp_api/cpp-api-cluster-gmm.md
  • fern/pages/cpp_api/index.md
  • fern/pages/python_api/index.md
  • fern/pages/python_api/python-api-cluster-gmm.md
  • python/cuvs/cuvs/cluster/CMakeLists.txt
  • python/cuvs/cuvs/cluster/__init__.py
  • python/cuvs/cuvs/cluster/gmm/CMakeLists.txt
  • python/cuvs/cuvs/cluster/gmm/__init__.pxd
  • python/cuvs/cuvs/cluster/gmm/__init__.py
  • python/cuvs/cuvs/cluster/gmm/gmm.pxd
  • python/cuvs/cuvs/cluster/gmm/gmm.pyx
  • python/cuvs/cuvs/tests/test_gmm.py

Comment thread c/tests/cluster/gmm_c.cu
Comment thread cpp/src/cluster/gmm_impl.cuh Outdated
Comment thread python/cuvs/cuvs/cluster/gmm/gmm.pyx Outdated
Comment on lines +90 to +94
def __cinit__(self):
cuvsGMMParamsCreate(&self.params)

def __dealloc__(self):
check_cuvs(cuvsGMMParamsDestroy(self.params))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CRITICAL: Unchecked error in __cinit__ and unsafe exception in __dealloc__

Two related issues in the lifecycle methods:

  1. Line 91: cuvsGMMParamsCreate return value is not checked. If allocation fails, self.params remains uninitialized (undefined behavior), and subsequent operations will silently corrupt memory or crash.

  2. Line 94: check_cuvs() can raise exceptions in __dealloc__, which is problematic:

    • If __cinit__ failed silently, self.params may be invalid/garbage
    • Raising exceptions in destructors can cause suppressed errors or ResourceWarnings
Proposed fix
     def __cinit__(self):
-        cuvsGMMParamsCreate(&self.params)
+        check_cuvs(cuvsGMMParamsCreate(&self.params))
 
     def __dealloc__(self):
-        check_cuvs(cuvsGMMParamsDestroy(self.params))
+        if self.params is not NULL:
+            cuvsGMMParamsDestroy(self.params)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs/cuvs/cluster/gmm/gmm.pyx` around lines 90 - 94, The `__cinit__`
method in the GMM class does not check the return value from
`cuvsGMMParamsCreate`, which can leave `self.params` uninitialized if the call
fails, leading to undefined behavior. Add error checking after the
`cuvsGMMParamsCreate` call to verify it succeeded and raise an exception if it
fails. Additionally, the `__dealloc__` method currently calls `check_cuvs()` on
`cuvsGMMParamsDestroy`, which can raise exceptions in a destructor (problematic
behavior). Instead, check the return value from
`cuvsGMMParamsDestroy(self.params)` without raising exceptions - only log or
handle the error silently to avoid suppressed exceptions or ResourceWarnings.

@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jun 23, 2026
@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing Jun 23, 2026
@cjnolet

cjnolet commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

/ok to test d4175d9

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants