Clamp max_iter in OOC KMeans test#2256
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughIn ChangesBatched KMeans Test Iteration Cap
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/cluster/detail/kmeans_common.cuh`:
- Around line 541-603: The diagnostic fingerprinting code in
compute_centroid_adjustments performs expensive host-device memory copies,
stream synchronization, and hashing operations unconditionally in the KMeans hot
path. Wrap both the input diagnostic block (containing the static s_call_id
counter, mat_labels device buffer creation, host vector copies via raft::copy,
and the fnv/fmt_head lambdas with RAFT_LOG_INFO call) and the corresponding
output diagnostic block (mentioned as also applying to lines 626-666) behind a
compile-time or runtime gate macro/flag so these operations only execute when
explicitly enabled for debugging, not during normal operation.
In `@cpp/src/cluster/detail/minClusterDistanceCompute.cu`:
- Around line 151-208: The self-test block starting with s_self_test_done and
the entire diagnostic block containing the call_id, tensor copying, hash
computation via fnv_fp, and RAFT_LOG_INFO call are expensive operations that
should not run in the default hot path. Gate both the
runFusedNNDeterminismSelfTest invocation block and the diagnostic fingerprinting
block (including the h_X, h_c, h_norm allocations and device-to-host copies)
behind a compile-time feature flag or runtime environment variable that is
disabled by default, so these diagnostics only execute when explicitly enabled
for debugging purposes.
In `@cpp/tests/cluster/kmeans.cu`:
- Around line 384-386: The max_iter parameter is being set to 3 to limit
iterations for consistency between reference and batched code paths, but this
value is later overridden to 20 at line 409, making the initial cap ineffective.
Remove or comment out the later params.max_iter = 20 assignment that occurs
after the max_iter = 3 setting so that the iteration limit remains at 3
throughout the test execution and the flakiness mitigation takes effect as
intended.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3577ffe9-d6da-42d2-a562-20698b05d1d4
📒 Files selected for processing (3)
cpp/src/cluster/detail/kmeans_common.cuhcpp/src/cluster/detail/minClusterDistanceCompute.cucpp/tests/cluster/kmeans.cu
| params.oversampling_factor = 0; | ||
| // Limit the number of iterations to ensure same number of iterations for reference and batched | ||
| // code paths. | ||
| params.max_iter = 3; |
There was a problem hiding this comment.
This fix doesn't seem particularly robust, and appears to me like it'll reduce the coverage for the tests, right? Default max_iters for k-means is much higher. How do we test in a way where we can validate its ability to stop early without having to reduce the max-iters to the point where we are essentially not testing the early stopping at all?
There was a problem hiding this comment.
I think the point of the test when we are bit matching centroids is to ensure that the accumulation of partial sums in the OOC setting in a single iteration matches the in-memory setting. The test fails when there is an edge case -- the variation due to floating point precision in the arithmetic is just enough to push the difference in centroids between consecutive iterations to fall below the convergence criteria -- in-memory converges one iteration sooner in than OOC.
Addresses the KMeans flaky test in #2100. Since we bit match centroids, we limit the number of iterations to ensure that both the OOC and in-memory run the same number of iters. The flaky test is due to numerical differences arising out of atomic operations (perhaps due to different ordering or adding several batch sums in the OOC case versus a single atomic op in the in-memory case) --
raft::reduce_rows_by_key.