Introduce fused GEMM + 1-NN primitive using cuTile #2249

Draft
divyegala wants to merge 5 commits into NVIDIA:main from divyegala:cutile-python-to-cpp

divyegala commented Jun 17, 2026

Contributor

This PR adds infrastructure, built on top of the existing JIT LTO architecture, that generates kernels with cutile-python at build time and embeds them in the C++ library so that they can be called from C++.
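
As a rough illustration of the embedding step described above, a build-time script could turn an already-compiled kernel blob into a C++ header that exposes it as a byte array. This is only a sketch under stated assumptions, not the mechanism this PR uses: the function `emit_embedded_header`, the symbol name `fused_gemm_1nn_kernel`, and the placeholder bytes are all hypothetical, and the generated header assumes C++17 for `inline constexpr`.

```python
def emit_embedded_header(symbol: str, blob: bytes, per_line: int = 12) -> str:
    """Return C++ header text that embeds `blob` as a byte array named `symbol`.

    Hypothetical helper for illustration; it does not call cutile-python
    and does not reflect the PR's actual build scripts.
    """
    # Cheap sanity check; it does not reject C++ keywords.
    if not symbol.isidentifier():
        raise ValueError(f"not a usable identifier: {symbol!r}")
    # A zero-length array is not valid C++, so refuse empty input.
    if not blob:
        raise ValueError("blob must not be empty")

    rows = []
    for i in range(0, len(blob), per_line):
        chunk = blob[i:i + per_line]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)

    return (
        "#pragma once\n"
        "#include <cstddef>\n\n"
        f"inline constexpr unsigned char {symbol}[] = {{\n{body}\n}};\n"
        f"inline constexpr std::size_t {symbol}_size = sizeof({symbol});\n"
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real compiled kernel image.
    placeholder = bytes(range(16))
    print(emit_embedded_header("fused_gemm_1nn_kernel", placeholder))
```

The C++ side would then include the generated header and hand the array and its size to whatever loader or linker step consumes the kernel at run time.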

copy-pr-bot Bot commented Jun 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

divyegala changed the title from "cuTile Python to CPP embedding example" to "Introduce fused GEMM + 1-NN primitive using cuTile" on Jun 24, 2026