feat(triton): add gemm operator#734

Draft
fuyou4546 wants to merge 1 commit into InfiniTensor:master from fuyou4546:feat/triton-gemm
@fuyou4546

Contributor

Summary

  • Add Gemm operator on Triton backend (src/triton/ops/gemm/build.py, gemm.py, gemm.h)

Motivation

Add a batched GEMM operator on the Triton backend, supporting fp16, bf16, and fp32 with alpha/beta scaling and optional transposition. The kernel uses a 2D tiled grid with GROUP_SIZE_M swizzle for L2 cache locality.
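For reviewers unfamiliar with the swizzle, here is a minimal pure-Python sketch of a grouped tile ordering of the kind `GROUP_SIZE_M` usually denotes. It assumes a linear program id over the output tiles and follows the scheme from the Triton matmul tutorial; the function name `swizzled_tile` and its parameters are illustrative and may not match what `gemm.py` actually does.

```python
def swizzled_tile(pid, num_pid_m, num_pid_n, group_size_m):
    """Map a linear program id to a (pid_m, pid_n) output-tile index.

    Hypothetical helper for illustration only. Tiles are visited in groups
    of `group_size_m` rows, column by column inside each group, so that
    consecutive programs reuse the same few rows of A and columns of B.
    """
    num_pid_in_group = group_size_m * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size_m
    # The last group is shorter when num_pid_m is not a multiple of group_size_m.
    rows_in_group = min(num_pid_m - first_pid_m, group_size_m)
    pid_in_group = pid % num_pid_in_group
    pid_m = first_pid_m + pid_in_group % rows_in_group
    pid_n = pid_in_group // rows_in_group
    return pid_m, pid_n


if __name__ == "__main__":
    # 4 x 4 tiles with groups of 2 rows: print the visiting order.
    order = [swizzled_tile(pid, 4, 4, 2) for pid in range(16)]
    print(order)
```

With a plain row-major order, the first eight programs of a 4 x 4 tile grid would touch two tile-rows of A and all four tile-columns of B twice over; the grouped order covers the same eight tiles as a 2 x 4 block, which is what tends to improve L2 reuse.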

Closes N/A

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Smoke Test Result

pytest tests -m smoke -q
······
73 passed, 13 skipped, 22328 deselected in 2.76s

Test Results on Supported Platforms

| Platform | Affected | Build / Smoke Result | Full Result / Notes |
| --- | --- | --- | --- |
| NVIDIA | | Successfully installed InfiniOps-0.1.0 / 73 passed, 13 skipped, 22328 deselected | 1500 passed, 7500 deselected |
| Iluvatar | | | |
| MetaX | | | |
| Cambricon | | | |
| Moore | | | |
| Ascend | | | |

Full `pytest` output (optional)
pytest tests/test_gemm.py -k "cuda-8"
======= test session starts =======
platform linux -- Python 3.12.0, pytest-9.0.3, pluggy-1.6.0
rootdir: /home/zhangshuo/projects/InfiniTensor/InfiniOps
configfile: pyproject.toml
plugins: xdist-3.8.0, cov-7.1.0
collected 9000 items / 7500 deselected / 1500 selected
······
1500 passed, 7500 deselected in 5.35s

Benchmark / Performance Impact

N/A

Notes for Reviewers

Depends on the AOT infrastructure in feat/triton-backend.
