Add blog on agent-assisted SGLang development #353
Open
BBuf wants to merge 13 commits into
mickqian
reviewed
Jun 21, 2026
type: blog
---

SGLang development increasingly goes beyond isolated code changes. The same repository now spans LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, `torch.compile`, ModelOpt quantization, video model parallelism, and production incident handling. In the past, many of these workflows depended on individual developer memory: how to launch a certain model, how to read a profile trace, which log to add first when debugging a CUDA crash, or which benchmarks a performance PR should include. As agent tools mature, this experience can be turned into executable `SKILL.md` files, scripts, benchmark contracts, and review loops.
Collaborator
nit: the list "LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, torch.compile, ModelOpt quantization, video model parallelism" seems a bit excessive
Around SGLang Agent development, a set of skills has already emerged for both LLM and diffusion work. [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops. [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels. Viewed together, these efforts point to the same direction: the value of agents comes from procedural engineering knowledge, including executable steps, reproducible experiments, and reviewable evidence.
Collaborator
perhaps this is better:
- BBuf/AI-Infra-Auto-Driven-SKILLS covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops
- BBuf/KDA-Pilot explores automated optimization for SGLang diffusion kernels
mickqian
reviewed
Jun 21, 2026
3. Interpret NCU results according to the kernel's compute characteristics.
   For memory-bound kernels, focus on DRAM/L2 throughput, load/store efficiency, and memory pipe utilization. For compute-bound GEMM/attention kernels, focus on Tensor Core utilization, SM busy, eligible warps, and the main stall reasons. For small latency-bound kernels, check launch count, per-kernel duration, synchronization points, and possible fusion opportunities. A single trace screenshot is not enough; the next code change should be supported by specific metrics.

4. For diffusion, first confirm there is no fallback.
Collaborator
seems a bit specific to me 😂
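For reference, the NCU interpretation rule quoted in item 3 above can be sketched as a small checklist lookup. This is only an illustrative sketch: `KernelClass`, `FOCUS`, and `missing_evidence` are hypothetical names that do not come from the PR, and the strings are the human-readable labels from the rule text, not actual `ncu` metric identifiers.

```python
from enum import Enum


class KernelClass(Enum):
    """Kernel categories named in the rule text (hypothetical enum)."""
    MEMORY_BOUND = "memory-bound"
    COMPUTE_BOUND = "compute-bound"
    LATENCY_BOUND = "latency-bound"


# Focus areas copied from the rule text. These are descriptive labels,
# NOT real ncu metric names.
FOCUS = {
    KernelClass.MEMORY_BOUND: [
        "DRAM/L2 throughput",
        "load/store efficiency",
        "memory pipe utilization",
    ],
    KernelClass.COMPUTE_BOUND: [
        "Tensor Core utilization",
        "SM busy",
        "eligible warps",
        "main stall reasons",
    ],
    KernelClass.LATENCY_BOUND: [
        "launch count",
        "per-kernel duration",
        "synchronization points",
        "fusion opportunities",
    ],
}


def missing_evidence(kernel_class, reported):
    """Return the focus areas a profile report has not covered yet."""
    return [item for item in FOCUS[kernel_class] if item not in reported]


if __name__ == "__main__":
    # A report that only mentions DRAM/L2 throughput for a memory-bound kernel.
    print(missing_evidence(KernelClass.MEMORY_BOUND, {"DRAM/L2 throughput"}))
```

A check like this reflects the rule's last sentence: a proposed code change should point at specific metrics for its kernel class rather than a single trace screenshot.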
mickqian
approved these changes
Jun 22, 2026
ispobock
reviewed
Jun 22, 2026
Lyken17
reviewed
Jun 22, 2026
- [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops.
- [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels.
- mit-han-lab/KDA: the winning solution of the MLSys 2026 FlashInfer Kernel Contest.
Lyken17
reviewed
Jun 22, 2026
Viewed together, these efforts point to the same direction: the value of agents comes from procedural engineering knowledge, including executable steps, reproducible experiments, and reviewable evidence.
So far, KDA-Pilot has optimized XX operators, and XX of them have been merged into SGLang.
Summary
- … `public/images/blog/agent-assisted-sglang-development/` and reference them through site-local `/images/blog/...` paths
- … `agent`, splitting repository links into bullets, broadening the backend/fallback profile rule, and reducing top-level sections

Validation
- `npm run build` successfully
- `http://localhost:3010/blog/2026-06-20-agent-assisted-sglang-development/`
- `src/components/Tags.js` is no longer changed in the PR diff