Add blog on agent-assisted SGLang development by BBuf · Pull Request #353 · lm-sys/lm-sys.github.io

BBuf · 2026-06-20T04:07:02Z

Summary

 - Add a blog post on agent-assisted SGLang development
 - Cover SGLang agent skills, Humanize/RLCR, SGLang SOTA Performance Loop, Codex Goal, and KDA-Pilot kernel optimization evidence
 - Store the blog figures under `public/images/blog/agent-assisted-sglang-development/` and reference them through site-local `/images/blog/...` paths
 - Address review feedback by shortening the opening scope list, lowercasing "agent", splitting repository links into bullets, broadening the backend/fallback profile rule, and reducing top-level sections

Validation

 - Ran `npm run build` successfully
 - Checked local preview at http://localhost:3010/blog/2026-06-20-agent-assisted-sglang-development/
 - Confirmed the blog markdown has no Chinese text and no local filesystem paths
 - Confirmed `src/components/Tags.js` is no longer changed in the PR diff
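
The "no Chinese text and no local filesystem paths" check above could be automated along these lines. This is a minimal sketch, not the script used for this PR: the function name `find_issues`, the CJK range, and the path patterns are all assumptions chosen for illustration.

```python
import re

# Assumption: "Chinese text" means characters in the basic
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
CJK_RE = re.compile(r"[\u4e00-\u9fff]")

# Assumption: hypothetical markers of local filesystem paths
# (macOS/Linux home directories, file:// URLs, Windows drive letters).
LOCAL_PATH_RE = re.compile(r"(/Users/|/home/|file://|[A-Za-z]:\\)")


def find_issues(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, issue_kind) pairs for every offending line."""
    issues = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if CJK_RE.search(line):
            issues.append((lineno, "cjk"))
        if LOCAL_PATH_RE.search(line):
            issues.append((lineno, "local-path"))
    return issues
```

Site-local references such as `/images/blog/...` pass this check. The path patterns are deliberately crude and can flag a legitimate URL that happens to contain `/home/`, so any hit still needs a manual look.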

mickqian · 2026-06-21T14:58:26Z

+type: blog
+---
+
+SGLang development increasingly goes beyond isolated code changes. The same repository now spans LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, `torch.compile`, ModelOpt quantization, video model parallelism, and production incident handling. In the past, many of these workflows depended on individual developer memory: how to launch a certain model, how to read a profile trace, which log to add first when debugging a CUDA crash, or which benchmarks a performance PR should include. As agent tools mature, this experience can be turned into executable `SKILL.md` files, scripts, benchmark contracts, and review loops.


nit: the list "LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, `torch.compile`, ModelOpt quantization, video model parallelism" seems a bit excessive

mickqian · 2026-06-21T14:59:12Z

+
+SGLang development increasingly goes beyond isolated code changes. The same repository now spans LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, `torch.compile`, ModelOpt quantization, video model parallelism, and production incident handling. In the past, many of these workflows depended on individual developer memory: how to launch a certain model, how to read a profile trace, which log to add first when debugging a CUDA crash, or which benchmarks a performance PR should include. As agent tools mature, this experience can be turned into executable `SKILL.md` files, scripts, benchmark contracts, and review loops.
+
+Around SGLang Agent development, a set of skills has already emerged for both LLM and diffusion work. [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops. [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels. Viewed together, these efforts point to the same direction: the value of agents comes from procedural engineering knowledge, including executable steps, reproducible experiments, and reviewable evidence.


nit: Agent -> agent

Perhaps this is better:

 - BBuf/AI-Infra-Auto-Driven-SKILLS covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops
 - BBuf/KDA-Pilot explores automated optimization for SGLang diffusion kernels

mickqian · 2026-06-21T15:06:05Z

+3. Interpret NCU results according to the kernel's compute characteristics.
+For memory-bound kernels, focus on DRAM/L2 throughput, load/store efficiency, and memory pipe utilization. For compute-bound GEMM/attention kernels, focus on Tensor Core utilization, SM busy, eligible warps, and the main stall reasons. For small latency-bound kernels, check launch count, per-kernel duration, synchronization points, and possible fusion opportunities. A single trace screenshot is not enough; the next code change should be supported by specific metrics.
+
+4. For diffusion, first confirm there is no fallback.


seems a bit specific to me 😂
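
For readers of this thread, the per-kernel-type guidance in step 3 of the quoted hunk can be sketched as a small lookup. This is a hypothetical illustration only: the class names and the helper `metrics_to_review` are invented here, and the strings are the descriptive labels from the quoted text, not actual Nsight Compute metric identifiers.

```python
# Hypothetical sketch: maps a kernel's bottleneck class to the items
# the quoted blog text says to inspect. Labels are descriptive only,
# not real NCU metric names.
NCU_FOCUS = {
    "memory_bound": [
        "DRAM/L2 throughput",
        "load/store efficiency",
        "memory pipe utilization",
    ],
    "compute_bound": [
        "Tensor Core utilization",
        "SM busy",
        "eligible warps",
        "main stall reasons",
    ],
    "latency_bound": [
        "launch count",
        "per-kernel duration",
        "synchronization points",
        "fusion opportunities",
    ],
}


def metrics_to_review(kernel_class: str) -> list[str]:
    """Return the review checklist for a kernel class, or raise ValueError."""
    if kernel_class not in NCU_FOCUS:
        known = ", ".join(sorted(NCU_FOCUS))
        raise ValueError(f"unknown kernel class {kernel_class!r}; expected one of: {known}")
    return list(NCU_FOCUS[kernel_class])
```

Keeping the checklist as data rather than prose is one way to make the rule less tied to a single kernel family, which may address the "too specific" concern.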

Lyken17 · 2026-06-22T14:35:56Z


 - [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops.
 - [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels.



mit-han-lab/KDA: the winning solution for the MLSys 2026 FlashInfer Kernel Contest.

Lyken17 · 2026-06-22T14:36:33Z

 - [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops.
 - [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels.

 Viewed together, these efforts point to the same direction: the value of agents comes from procedural engineering knowledge, including executable steps, reproducible experiments, and reviewable evidence.


So far, KDA-Pilot has optimized XX operators, and XX of them have been merged into SGLang.

BBuf added 2 commits June 20, 2026 12:05

Add agent-assisted SGLang development blog

e2998e7

Update agent blog preview image

e20cbb9

mickqian reviewed Jun 21, 2026

BBuf added 2 commits June 21, 2026 23:45

Address agent blog review comments

04d5568

Store agent blog images locally

e12dc61

mickqian approved these changes Jun 22, 2026

ispobock reviewed Jun 22, 2026

Comment thread blog/2026-06-20-agent-assisted-sglang-development.md

Comment thread blog/2026-06-20-agent-assisted-sglang-development.md Outdated

BBuf added 2 commits June 22, 2026 21:49

Add KDA workflow context to agent blog

3c02c38

Remove unsupported model-specific claims

5969946

Lyken17 reviewed Jun 22, 2026

BBuf added 7 commits June 22, 2026 22:45

Clarify KDA project context

25fbd96

Refine KDA kernel evidence in SGLang agent blog

c197ae8

Remove preliminary LLM kernel claims

86aac77

Tighten qknorm-rope evidence wording

418beb0

Add SGLang agent blog cover image

fc0b156

Refine SGLang skill references

8c47037

Update KDA acknowledgments

6094a32


