Add blog on agent-assisted SGLang development #353

Open
BBuf wants to merge 13 commits into lm-sys:main from BBuf:blog/agent-assisted-sglang-development

BBuf commented Jun 20, 2026

Summary

  • Add a blog post on agent-assisted SGLang development
  • Cover SGLang agent skills, Humanize/RLCR, SGLang SOTA Performance Loop, Codex Goal, and KDA-Pilot kernel optimization evidence
  • Store the blog figures under public/images/blog/agent-assisted-sglang-development/ and reference them through site-local /images/blog/... paths
  • Address review feedback by shortening the opening scope list, lowercasing agent, splitting repository links into bullets, broadening the backend/fallback profile rule, and reducing top-level sections

Validation

  • Ran npm run build successfully
  • Checked local preview at http://localhost:3010/blog/2026-06-20-agent-assisted-sglang-development/
  • Confirmed the blog markdown has no Chinese text and no local filesystem paths
  • Confirmed src/components/Tags.js is no longer changed in the PR diff

type: blog
---

SGLang development increasingly goes beyond isolated code changes. The same repository now spans LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, `torch.compile`, ModelOpt quantization, video model parallelism, and production incident handling. In the past, many of these workflows depended on individual developer memory: how to launch a certain model, how to read a profile trace, which log to add first when debugging a CUDA crash, or which benchmarks a performance PR should include. As agent tools mature, this experience can be turned into executable `SKILL.md` files, scripts, benchmark contracts, and review loops.

Collaborator

nit: the list "LLM serving, multi-node runtime, attention/MoE/quantization kernels, diffusion pipelines, `torch.compile`, ModelOpt quantization, video model parallelism" seems a bit excessive

Author

done


Around SGLang Agent development, a set of skills has already emerged for both LLM and diffusion work. [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops. [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels. Viewed together, these efforts point to the same direction: the value of agents comes from procedural engineering knowledge, including executable steps, reproducible experiments, and reviewable evidence.

Collaborator

nit: Agent -> agent

Collaborator

perhaps this is better:

Author

done

3. Interpret NCU results according to the kernel's compute characteristics.
For memory-bound kernels, focus on DRAM/L2 throughput, load/store efficiency, and memory pipe utilization. For compute-bound GEMM/attention kernels, focus on Tensor Core utilization, SM busy, eligible warps, and the main stall reasons. For small latency-bound kernels, check launch count, per-kernel duration, synchronization points, and possible fusion opportunities. A single trace screenshot is not enough; the next code change should be supported by specific metrics.

4. For diffusion, first confirm there is no fallback.

Collaborator

seems a bit specific to me 😂

Author

👍🏻
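
For reference, the NCU interpretation rule in step 3 of the quoted hunk could be sketched roughly as follows. This is a minimal illustration, not code from the blog or from Nsight Compute: `KernelSummary`, its fields, and the thresholds in `classify` are hypothetical, and only the three checklists are copied from the quoted text.

```python
from dataclasses import dataclass

# Checklists copied from the quoted blog text.
# The dictionary keys are hypothetical category labels.
CHECKLISTS = {
    "memory_bound": [
        "DRAM/L2 throughput",
        "load/store efficiency",
        "memory pipe utilization",
    ],
    "compute_bound": [
        "Tensor Core utilization",
        "SM busy",
        "eligible warps",
        "main stall reasons",
    ],
    "latency_bound": [
        "launch count",
        "per-kernel duration",
        "synchronization points",
        "possible fusion opportunities",
    ],
}


@dataclass
class KernelSummary:
    """Hypothetical per-kernel summary; field names are not NCU metric names."""
    name: str
    duration_us: float
    memory_throughput_pct: float   # percent of peak memory throughput
    compute_throughput_pct: float  # percent of peak compute throughput


def classify(k, small_kernel_us=10.0, busy_pct=60.0):
    """Pick a category using assumed thresholds (tune per GPU and workload)."""
    peak = max(k.memory_throughput_pct, k.compute_throughput_pct)
    if k.duration_us < small_kernel_us and peak < busy_pct:
        # Short kernel that saturates neither memory nor compute.
        return "latency_bound"
    if k.memory_throughput_pct >= k.compute_throughput_pct:
        return "memory_bound"
    return "compute_bound"


def checklist(k):
    """Return the metrics to inspect first for this kernel."""
    return CHECKLISTS[classify(k)]


if __name__ == "__main__":
    k = KernelSummary("example_norm_kernel", 120.0, 85.0, 20.0)
    print(classify(k), checklist(k))
```

The point of keeping the checklist as data is that the next code change can cite the specific metrics it was based on, which is what the quoted paragraph asks for instead of a single trace screenshot.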

Comment thread blog/2026-06-20-agent-assisted-sglang-development.md
Comment thread blog/2026-06-20-agent-assisted-sglang-development.md Outdated

- [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops.
- [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels.

Lyken17 commented Jun 22, 2026

  • mit-han-lab/KDA: the winning solution for the MLSys 2026 FlashInfer Kernel Contest.

Author

done

- [BBuf/AI-Infra-Auto-Driven-SKILLS](https://github.com/BBuf/AI-Infra-Auto-Driven-SKILLS) covers workflows such as serving benchmarks, profile analysis, production incident triage, and SOTA loops.
- [BBuf/KDA-Pilot](https://github.com/BBuf/KDA-Pilot) explores automated optimization for SGLang diffusion kernels.

Viewed together, these efforts point to the same direction: the value of agents comes from procedural engineering knowledge, including executable steps, reproducible experiments, and reviewable evidence.

So far, KDA-Pilot has optimized XX operators, and XX of them have been merged into SGLang.

Author

done

Lyken17 left a comment

1
