change: routing logic change for Hugging Face DLCs #5960

Open
dwarez wants to merge 4 commits into aws:master from huggingface:hf_routing_refactor

dwarez commented Jun 18, 2026

Depends on #5957.

ModelBuilder's auto-detection now selects:

  • huggingface-vllm for text-generation (replacing the archived TGI)
  • huggingface-vllm-omni for multimodal tasks
  • huggingface-sglang only when the user opts in via an explicit model_server
  • TEI: routing logic unchanged
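
For readers skimming the thread, the selection order above can be sketched roughly as follows. This is an illustration only, not the code in `model_builder.py`: `Server`, `pick_server`, `MULTIMODAL_TASKS`, and `TEI_TASKS` are hypothetical names, the task sets are assumed, and the final fallback to a transformers server is an assumption as well.

```python
from enum import Enum
from typing import Optional


class Server(Enum):
    """Local stand-in for the server families named in this PR (hypothetical)."""
    VLLM = "huggingface-vllm"
    VLLM_OMNI = "huggingface-vllm-omni"
    SGLANG = "huggingface-sglang"
    TEI = "tei"
    TRANSFORMERS = "transformers"


# Assumed task sets; the real lists live in model_builder.py and may differ.
MULTIMODAL_TASKS = {"image-text-to-text"}
TEI_TASKS = {"feature-extraction", "sentence-similarity"}


def pick_server(model_task: str, model_server: Optional[Server] = None) -> Server:
    """Return a server family for a task, honoring an explicit choice first."""
    if model_server is not None:
        # Explicit opt-in, which is the only way SGLang gets selected.
        return model_server
    if model_task == "text-generation":
        return Server.VLLM
    if model_task in MULTIMODAL_TASKS:
        return Server.VLLM_OMNI
    if model_task in TEI_TASKS:
        return Server.TEI
    # Assumed fallback for every other task.
    return Server.TRANSFORMERS
```

Checking the explicit `model_server` first keeps SGLang strictly opt-in: no task maps to it by default.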

alvarobartt left a comment

Thank you! My only nit is the `model_task` check for Text Embeddings Inference, which is missing the `text-ranking` task; see https://huggingface.co/models?pipeline_tag=text-ranking&other=text-embeddings-inference&sort=trending
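
Roughly what the suggested change could look like, as a sketch rather than the actual diff: `TEI_TASKS` and `routes_to_tei` are hypothetical names, and the two pre-existing task names are assumed; only `text-ranking` comes from this review.

```python
# Assumed existing TEI tasks plus the "text-ranking" task requested in review.
TEI_TASKS = frozenset({"feature-extraction", "sentence-similarity", "text-ranking"})


def routes_to_tei(model_task: str) -> bool:
    """Return True when the task should be served by Text Embeddings Inference."""
    return model_task in TEI_TASKS
```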

Comment thread sagemaker-serve/src/sagemaker/serve/model_builder.py Outdated
dwarez and others added 4 commits June 19, 2026 09:10
huggingface-pytorch-inference images

add: huggingface-vllm, huggingface-sglang, huggingface-vllm-omni
families metadata

Signed-off-by: DWarez <dario.salvati@huggingface.co>
…elBuilder

Add ModelServer.VLLM/SGLANG/VLLM_OMNI and teach ModelBuilder's auto-detection
to select the new HuggingFace DLCs: text-generation now defaults to vLLM
(replacing archived TGI), multimodal tasks route to vLLM-omni, and SGLang is
reachable via explicit model_server. TEI/transformers routing is unchanged.

Signed-off-by: DWarez <dario.salvati@huggingface.co>
by `image-text-to-text`

Signed-off-by: DWarez <dario.salvati@huggingface.co>
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
dwarez force-pushed the hf_routing_refactor branch from ed73c0d to 494d2c2 June 19, 2026 08:28
dwarez requested a deployment to manual-approval with GitHub Actions (waiting) June 19, 2026 08:28

Mattral left a comment

Really appreciate this refactor. Having text-generation, multimodal, and opt-in SGLang routing handled automatically should reduce confusion and help users get the right DLC without extra setup.

3 participants