Add out-of-tree model plugin interface #441

Open
whjthu wants to merge 1 commit into main from feature/out-of-tree-model-plugins

@whjthu whjthu commented Jun 17, 2026

Contributor

Summary

  • Add a lightweight out-of-tree model plugin API in python/infinilm/plugins/*, centered on ModelSpec, register_model, load_plugin, config adaptation, processor selection, and checkpoint weight remapping.
  • Wire plugin config adaptation into python/infinilm/infer_engine.py, so external HuggingFace model_type values can map to existing InfiniLM C++ backends without adding model definitions to the core repo.
  • Wire plugin weight remapping and processor selection into python/infinilm/modeling_utils.py and python/infinilm/processors/__init__.py, while preserving the existing built-in remapper and processor paths.
  • Add explicit out-of-tree C++ backend plugin loading through csrc/models/backend_plugin_loader.*, csrc/pybind11/bindings.cc, and python/infinilm/backend_plugins.py.
  • Make python/infinilm/__init__.py lighter by lazily importing heavier runtime objects, so plugin-only workflows can import infinilm.plugins without eagerly initializing the full runtime stack.
  • Update packaging in setup.py so newly added Python subpackages are included in installs.
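
The lazy-import item above can be illustrated with a module-level `__getattr__` (PEP 562). This is a minimal sketch of the general technique, not the PR's actual `__init__.py`: `make_lazy_package` and the attribute map are hypothetical, and a standard-library module stands in for the heavier runtime objects.

```python
import importlib
import types


def make_lazy_package(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access.

    `lazy_attrs` maps a public attribute name to the module that defines it.
    Both this helper and the mapping are hypothetical, for illustration only.
    """
    package = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(lazy_attrs[attr]), attr)
        # Cache the result so that later lookups bypass `__getattr__`.
        setattr(package, attr, value)
        return value

    package.__getattr__ = __getattr__
    return package


# A standard-library stand-in for a heavy runtime object.
demo = make_lazy_package("demo", {"Fraction": "fractions"})
half = demo.Fraction(1, 2)  # The deferred import happens here.
```

In a real package the same `__getattr__` would be defined directly at the top of `__init__.py`, so that importing a light subpackage does not pull in the full runtime.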

Motivation

InfiniLM already supports multiple model families, but adding every new model-specific config adapter, processor choice, or checkpoint key conversion directly into the core engine repo does not scale well. Many model families are close enough to existing InfiniLM backends that they only need load-time metadata adaptation, not new inference kernels.

This PR introduces a small extension surface that lets external repositories define model integration logic out of tree:

  • map a new HuggingFace model_type to an existing InfiniLM backend;
  • adapt config.json fields before C++ engine initialization;
  • select an existing or plugin-owned processor;
  • remap checkpoint keys/tensors during weight loading;
  • optionally load a C++ backend plugin explicitly when a model cannot reuse a built-in backend.
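
The first two capabilities above might look roughly like the following from a plugin author's side. This is a sketch under stated assumptions: the PR names `ModelSpec` and `register_model` but this description does not show their signatures, so the fields, the registry, the `adapt_config` helper, and the choice to rewrite `model_type` to the backend name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Config = Dict[str, Any]


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical shape; the real `ModelSpec` fields may differ."""

    model_type: str  # HuggingFace `model_type` handled by the plugin.
    backend: str  # Name of the existing backend to reuse.
    adapt_config: Optional[Callable[[Config], Config]] = None


_REGISTRY: Dict[str, ModelSpec] = {}


def register_model(spec: ModelSpec) -> None:
    """Record a spec, rejecting duplicate `model_type` registrations."""
    if spec.model_type in _REGISTRY:
        raise ValueError(f"`{spec.model_type}` is already registered.")
    _REGISTRY[spec.model_type] = spec


def adapt_config(config: Config) -> Config:
    """Return the config unchanged unless a plugin claims its `model_type`."""
    spec = _REGISTRY.get(config.get("model_type"))
    if spec is None:
        return config
    adapted = dict(config)  # Never mutate the caller's config.
    if spec.adapt_config is not None:
        adapted = spec.adapt_config(adapted)
    # Assumption: the engine selects its backend from `model_type`.
    adapted["model_type"] = spec.backend
    return adapted


def _rename_heads(config: Config) -> Config:
    """Example adapter: translate a nonstandard field name."""
    config = dict(config)
    config["num_attention_heads"] = config.pop("n_heads")
    return config


register_model(ModelSpec("my_llama", "llama", _rename_heads))
adapted = adapt_config({"model_type": "my_llama", "n_heads": 32})
```

Keeping the adapter a plain function from config to config means it runs once at load time and never touches the inference loop.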

The default behavior remains unchanged when no plugin is loaded: config adaptation is a no-op, plugin weight remapping is a no-op, and existing built-in processors/remappers continue to run as before.
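
The no-op default for weight remapping can be sketched as follows. Everything here is an assumption made for illustration: the `remap_weights` helper, the remapper signature, and the `split_fused_qkv` example are hypothetical, plain lists stand in for tensors, and splitting into equal thirds assumes the query, key, and value projections have the same size.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Tensor = List[float]  # Plain lists stand in for real tensors.
Remapper = Callable[[str, Tensor], Iterable[Tuple[str, Tensor]]]


def remap_weights(
    weights: Dict[str, Tensor], plugin_remapper: Optional[Remapper] = None
) -> Dict[str, Tensor]:
    """Apply a plugin remapper to every entry, or do nothing without one."""
    if plugin_remapper is None:
        return weights  # No plugin loaded: the checkpoint passes through.
    remapped: Dict[str, Tensor] = {}
    for key, tensor in weights.items():
        # A remapper may rename, split, or drop an entry.
        for new_key, new_tensor in plugin_remapper(key, tensor):
            remapped[new_key] = new_tensor
    return remapped


def split_fused_qkv(key: str, tensor: Tensor) -> Iterable[Tuple[str, Tensor]]:
    """Hypothetical remapper: split a fused `qkv_proj` into three equal parts."""
    suffix = "qkv_proj.weight"
    if not key.endswith(suffix):
        yield key, tensor
        return
    prefix = key[: -len(suffix)]
    third = len(tensor) // 3
    for i, name in enumerate(("q_proj", "k_proj", "v_proj")):
        yield f"{prefix}{name}.weight", tensor[i * third : (i + 1) * third]


checkpoint = {"layers.0.qkv_proj.weight": [1, 2, 3, 4, 5, 6], "norm.weight": [7]}
remapped = remap_weights(checkpoint, split_fused_qkv)
```

Any built-in remapper would then run on the result, as it does when no plugin is loaded.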

This is not intended as a performance change. Python plugin callbacks run during config, processor, or weight loading only; they are not part of the token-by-token inference hot path.

Type of Change

  • feat — new feature / new model
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change

Test Results of Involved Models on Supported Platforms (Please attach screenshots)

Benchmark / Performance Impact

Notes for Reviewers


Checklist

Every contributor must verify every item below before requesting
review. Tick each box only after the check has actually been performed —
do not tick speculatively. If an item truly does not apply, replace the
checkbox with N/A and briefly explain why in an inline comment.

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from main — the branch is rebased cleanly on top of the current main.
  • No fixup! / squash! / wip commits remain.
  • Existing PR/branch/commit that followed the legacy issue format.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes (if any) are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene (applies to all languages)

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

  • Code follows the Google C++ Style Guide strictly.
  • Error and warning message wording follows the LLVM Coding Standards (CONTRIBUTING.md §C++).
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • No raw new/delete; RAII / smart pointers / existing allocators are used.
  • Changed files are formatted by scripts/format.py.
  • No changes/reference to csrc/models/llama_legacy/.

Python Specific (if Python files changed)

  • Code is PEP 8 compliant.
  • Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
  • Docstrings (if any) follow PEP 257 (CONTRIBUTING.md §Python).
  • Changed files are formatted by scripts/format.py.
  • No changes/reference to python/infinilm/auto_config.py.

Testing

  • For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
  • Passed single request test (examples/test_infer.py), or specify the reason for skipping.
  • Passed offline performance test (examples/bench.py), or specify the reason for skipping.
  • Passed sanity test (test/bench/test_benchmark.py), or specify the reason for skipping.
  • Passed service test (python/infinilm/server/inference_server.py + scripts/test_perf.py), or specify the reason for skipping.

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory on at least one affected platform.

Documentation

  • README.md, CONTRIBUTING.md, or inline docs updated when behavior, build flags, or developer workflow changed.
  • Any user-visible breaking change is called out explicitly under "Motivation" and in the commit/PR title with a ! or BREAKING CHANGE: footer.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • Third-party code is license-compatible and attributed.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@whjthu whjthu requested a review from a team June 17, 2026 07:49
@whjthu whjthu force-pushed the feature/out-of-tree-model-plugins branch from e9b8dda to 33730e6 Compare June 18, 2026 03:15