fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types by lxfight · Pull Request #9000 · AstrBotDevs/AstrBot

lxfight · 2026-06-25T06:05:38Z

Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.

The knowledge base CRUD (create/update/list) API contract is inconsistent between frontend and backend. The backend schema uses a name field while the generated frontend types expect kb_name. Both create and update share the same KnowledgeBaseRequest schema, but create requires kb_name and embedding_provider_id, while update does not — this shared schema leads to confusing validation semantics. Additionally, the knowledge base payload is missing canonical fields such as emoji, chunk_size, chunk_overlap, top_k_dense, top_k_sparse, and top_m_final.

This change unifies the frontend-backend contract: renames name to kb_name, introduces KnowledgeBaseCreateRequest to separate create and update semantics, promotes optional configuration parameters into canonical KnowledgeBaseRequest fields, and syncs the OpenAPI spec, backend schemas, and generated frontend types.

Modifications / 改动点

OpenAPI schema (openspec/openapi-v1.yaml):
- KnowledgeBaseRequest: name → kb_name; added emoji, chunk_size, chunk_overlap, top_k_dense, top_k_sparse, top_m_final fields; embedding_provider_id / rerank_provider_id now accept null
- Added KnowledgeBaseCreateRequest inheriting KnowledgeBaseRequest via allOf, marking kb_name and embedding_provider_id as required
Backend schema (astrbot/dashboard/schemas.py): synced KnowledgeBaseRequest and KnowledgeBaseCreateRequest Pydantic models
Backend API (astrbot/dashboard/api/knowledge_bases.py): create_knowledge_base uses KnowledgeBaseCreateRequest; update_knowledge_base and list_knowledge_bases adapted to kb_name
Backend service (astrbot/dashboard/services/knowledge_base_service.py): adapted to new schema field names and types
Frontend types (dashboard/src/api/generated/openapi-v1/types.gen.ts): regenerated
Frontend API (dashboard/src/api/v1.ts): adapted to new types
Frontend page (dashboard/src/views/knowledge-base/KBList.vue): name → kb_name
Tests (tests/unit/test_knowledge_base_service_contract.py): added 266 lines of end-to-end contract tests
This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

tests/unit/test_knowledge_base_service_contract.py::test_list_kbs_applies_pagination PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_kbs_without_page_size_returns_all_items PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_route_preserves_unpaginated_default PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_route_uses_default_page_size_when_page_is_explicit PASSED
tests/unit/test_knowledge_base_service_contract.py::test_create_kb_accepts_legacy_name_field PASSED
tests/unit/test_knowledge_base_service_contract.py::test_update_kb_preserves_omitted_fields PASSED
tests/unit/test_knowledge_base_service_contract.py::test_update_kb_allows_explicit_rerank_provider_clear PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_schemas_match_service_contract PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_preserves_explicit_null_updates PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_omits_unset_none_fields PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_uses_legacy_name_as_input_alias PASSED

============================== 11 passed in 1.60s ==============================

Checklist / 检查清单

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能，已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试，并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
/ 我确保没有引入新依赖库，或者引入了新依赖库的同时将其添加到 requirements.txt 和 pyproject.toml 文件相应位置。
😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。

Summary by Sourcery

Align knowledge base CRUD contracts across backend, OpenAPI spec, and frontend, including canonical field names, pagination behavior, and configuration parameters.

New Features:

Support optional pagination with total count in knowledge base list responses while preserving an unpaginated default mode.
Expose canonical knowledge base configuration fields such as emoji and retrieval tuning parameters in the public API and dashboard.

Bug Fixes:

Normalize knowledge base payloads to consistently use kb_name while preserving compatibility with legacy name input.
Ensure knowledge base update requests preserve existing values for omitted fields while allowing explicit null updates for optional providers.

Enhancements:

Split knowledge base create and update request schemas to accurately reflect required fields and type nullability.
Add a canonical_payload helper on the knowledge base request model to generate service-facing payloads from API inputs.
Strengthen the knowledge base API service tests with contract coverage for schemas, pagination semantics, and legacy compatibility.

Tests:

Add unit tests covering knowledge base pagination semantics, legacy name handling, canonical payload generation, and update behavior.

Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

The legacy name → kb_name normalization now exists both in KnowledgeBaseService._canonical_kb_payload and KnowledgeBaseRequest.canonical_payload; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
Instead of manually handling the legacy name field via getattr(self, "name", None) and payload.get("name"), you could model it explicitly in the Pydantic schema with a Field(alias="name") or similar, which would make the legacy compatibility more self-documenting and less error-prone.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The legacy `name` → `kb_name` normalization now exists both in `KnowledgeBaseService._canonical_kb_payload` and `KnowledgeBaseRequest.canonical_payload`; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
- Instead of manually handling the legacy `name` field via `getattr(self, "name", None)` and `payload.get("name")`, you could model it explicitly in the Pydantic schema with a `Field(alias="name")` or similar, which would make the legacy compatibility more self-documenting and less error-prone.

## Individual Comments

### Comment 1
<location path="tests/unit/test_knowledge_base_service_contract.py" line_range="187" />
<code_context>
+@pytest.mark.asyncio
+async def test_create_kb_accepts_legacy_name_field():
</code_context>
<issue_to_address>
**suggestion (testing):** Extend create_kb tests to cover validation errors (missing names / embedding provider).

To round this out, please also add tests for validation failures:

- When both `kb_name` and `name` are missing/empty, `create_kb` should raise `KnowledgeBaseServiceError("知识库名称不能为空")`.
- When `embedding_provider_id` is missing or invalid (provider manager returns `None`), the service should raise the current error used in that case.

These will ensure the service enforces the new create semantics and catches regressions in validation behavior.
</issue_to_address>

### Comment 2
<location path="astrbot/dashboard/services/knowledge_base_service.py" line_range="32" />
<code_context>
     def _payload(data: object) -> dict[str, Any]:
         return data if isinstance(data, dict) else {}

+    @staticmethod
+    def _canonical_kb_payload(data: object) -> dict[str, Any]:
+        """Normalize knowledge base create/update payloads.
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing request canonicalization in the schema and making `update_kb`’s update fields explicit to avoid indirection and duplicated normalization logic.

You can simplify both of the flagged areas without losing any of the new behavior.

### 1. Avoid duplicating canonicalization logic

Instead of maintaining `_canonical_kb_payload` and a separate `canonical_payload` on the schema, centralize the logic in the schema and call it here.

For example, if you already have `KnowledgeBaseRequest.canonical_payload()`:

```python
# service.py
from .schemas import KnowledgeBaseRequest  # adjust import

@staticmethod
def _canonical_kb_payload(data: object) -> dict[str, Any]:
    raw = KnowledgeBaseService._payload(data)
    req = KnowledgeBaseRequest(**raw)
    return req.canonical_payload()
```

Or, if you can call the schema directly where you currently use `_canonical_kb_payload`, remove the helper entirely and keep the service dealing only with canonical dicts:

```python
async def create_kb(self, data: object) -> tuple[dict[str, Any], str]:
    kb_manager = self.get_kb_manager()
    raw = self._payload(data)
    payload = KnowledgeBaseRequest(**raw).canonical_payload()
    ...
```

This way the `name ➜ kb_name` migration and any future normalization lives in one place.

---

### 2. Simplify `update_kb` without losing partial-update semantics

You can keep the “at least one field” validation and the “default to current values” behavior, but make the update fields explicit instead of flowing through `update_keys` / `update_data` / `**update_data`.

Current logic (simplified):

```python
update_keys = [...]
provided_updates = {key: payload[key] for key in update_keys if key in payload}
...
current = current_kb.kb
update_data = {key: getattr(current, key) for key in update_keys}
update_data.update(provided_updates)

kb_helper = await self.get_kb_manager().update_kb(
    kb_id=kb_id,
    **update_data,
)
```

You can replace it with explicit parameters:

```python
async def update_kb(self, data: object) -> tuple[dict[str, Any], str]:
    payload = self._canonical_kb_payload(data)
    kb_id = payload.get("kb_id")
    if not kb_id:
        raise KnowledgeBaseServiceError("缺少参数 kb_id")

    # Allowed update fields
    fields = (
        "kb_name",
        "description",
        "emoji",
        "embedding_provider_id",
        "rerank_provider_id",
        "chunk_size",
        "chunk_overlap",
        "top_k_dense",
        "top_k_sparse",
        "top_m_final",
    )
    if not any(field in payload for field in fields):
        raise KnowledgeBaseServiceError("至少需要提供一个更新字段")

    current_kb = await self.get_kb_manager().get_kb(kb_id)
    if not current_kb:
        raise KnowledgeBaseServiceError("知识库不存在")
    current = current_kb.kb

    kb_helper = await self.get_kb_manager().update_kb(
        kb_id=kb_id,
        kb_name=payload.get("kb_name", current.kb_name),
        description=payload.get("description", current.description),
        emoji=payload.get("emoji", current.emoji),
        embedding_provider_id=payload.get("embedding_provider_id", current.embedding_provider_id),
        rerank_provider_id=payload.get("rerank_provider_id", current.rerank_provider_id),
        chunk_size=payload.get("chunk_size", current.chunk_size),
        chunk_overlap=payload.get("chunk_overlap", current.chunk_overlap),
        top_k_dense=payload.get("top_k_dense", current.top_k_dense),
        top_k_sparse=payload.get("top_k_sparse", current.top_k_sparse),
        top_m_final=payload.get("top_m_final", current.top_m_final),
    )
    if not kb_helper:
        raise KnowledgeBaseServiceError("知识库不存在")
    return kb_helper.kb.model_dump(), "更新知识库成功"
```

This keeps:

- Partial updates (only provided fields overwrite).
- Defaults from current KB values for missing fields.
- The “at least one update field” requirement.

But it removes the extra dict-building and makes the updatable fields obvious at the call site.
</issue_to_address>

### Comment 3
<location path="astrbot/dashboard/schemas.py" line_range="207" />
<code_context>


 class KnowledgeBaseRequest(OpenModel):
-    kb_id: str | None = None
-    name: str | None = None
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing knowledge base payload normalization in `KnowledgeBaseRequest` using Pydantic field aliases for the legacy `name` field to avoid custom migration logic and duplicated canonicalization.

You can reduce complexity here by (1) making `KnowledgeBaseRequest` the single source of truth for normalization, and (2) using Pydantic aliases to handle the legacy `name` field instead of custom logic.

### 1. Remove duplicate canonicalization in the service

Instead of having both `canonical_payload()` on the model and `_canonical_kb_payload()` in the service, keep only the model-level one and call it from the service:

```python
# in the service, replace `_canonical_kb_payload(...)` usage with:
payload = kb_request.canonical_payload()
# ...use `payload` directly
```

Then delete `_canonical_kb_payload` entirely. This way, any future changes to the KB payload shape happen in one place.

### 2. Use a field alias for the legacy `name` input

You can avoid manual `getattr(self, "name", None)` and keep the migration logic declarative:

```python
from pydantic import Field

class KnowledgeBaseRequest(OpenModel):
    kb_name: str | None = Field(None, alias="name")  # accept legacy "name"
    description: str | None = None
    emoji: str | None = None
    embedding_provider_id: str | None = None
    rerank_provider_id: str | None = None
    chunk_size: int | None = None
    chunk_overlap: int | None = None
    top_k_dense: int | None = None
    top_k_sparse: int | None = None
    top_m_final: int | None = None

    def canonical_payload(self) -> dict[str, Any]:
        return self.model_dump(
            exclude_unset=True,
            include={
                "kb_name",
                "description",
                "emoji",
                "embedding_provider_id",
                "rerank_provider_id",
                "chunk_size",
                "chunk_overlap",
                "top_k_dense",
                "top_k_sparse",
                "top_m_final",
            },
            by_alias=False,  # ensure we output `kb_name`, not `name`
        )
```

This keeps all existing behavior (including support for legacy `name`) while:

- Eliminating duplicated normalization logic between layers.
- Removing the manual migration code path that can drift from the service logic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-25T06:07:10Z

+        chunk_overlap=None,
+        top_k_dense=12,
+        top_k_sparse=8,
+        top_m_final=3,


suggestion (testing): Extend create_kb tests to cover validation errors (missing names / embedding provider).

To round this out, please also add tests for validation failures:

When both kb_name and name are missing/empty, create_kb should raise KnowledgeBaseServiceError("知识库名称不能为空").

When embedding_provider_id is missing or invalid (provider manager returns None), the service should raise the current error used in that case.

These will ensure the service enforces the new create semantics and catches regressions in validation behavior.

gemini-code-assist

Code Review

This pull request refactors the knowledge base management system by standardizing payload schemas, introducing pagination to the listing service, updating OpenAPI specifications, and adding unit tests. However, several critical issues were identified during the review: removing _model_dict will cause NameError exceptions in other endpoints that still reference it; the KnowledgeBaseCreateRequest schema is missing from schemas.py despite being expected by the frontend and OpenAPI spec; combining allOf with additionalProperties: false in the OpenAPI spec will cause validation failures; and using getattr without a default value when updating knowledge bases could lead to an AttributeError.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-25T06:07:56Z

-    return payload if isinstance(payload, dict) else {}
-
-
 async def _run(operation, *, prefix: str):


The helper function _model_dict was removed from this file. However, it is still referenced in multiple other endpoints within the same file:

import_knowledge_base_documents (line 210)

import_knowledge_base_document_url (line 224)

retrieve_knowledge_base (line 304)

Removing _model_dict will cause a NameError when any of these endpoints are called. Please restore _model_dict or refactor those endpoints to use payload.model_dump(exclude_none=True) directly.

def _model_dict(payload) -> dict[str, Any]: if payload is None: return {} if hasattr(payload, "model_dump"): return payload.model_dump(exclude_none=True) return payload if isinstance(payload, dict) else {} async def _run(operation, *, prefix: str):

gemini-code-assist · 2026-06-25T06:07:56Z


 class KnowledgeBaseImportRequest(OpenModel):


The PR description mentions syncing KnowledgeBaseCreateRequest Pydantic models, and the OpenAPI spec/frontend types expect it. However, KnowledgeBaseCreateRequest is not defined in astrbot/dashboard/schemas.py.

Please define KnowledgeBaseCreateRequest inheriting from KnowledgeBaseRequest with kb_name and embedding_provider_id as required fields.

Suggested change

class KnowledgeBaseImportRequest(OpenModel):

class KnowledgeBaseCreateRequest(KnowledgeBaseRequest):

kb_name: str

embedding_provider_id: str

class KnowledgeBaseImportRequest(OpenModel):

gemini-code-assist · 2026-06-25T06:07:56Z

+    KnowledgeBaseCreateRequest:
+      allOf:
+        - $ref: "#/components/schemas/KnowledgeBaseRequest"
+        - type: object
+          required: [kb_name, embedding_provider_id]
+          properties:
+            kb_name:
+              type: string
+            embedding_provider_id:
+              type: string
+          additionalProperties: false


In OpenAPI, combining allOf with additionalProperties: false on a sub-schema will cause validation to fail for any properties defined in the other allOf schemas (such as description, emoji, etc. from KnowledgeBaseRequest). This is because the validator evaluates the sub-schema strictly and rejects any property not explicitly listed in its own properties block.

Please remove additionalProperties: false from the inline schema of KnowledgeBaseCreateRequest.

KnowledgeBaseCreateRequest: allOf: - $ref: "#/components/schemas/KnowledgeBaseRequest" - type: object required: [kb_name, embedding_provider_id] properties: kb_name: type: string embedding_provider_id: type: string

gemini-code-assist · 2026-06-25T06:07:56Z

+        if not current_kb:
+            raise KnowledgeBaseServiceError("知识库不存在")
+        current = current_kb.kb
+        update_data = {key: getattr(current, key) for key in update_keys}


Using getattr(current, key) without a default value can raise an AttributeError if any of the update_keys are missing from the current object (e.g., due to database schema mismatches or pending migrations).

Using getattr(current, key, None) is safer and prevents potential runtime crashes.

Suggested change

update_data = {key: getattr(current, key) for key in update_keys}

update_data = {key: getattr(current, key, None) for key in update_keys}

…fix OpenAPI allOf validation - Replace 3 remaining _model_dict() calls with payload.model_dump(exclude_none=True) - Add KnowledgeBaseCreateRequest schema with required kb_name + embedding_provider_id - Remove additionalProperties: false from OpenAPI allOf sub-schema to avoid validation failures - Use Pydantic Field(alias="name") + populate_by_name=True for legacy name migration - Delegate _canonical_kb_payload to KnowledgeBaseRequest to eliminate duplicated logic - Add getattr default None for update_kb to prevent AttributeError - Add validation error tests: missing kb_name, missing embedding_provider_id

fix(kb): align CRUD API contract end-to-end

a1323d9

Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:webui The bug / feature is about webui(dashboard) of astrbot. feature:knowledge-base The bug / feature is about knowledge base labels Jun 25, 2026

sourcery-ai Bot reviewed Jun 25, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000

fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000
lxfight wants to merge 2 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-contract-crud

lxfight commented Jun 25, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot Jun 25, 2026

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		return payload if isinstance(payload, dict) else {}


		async def _run(operation, *, prefix: str):

	update_data = {key: getattr(current, key) for key in update_keys}
	update_data = {key: getattr(current, key, None) for key in update_keys}

Uh oh!

Uh oh!

Conversation

lxfight commented Jun 25, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Modifications / 改动点

Screenshots or Test Results / 运行截图或测试结果

Checklist / 检查清单

Summary by Sourcery

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lxfight commented Jun 25, 2026 •

edited by sourcery-ai Bot

Loading