Skip to content

fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000

Open
lxfight wants to merge 2 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-contract-crud
Open

fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000
lxfight wants to merge 2 commits into
AstrBotDevs:masterfrom
lxfight:fix/kb-contract-crud

Conversation

@lxfight

@lxfight lxfight commented Jun 25, 2026

Copy link
Copy Markdown
Member

Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.

The knowledge base CRUD (create/update/list) API contract is inconsistent between frontend and backend. The backend schema uses a name field while the generated frontend types expect kb_name. Both create and update share the same KnowledgeBaseRequest schema, but create requires kb_name and embedding_provider_id, while update does not — this shared schema leads to confusing validation semantics. Additionally, the knowledge base payload is missing canonical fields such as emoji, chunk_size, chunk_overlap, top_k_dense, top_k_sparse, and top_m_final.

This change unifies the frontend-backend contract: renames name to kb_name, introduces KnowledgeBaseCreateRequest to separate create and update semantics, promotes optional configuration parameters into canonical KnowledgeBaseRequest fields, and syncs the OpenAPI spec, backend schemas, and generated frontend types.

Modifications / 改动点

  • OpenAPI schema (openspec/openapi-v1.yaml):
    • KnowledgeBaseRequest: namekb_name; added emoji, chunk_size, chunk_overlap, top_k_dense, top_k_sparse, top_m_final fields; embedding_provider_id / rerank_provider_id now accept null
    • Added KnowledgeBaseCreateRequest inheriting KnowledgeBaseRequest via allOf, marking kb_name and embedding_provider_id as required
  • Backend schema (astrbot/dashboard/schemas.py): synced KnowledgeBaseRequest and KnowledgeBaseCreateRequest Pydantic models
  • Backend API (astrbot/dashboard/api/knowledge_bases.py): create_knowledge_base uses KnowledgeBaseCreateRequest; update_knowledge_base and list_knowledge_bases adapted to kb_name
  • Backend service (astrbot/dashboard/services/knowledge_base_service.py): adapted to new schema field names and types
  • Frontend types (dashboard/src/api/generated/openapi-v1/types.gen.ts): regenerated
  • Frontend API (dashboard/src/api/v1.ts): adapted to new types
  • Frontend page (dashboard/src/views/knowledge-base/KBList.vue): namekb_name
  • Tests (tests/unit/test_knowledge_base_service_contract.py): added 266 lines of end-to-end contract tests
  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

tests/unit/test_knowledge_base_service_contract.py::test_list_kbs_applies_pagination PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_kbs_without_page_size_returns_all_items PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_route_preserves_unpaginated_default PASSED
tests/unit/test_knowledge_base_service_contract.py::test_list_route_uses_default_page_size_when_page_is_explicit PASSED
tests/unit/test_knowledge_base_service_contract.py::test_create_kb_accepts_legacy_name_field PASSED
tests/unit/test_knowledge_base_service_contract.py::test_update_kb_preserves_omitted_fields PASSED
tests/unit/test_knowledge_base_service_contract.py::test_update_kb_allows_explicit_rerank_provider_clear PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_schemas_match_service_contract PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_preserves_explicit_null_updates PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_omits_unset_none_fields PASSED
tests/unit/test_knowledge_base_service_contract.py::test_knowledge_base_request_uses_legacy_name_as_input_alias PASSED

============================== 11 passed in 1.60s ==============================

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Align knowledge base CRUD contracts across backend, OpenAPI spec, and frontend, including canonical field names, pagination behavior, and configuration parameters.

New Features:

  • Support optional pagination with total count in knowledge base list responses while preserving an unpaginated default mode.
  • Expose canonical knowledge base configuration fields such as emoji and retrieval tuning parameters in the public API and dashboard.

Bug Fixes:

  • Normalize knowledge base payloads to consistently use kb_name while preserving compatibility with legacy name input.
  • Ensure knowledge base update requests preserve existing values for omitted fields while allowing explicit null updates for optional providers.

Enhancements:

  • Split knowledge base create and update request schemas to accurately reflect required fields and type nullability.
  • Add a canonical_payload helper on the knowledge base request model to generate service-facing payloads from API inputs.
  • Strengthen the knowledge base API service tests with contract coverage for schemas, pagination semantics, and legacy compatibility.

Tests:

  • Add unit tests covering knowledge base pagination semantics, legacy name handling, canonical payload generation, and update behavior.

Unify knowledge base create/update/list with kb_name, canonical_payload,
optional list pagination with total, and matching OpenAPI plus dashboard types.
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:webui The bug / feature is about webui(dashboard) of astrbot. feature:knowledge-base The bug / feature is about knowledge base labels Jun 25, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • The legacy namekb_name normalization now exists both in KnowledgeBaseService._canonical_kb_payload and KnowledgeBaseRequest.canonical_payload; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
  • Instead of manually handling the legacy name field via getattr(self, "name", None) and payload.get("name"), you could model it explicitly in the Pydantic schema with a Field(alias="name") or similar, which would make the legacy compatibility more self-documenting and less error-prone.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The legacy `name``kb_name` normalization now exists both in `KnowledgeBaseService._canonical_kb_payload` and `KnowledgeBaseRequest.canonical_payload`; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
- Instead of manually handling the legacy `name` field via `getattr(self, "name", None)` and `payload.get("name")`, you could model it explicitly in the Pydantic schema with a `Field(alias="name")` or similar, which would make the legacy compatibility more self-documenting and less error-prone.

## Individual Comments

### Comment 1
<location path="tests/unit/test_knowledge_base_service_contract.py" line_range="187" />
<code_context>
+@pytest.mark.asyncio
+async def test_create_kb_accepts_legacy_name_field():
</code_context>
<issue_to_address>
**suggestion (testing):** Extend create_kb tests to cover validation errors (missing names / embedding provider).

To round this out, please also add tests for validation failures:

- When both `kb_name` and `name` are missing/empty, `create_kb` should raise `KnowledgeBaseServiceError("知识库名称不能为空")`.
- When `embedding_provider_id` is missing or invalid (provider manager returns `None`), the service should raise the current error used in that case.

These will ensure the service enforces the new create semantics and catches regressions in validation behavior.
</issue_to_address>

### Comment 2
<location path="astrbot/dashboard/services/knowledge_base_service.py" line_range="32" />
<code_context>
     def _payload(data: object) -> dict[str, Any]:
         return data if isinstance(data, dict) else {}

+    @staticmethod
+    def _canonical_kb_payload(data: object) -> dict[str, Any]:
+        """Normalize knowledge base create/update payloads.
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing request canonicalization in the schema and making `update_kb`’s update fields explicit to avoid indirection and duplicated normalization logic.

You can simplify both of the flagged areas without losing any of the new behavior.

### 1. Avoid duplicating canonicalization logic

Instead of maintaining `_canonical_kb_payload` and a separate `canonical_payload` on the schema, centralize the logic in the schema and call it here.

For example, if you already have `KnowledgeBaseRequest.canonical_payload()`:

```python
# service.py
from .schemas import KnowledgeBaseRequest  # adjust import

@staticmethod
def _canonical_kb_payload(data: object) -> dict[str, Any]:
    raw = KnowledgeBaseService._payload(data)
    req = KnowledgeBaseRequest(**raw)
    return req.canonical_payload()
```

Or, if you can call the schema directly where you currently use `_canonical_kb_payload`, remove the helper entirely and keep the service dealing only with canonical dicts:

```python
async def create_kb(self, data: object) -> tuple[dict[str, Any], str]:
    kb_manager = self.get_kb_manager()
    raw = self._payload(data)
    payload = KnowledgeBaseRequest(**raw).canonical_payload()
    ...
```

This way the `name ➜ kb_name` migration and any future normalization lives in one place.

---

### 2. Simplify `update_kb` without losing partial-update semantics

You can keep the “at least one field” validation and the “default to current values” behavior, but make the update fields explicit instead of flowing through `update_keys` / `update_data` / `**update_data`.

Current logic (simplified):

```python
update_keys = [...]
provided_updates = {key: payload[key] for key in update_keys if key in payload}
...
current = current_kb.kb
update_data = {key: getattr(current, key) for key in update_keys}
update_data.update(provided_updates)

kb_helper = await self.get_kb_manager().update_kb(
    kb_id=kb_id,
    **update_data,
)
```

You can replace it with explicit parameters:

```python
async def update_kb(self, data: object) -> tuple[dict[str, Any], str]:
    payload = self._canonical_kb_payload(data)
    kb_id = payload.get("kb_id")
    if not kb_id:
        raise KnowledgeBaseServiceError("缺少参数 kb_id")

    # Allowed update fields
    fields = (
        "kb_name",
        "description",
        "emoji",
        "embedding_provider_id",
        "rerank_provider_id",
        "chunk_size",
        "chunk_overlap",
        "top_k_dense",
        "top_k_sparse",
        "top_m_final",
    )
    if not any(field in payload for field in fields):
        raise KnowledgeBaseServiceError("至少需要提供一个更新字段")

    current_kb = await self.get_kb_manager().get_kb(kb_id)
    if not current_kb:
        raise KnowledgeBaseServiceError("知识库不存在")
    current = current_kb.kb

    kb_helper = await self.get_kb_manager().update_kb(
        kb_id=kb_id,
        kb_name=payload.get("kb_name", current.kb_name),
        description=payload.get("description", current.description),
        emoji=payload.get("emoji", current.emoji),
        embedding_provider_id=payload.get("embedding_provider_id", current.embedding_provider_id),
        rerank_provider_id=payload.get("rerank_provider_id", current.rerank_provider_id),
        chunk_size=payload.get("chunk_size", current.chunk_size),
        chunk_overlap=payload.get("chunk_overlap", current.chunk_overlap),
        top_k_dense=payload.get("top_k_dense", current.top_k_dense),
        top_k_sparse=payload.get("top_k_sparse", current.top_k_sparse),
        top_m_final=payload.get("top_m_final", current.top_m_final),
    )
    if not kb_helper:
        raise KnowledgeBaseServiceError("知识库不存在")
    return kb_helper.kb.model_dump(), "更新知识库成功"
```

This keeps:

- Partial updates (only provided fields overwrite).
- Defaults from current KB values for missing fields.
- The “at least one update field” requirement.

But it removes the extra dict-building and makes the updatable fields obvious at the call site.
</issue_to_address>

### Comment 3
<location path="astrbot/dashboard/schemas.py" line_range="207" />
<code_context>


 class KnowledgeBaseRequest(OpenModel):
-    kb_id: str | None = None
-    name: str | None = None
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing knowledge base payload normalization in `KnowledgeBaseRequest` using Pydantic field aliases for the legacy `name` field to avoid custom migration logic and duplicated canonicalization.

You can reduce complexity here by (1) making `KnowledgeBaseRequest` the single source of truth for normalization, and (2) using Pydantic aliases to handle the legacy `name` field instead of custom logic.

### 1. Remove duplicate canonicalization in the service

Instead of having both `canonical_payload()` on the model and `_canonical_kb_payload()` in the service, keep only the model-level one and call it from the service:

```python
# in the service, replace `_canonical_kb_payload(...)` usage with:
payload = kb_request.canonical_payload()
# ...use `payload` directly
```

Then delete `_canonical_kb_payload` entirely. This way, any future changes to the KB payload shape happen in one place.

### 2. Use a field alias for the legacy `name` input

You can avoid manual `getattr(self, "name", None)` and keep the migration logic declarative:

```python
from pydantic import Field

class KnowledgeBaseRequest(OpenModel):
    kb_name: str | None = Field(None, alias="name")  # accept legacy "name"
    description: str | None = None
    emoji: str | None = None
    embedding_provider_id: str | None = None
    rerank_provider_id: str | None = None
    chunk_size: int | None = None
    chunk_overlap: int | None = None
    top_k_dense: int | None = None
    top_k_sparse: int | None = None
    top_m_final: int | None = None

    def canonical_payload(self) -> dict[str, Any]:
        return self.model_dump(
            exclude_unset=True,
            include={
                "kb_name",
                "description",
                "emoji",
                "embedding_provider_id",
                "rerank_provider_id",
                "chunk_size",
                "chunk_overlap",
                "top_k_dense",
                "top_k_sparse",
                "top_m_final",
            },
            by_alias=False,  # ensure we output `kb_name`, not `name`
        )
```

This keeps all existing behavior (including support for legacy `name`) while:

- Eliminating duplicated normalization logic between layers.
- Removing the manual migration code path that can drift from the service logic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

chunk_overlap=None,
top_k_dense=12,
top_k_sparse=8,
top_m_final=3,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Extend create_kb tests to cover validation errors (missing names / embedding provider).

To round this out, please also add tests for validation failures:

  • When both kb_name and name are missing/empty, create_kb should raise KnowledgeBaseServiceError("知识库名称不能为空").
  • When embedding_provider_id is missing or invalid (provider manager returns None), the service should raise the current error used in that case.

These will ensure the service enforces the new create semantics and catches regressions in validation behavior.

Comment thread astrbot/dashboard/services/knowledge_base_service.py
Comment thread astrbot/dashboard/schemas.py

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the knowledge base management system by standardizing payload schemas, introducing pagination to the listing service, updating OpenAPI specifications, and adding unit tests. However, several critical issues were identified during the review: removing _model_dict will cause NameError exceptions in other endpoints that still reference it; the KnowledgeBaseCreateRequest schema is missing from schemas.py despite being expected by the frontend and OpenAPI spec; combining allOf with additionalProperties: false in the OpenAPI spec will cause validation failures; and using getattr without a default value when updating knowledge bases could lead to an AttributeError.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

return payload if isinstance(payload, dict) else {}


async def _run(operation, *, prefix: str):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The helper function _model_dict was removed from this file. However, it is still referenced in multiple other endpoints within the same file:

  • import_knowledge_base_documents (line 210)
  • import_knowledge_base_document_url (line 224)
  • retrieve_knowledge_base (line 304)

Removing _model_dict will cause a NameError when any of these endpoints are called. Please restore _model_dict or refactor those endpoints to use payload.model_dump(exclude_none=True) directly.

def _model_dict(payload) -> dict[str, Any]:
    if payload is None:
        return {}
    if hasattr(payload, "model_dump"):
        return payload.model_dump(exclude_none=True)
    return payload if isinstance(payload, dict) else {}


async def _run(operation, *, prefix: str):

Comment on lines 244 to 245

class KnowledgeBaseImportRequest(OpenModel):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The PR description mentions syncing KnowledgeBaseCreateRequest Pydantic models, and the OpenAPI spec/frontend types expect it. However, KnowledgeBaseCreateRequest is not defined in astrbot/dashboard/schemas.py.

Please define KnowledgeBaseCreateRequest inheriting from KnowledgeBaseRequest with kb_name and embedding_provider_id as required fields.

Suggested change
class KnowledgeBaseImportRequest(OpenModel):
class KnowledgeBaseCreateRequest(KnowledgeBaseRequest):
kb_name: str
embedding_provider_id: str
class KnowledgeBaseImportRequest(OpenModel):

Comment thread openspec/openapi-v1.yaml Outdated
Comment on lines +5649 to +5659
KnowledgeBaseCreateRequest:
allOf:
- $ref: "#/components/schemas/KnowledgeBaseRequest"
- type: object
required: [kb_name, embedding_provider_id]
properties:
kb_name:
type: string
embedding_provider_id:
type: string
additionalProperties: false

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In OpenAPI, combining allOf with additionalProperties: false on a sub-schema will cause validation to fail for any properties defined in the other allOf schemas (such as description, emoji, etc. from KnowledgeBaseRequest). This is because the validator evaluates the sub-schema strictly and rejects any property not explicitly listed in its own properties block.

Please remove additionalProperties: false from the inline schema of KnowledgeBaseCreateRequest.

    KnowledgeBaseCreateRequest:
      allOf:
        - $ref: "#/components/schemas/KnowledgeBaseRequest"
        - type: object
          required: [kb_name, embedding_provider_id]
          properties:
            kb_name:
              type: string
            embedding_provider_id:
              type: string

if not current_kb:
raise KnowledgeBaseServiceError("知识库不存在")
current = current_kb.kb
update_data = {key: getattr(current, key) for key in update_keys}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using getattr(current, key) without a default value can raise an AttributeError if any of the update_keys are missing from the current object (e.g., due to database schema mismatches or pending migrations).

Using getattr(current, key, None) is safer and prevents potential runtime crashes.

Suggested change
update_data = {key: getattr(current, key) for key in update_keys}
update_data = {key: getattr(current, key, None) for key in update_keys}

…fix OpenAPI allOf validation

- Replace 3 remaining _model_dict() calls with payload.model_dump(exclude_none=True)
- Add KnowledgeBaseCreateRequest schema with required kb_name + embedding_provider_id
- Remove additionalProperties: false from OpenAPI allOf sub-schema to avoid validation failures
- Use Pydantic Field(alias="name") + populate_by_name=True for legacy name migration
- Delegate _canonical_kb_payload to KnowledgeBaseRequest to eliminate duplicated logic
- Add getattr default None for update_kb to prevent AttributeError
- Add validation error tests: missing kb_name, missing embedding_provider_id
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:webui The bug / feature is about webui(dashboard) of astrbot. feature:knowledge-base The bug / feature is about knowledge base size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant