fix: align knowledge base CRUD API contract with kb_name, canonical payload fields, and matching OpenAPI types#9000
Conversation
Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The legacy
name→kb_namenormalization now exists both inKnowledgeBaseService._canonical_kb_payloadandKnowledgeBaseRequest.canonical_payload; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths. - Instead of manually handling the legacy
namefield viagetattr(self, "name", None)andpayload.get("name"), you could model it explicitly in the Pydantic schema with aField(alias="name")or similar, which would make the legacy compatibility more self-documenting and less error-prone.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The legacy `name` → `kb_name` normalization now exists both in `KnowledgeBaseService._canonical_kb_payload` and `KnowledgeBaseRequest.canonical_payload`; consider centralizing this logic in one place (e.g., only the schema helper) to avoid future drift between the two paths.
- Instead of manually handling the legacy `name` field via `getattr(self, "name", None)` and `payload.get("name")`, you could model it explicitly in the Pydantic schema with a `Field(alias="name")` or similar, which would make the legacy compatibility more self-documenting and less error-prone.
## Individual Comments
### Comment 1
<location path="tests/unit/test_knowledge_base_service_contract.py" line_range="187" />
<code_context>
+@pytest.mark.asyncio
+async def test_create_kb_accepts_legacy_name_field():
</code_context>
<issue_to_address>
**suggestion (testing):** Extend create_kb tests to cover validation errors (missing names / embedding provider).
To round this out, please also add tests for validation failures:
- When both `kb_name` and `name` are missing/empty, `create_kb` should raise `KnowledgeBaseServiceError("知识库名称不能为空")`.
- When `embedding_provider_id` is missing or invalid (provider manager returns `None`), the service should raise the current error used in that case.
These will ensure the service enforces the new create semantics and catches regressions in validation behavior.
</issue_to_address>
### Comment 2
<location path="astrbot/dashboard/services/knowledge_base_service.py" line_range="32" />
<code_context>
def _payload(data: object) -> dict[str, Any]:
return data if isinstance(data, dict) else {}
+ @staticmethod
+ def _canonical_kb_payload(data: object) -> dict[str, Any]:
+ """Normalize knowledge base create/update payloads.
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing request canonicalization in the schema and making `update_kb`’s update fields explicit to avoid indirection and duplicated normalization logic.
You can simplify both of the flagged areas without losing any of the new behavior.
### 1. Avoid duplicating canonicalization logic
Instead of maintaining `_canonical_kb_payload` and a separate `canonical_payload` on the schema, centralize the logic in the schema and call it here.
For example, if you already have `KnowledgeBaseRequest.canonical_payload()`:
```python
# service.py
from .schemas import KnowledgeBaseRequest # adjust import
@staticmethod
def _canonical_kb_payload(data: object) -> dict[str, Any]:
raw = KnowledgeBaseService._payload(data)
req = KnowledgeBaseRequest(**raw)
return req.canonical_payload()
```
Or, if you can call the schema directly where you currently use `_canonical_kb_payload`, remove the helper entirely and keep the service dealing only with canonical dicts:
```python
async def create_kb(self, data: object) -> tuple[dict[str, Any], str]:
kb_manager = self.get_kb_manager()
raw = self._payload(data)
payload = KnowledgeBaseRequest(**raw).canonical_payload()
...
```
This way the `name ➜ kb_name` migration and any future normalization lives in one place.
---
### 2. Simplify `update_kb` without losing partial-update semantics
You can keep the “at least one field” validation and the “default to current values” behavior, but make the update fields explicit instead of flowing through `update_keys` / `update_data` / `**update_data`.
Current logic (simplified):
```python
update_keys = [...]
provided_updates = {key: payload[key] for key in update_keys if key in payload}
...
current = current_kb.kb
update_data = {key: getattr(current, key) for key in update_keys}
update_data.update(provided_updates)
kb_helper = await self.get_kb_manager().update_kb(
kb_id=kb_id,
**update_data,
)
```
You can replace it with explicit parameters:
```python
async def update_kb(self, data: object) -> tuple[dict[str, Any], str]:
payload = self._canonical_kb_payload(data)
kb_id = payload.get("kb_id")
if not kb_id:
raise KnowledgeBaseServiceError("缺少参数 kb_id")
# Allowed update fields
fields = (
"kb_name",
"description",
"emoji",
"embedding_provider_id",
"rerank_provider_id",
"chunk_size",
"chunk_overlap",
"top_k_dense",
"top_k_sparse",
"top_m_final",
)
if not any(field in payload for field in fields):
raise KnowledgeBaseServiceError("至少需要提供一个更新字段")
current_kb = await self.get_kb_manager().get_kb(kb_id)
if not current_kb:
raise KnowledgeBaseServiceError("知识库不存在")
current = current_kb.kb
kb_helper = await self.get_kb_manager().update_kb(
kb_id=kb_id,
kb_name=payload.get("kb_name", current.kb_name),
description=payload.get("description", current.description),
emoji=payload.get("emoji", current.emoji),
embedding_provider_id=payload.get("embedding_provider_id", current.embedding_provider_id),
rerank_provider_id=payload.get("rerank_provider_id", current.rerank_provider_id),
chunk_size=payload.get("chunk_size", current.chunk_size),
chunk_overlap=payload.get("chunk_overlap", current.chunk_overlap),
top_k_dense=payload.get("top_k_dense", current.top_k_dense),
top_k_sparse=payload.get("top_k_sparse", current.top_k_sparse),
top_m_final=payload.get("top_m_final", current.top_m_final),
)
if not kb_helper:
raise KnowledgeBaseServiceError("知识库不存在")
return kb_helper.kb.model_dump(), "更新知识库成功"
```
This keeps:
- Partial updates (only provided fields overwrite).
- Defaults from current KB values for missing fields.
- The “at least one update field” requirement.
But it removes the extra dict-building and makes the updatable fields obvious at the call site.
</issue_to_address>
### Comment 3
<location path="astrbot/dashboard/schemas.py" line_range="207" />
<code_context>
class KnowledgeBaseRequest(OpenModel):
- kb_id: str | None = None
- name: str | None = None
</code_context>
<issue_to_address>
**issue (complexity):** Consider centralizing knowledge base payload normalization in `KnowledgeBaseRequest` using Pydantic field aliases for the legacy `name` field to avoid custom migration logic and duplicated canonicalization.
You can reduce complexity here by (1) making `KnowledgeBaseRequest` the single source of truth for normalization, and (2) using Pydantic aliases to handle the legacy `name` field instead of custom logic.
### 1. Remove duplicate canonicalization in the service
Instead of having both `canonical_payload()` on the model and `_canonical_kb_payload()` in the service, keep only the model-level one and call it from the service:
```python
# in the service, replace `_canonical_kb_payload(...)` usage with:
payload = kb_request.canonical_payload()
# ...use `payload` directly
```
Then delete `_canonical_kb_payload` entirely. This way, any future changes to the KB payload shape happen in one place.
### 2. Use a field alias for the legacy `name` input
You can avoid manual `getattr(self, "name", None)` and keep the migration logic declarative:
```python
from pydantic import Field
class KnowledgeBaseRequest(OpenModel):
kb_name: str | None = Field(None, alias="name") # accept legacy "name"
description: str | None = None
emoji: str | None = None
embedding_provider_id: str | None = None
rerank_provider_id: str | None = None
chunk_size: int | None = None
chunk_overlap: int | None = None
top_k_dense: int | None = None
top_k_sparse: int | None = None
top_m_final: int | None = None
def canonical_payload(self) -> dict[str, Any]:
return self.model_dump(
exclude_unset=True,
include={
"kb_name",
"description",
"emoji",
"embedding_provider_id",
"rerank_provider_id",
"chunk_size",
"chunk_overlap",
"top_k_dense",
"top_k_sparse",
"top_m_final",
},
by_alias=False, # ensure we output `kb_name`, not `name`
)
```
This keeps all existing behavior (including support for legacy `name`) while:
- Eliminating duplicated normalization logic between layers.
- Removing the manual migration code path that can drift from the service logic.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| chunk_overlap=None, | ||
| top_k_dense=12, | ||
| top_k_sparse=8, | ||
| top_m_final=3, |
There was a problem hiding this comment.
suggestion (testing): Extend create_kb tests to cover validation errors (missing names / embedding provider).
To round this out, please also add tests for validation failures:
- When both
kb_nameandnameare missing/empty,create_kbshould raiseKnowledgeBaseServiceError("知识库名称不能为空"). - When
embedding_provider_idis missing or invalid (provider manager returnsNone), the service should raise the current error used in that case.
These will ensure the service enforces the new create semantics and catches regressions in validation behavior.
There was a problem hiding this comment.
Code Review
This pull request refactors the knowledge base management system by standardizing payload schemas, introducing pagination to the listing service, updating OpenAPI specifications, and adding unit tests. However, several critical issues were identified during the review: removing _model_dict will cause NameError exceptions in other endpoints that still reference it; the KnowledgeBaseCreateRequest schema is missing from schemas.py despite being expected by the frontend and OpenAPI spec; combining allOf with additionalProperties: false in the OpenAPI spec will cause validation failures; and using getattr without a default value when updating knowledge bases could lead to an AttributeError.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| return payload if isinstance(payload, dict) else {} | ||
|
|
||
|
|
||
| async def _run(operation, *, prefix: str): |
There was a problem hiding this comment.
The helper function _model_dict was removed from this file. However, it is still referenced in multiple other endpoints within the same file:
import_knowledge_base_documents(line 210)import_knowledge_base_document_url(line 224)retrieve_knowledge_base(line 304)
Removing _model_dict will cause a NameError when any of these endpoints are called. Please restore _model_dict or refactor those endpoints to use payload.model_dump(exclude_none=True) directly.
def _model_dict(payload) -> dict[str, Any]:
if payload is None:
return {}
if hasattr(payload, "model_dump"):
return payload.model_dump(exclude_none=True)
return payload if isinstance(payload, dict) else {}
async def _run(operation, *, prefix: str):|
|
||
| class KnowledgeBaseImportRequest(OpenModel): |
There was a problem hiding this comment.
The PR description mentions syncing KnowledgeBaseCreateRequest Pydantic models, and the OpenAPI spec/frontend types expect it. However, KnowledgeBaseCreateRequest is not defined in astrbot/dashboard/schemas.py.
Please define KnowledgeBaseCreateRequest inheriting from KnowledgeBaseRequest with kb_name and embedding_provider_id as required fields.
| class KnowledgeBaseImportRequest(OpenModel): | |
| class KnowledgeBaseCreateRequest(KnowledgeBaseRequest): | |
| kb_name: str | |
| embedding_provider_id: str | |
| class KnowledgeBaseImportRequest(OpenModel): |
| KnowledgeBaseCreateRequest: | ||
| allOf: | ||
| - $ref: "#/components/schemas/KnowledgeBaseRequest" | ||
| - type: object | ||
| required: [kb_name, embedding_provider_id] | ||
| properties: | ||
| kb_name: | ||
| type: string | ||
| embedding_provider_id: | ||
| type: string | ||
| additionalProperties: false |
There was a problem hiding this comment.
In OpenAPI, combining allOf with additionalProperties: false on a sub-schema will cause validation to fail for any properties defined in the other allOf schemas (such as description, emoji, etc. from KnowledgeBaseRequest). This is because the validator evaluates the sub-schema strictly and rejects any property not explicitly listed in its own properties block.
Please remove additionalProperties: false from the inline schema of KnowledgeBaseCreateRequest.
KnowledgeBaseCreateRequest:
allOf:
- $ref: "#/components/schemas/KnowledgeBaseRequest"
- type: object
required: [kb_name, embedding_provider_id]
properties:
kb_name:
type: string
embedding_provider_id:
type: string| if not current_kb: | ||
| raise KnowledgeBaseServiceError("知识库不存在") | ||
| current = current_kb.kb | ||
| update_data = {key: getattr(current, key) for key in update_keys} |
There was a problem hiding this comment.
Using getattr(current, key) without a default value can raise an AttributeError if any of the update_keys are missing from the current object (e.g., due to database schema mismatches or pending migrations).
Using getattr(current, key, None) is safer and prevents potential runtime crashes.
| update_data = {key: getattr(current, key) for key in update_keys} | |
| update_data = {key: getattr(current, key, None) for key in update_keys} |
…fix OpenAPI allOf validation - Replace 3 remaining _model_dict() calls with payload.model_dump(exclude_none=True) - Add KnowledgeBaseCreateRequest schema with required kb_name + embedding_provider_id - Remove additionalProperties: false from OpenAPI allOf sub-schema to avoid validation failures - Use Pydantic Field(alias="name") + populate_by_name=True for legacy name migration - Delegate _canonical_kb_payload to KnowledgeBaseRequest to eliminate duplicated logic - Add getattr default None for update_kb to prevent AttributeError - Add validation error tests: missing kb_name, missing embedding_provider_id
Unify knowledge base create/update/list with kb_name, canonical_payload, optional list pagination with total, and matching OpenAPI plus dashboard types.
The knowledge base CRUD (create/update/list) API contract is inconsistent between frontend and backend. The backend schema uses a
namefield while the generated frontend types expectkb_name. Both create and update share the sameKnowledgeBaseRequestschema, but create requireskb_nameandembedding_provider_id, while update does not — this shared schema leads to confusing validation semantics. Additionally, the knowledge base payload is missing canonical fields such asemoji,chunk_size,chunk_overlap,top_k_dense,top_k_sparse, andtop_m_final.This change unifies the frontend-backend contract: renames
nametokb_name, introducesKnowledgeBaseCreateRequestto separate create and update semantics, promotes optional configuration parameters into canonicalKnowledgeBaseRequestfields, and syncs the OpenAPI spec, backend schemas, and generated frontend types.Modifications / 改动点
openspec/openapi-v1.yaml):KnowledgeBaseRequest:name→kb_name; addedemoji,chunk_size,chunk_overlap,top_k_dense,top_k_sparse,top_m_finalfields;embedding_provider_id/rerank_provider_idnow acceptnullKnowledgeBaseCreateRequestinheritingKnowledgeBaseRequestviaallOf, markingkb_nameandembedding_provider_idas requiredastrbot/dashboard/schemas.py): syncedKnowledgeBaseRequestandKnowledgeBaseCreateRequestPydantic modelsastrbot/dashboard/api/knowledge_bases.py):create_knowledge_baseusesKnowledgeBaseCreateRequest;update_knowledge_baseandlist_knowledge_basesadapted tokb_nameastrbot/dashboard/services/knowledge_base_service.py): adapted to new schema field names and typesdashboard/src/api/generated/openapi-v1/types.gen.ts): regenerateddashboard/src/api/v1.ts): adapted to new typesdashboard/src/views/knowledge-base/KBList.vue):name→kb_nametests/unit/test_knowledge_base_service_contract.py): added 266 lines of end-to-end contract testsScreenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Align knowledge base CRUD contracts across backend, OpenAPI spec, and frontend, including canonical field names, pagination behavior, and configuration parameters.
New Features:
Bug Fixes:
Enhancements:
Tests: