Feat image caption cache#8966
Conversation
|
@FloranceYeh is attempting to deploy a commit to the soulter's projects Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
_process_quote_message, the provider identity for caching is built inline fromprov.provider_config.get("id", img_cap_prov_id or ""); consider reusing_resolve_provider_cache_identity(or a shared helper) so that cache keys are consistent across all image caption entry points and don’t accidentally merge results across different providers when IDs are missing. - The new image-caption cache exceptions and log messages in
group_chat_context.pywere switched from Chinese to English; if this module is intended to keep user-facing messages localized, it may be better to retain or add localized variants to preserve existing UX consistency.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_process_quote_message`, the provider identity for caching is built inline from `prov.provider_config.get("id", img_cap_prov_id or "")`; consider reusing `_resolve_provider_cache_identity` (or a shared helper) so that cache keys are consistent across all image caption entry points and don’t accidentally merge results across different providers when IDs are missing.
- The new image-caption cache exceptions and log messages in `group_chat_context.py` were switched from Chinese to English; if this module is intended to keep user-facing messages localized, it may be better to retain or add localized variants to preserve existing UX consistency.
## Individual Comments
### Comment 1
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="115-117" />
<code_context>
+ raise Exception(
+ f"Provider type is invalid for image captioning: {type(provider)}."
+ )
+ provider_id = _resolve_provider_cache_identity(
+ provider,
+ configured_provider_id=image_caption_provider_id,
+ )
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Align provider identity resolution with the usage in astr_main_agent to avoid divergent cache keys.
Here this uses `_resolve_provider_cache_identity`, which falls back to provider_config id, type, and model. In `astr_main_agent._request_img_caption_with_provider` the provider id is derived differently (`provider_config.get("id", img_cap_prov_id or "")`) with no type/model fallback. This can give the same provider different ids (including empty string) across call sites, preventing cache reuse for identical images. Please centralize provider-id resolution (e.g., by reusing `_resolve_provider_cache_identity` from both places) so the cache key is consistent system-wide.
Suggested implementation:
```python
if not isinstance(provider, Provider):
raise Exception(
f"Provider type is invalid for image captioning: {type(provider)}."
)
# Align provider identity resolution with astr_main_agent._request_img_caption_with_provider
# so that image caption cache keys are consistent across call sites.
# astr_main_agent currently uses:
# provider_config.get("id", img_cap_prov_id or "")
# We replicate that logic here.
provider_config = getattr(provider, "config", {}) or {}
provider_id = provider_config.get("id", image_caption_provider_id or "")
"""
Group chat context awareness.
"image_caption": image_caption,
```
To fully implement the centralization and avoid divergent cache keys, you should also:
1. **Remove / replace `_resolve_provider_cache_identity` usage in this file** (if it appears elsewhere in `group_chat_context.py`):
- Delete any previous `provider_id = _resolve_provider_cache_identity(...)` calls.
- Ensure any cache key construction for image captioning now uses the `provider_id` defined in the snippet above.
2. **Align `astr_main_agent._request_img_caption_with_provider`**:
- If you want true centralization, extract the shared logic to a small helper, e.g.:
```python
def resolve_image_caption_provider_id(provider_config: dict, configured_id: str | None) -> str:
return provider_config.get("id", configured_id or "")
```
- Use this helper both in `astr_main_agent._request_img_caption_with_provider` and here instead of duplicating the logic.
- If you add this helper to a shared module (e.g. `astrbot/builtin_stars/astrbot/utils.py`), update imports in both files accordingly.
3. **Ensure `provider_config` is consistent**:
- If `provider.config` is not the right source of `provider_config` in your codebase, replace `getattr(provider, "config", {})` with the actual config object you use in `astr_main_agent._request_img_caption_with_provider`.
- The key requirement is that both call sites derive `provider_id` from the *same* `provider_config` object using identical logic.
</issue_to_address>
### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="45-54" />
<code_context>
+ async def get_or_create(
</code_context>
<issue_to_address>
**suggestion (performance):** Consider cleaning up expired entries even when caching is disabled for the current call.
Because `ttl_seconds <= 0` short‑circuits to `caption_factory()` without touching the internal maps, entries created while caching was enabled are only cleaned up on later `get_or_create` calls that insert new entries. If caching stays disabled for a long time, expired entries and locks can linger in the dicts. Consider calling `_cleanup_expired_entries()` before the early return so stale entries are pruned even when caching is off for that call.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces an image caption caching mechanism (ImageCaptionCache) to reuse image description results based on a configurable TTL, optimizing API usage across group chats and main agent workflows. The changes also include configuration updates, localization strings, and comprehensive unit tests. The review feedback is highly constructive, pointing out critical robustness improvements: resolving a potential memory leak in the cache's lock management, handling potential AttributeErrors when accessing provider configurations or text chat responses, and defensively wrapping local file read operations to prevent crashes.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This PR adds image caption result caching for repeated images and exposes the cache TTL in configuration.
Modifications / 改动点
add image caption cache utility
reuse cached vision results when the same image is received again within the configured TTL
add provider setting for
image_caption_cache_ttladd WebUI i18n labels for the new config field in
zh-CN,en-US, andru-RUadd tests covering cache behavior and main agent integration
This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
[ ] 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
[x ] 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
[x ] 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。[x ] 😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Add configurable caching for image captioning results and integrate it into group chat and main agent image handling.
New Features:
image_caption_cache_ttlprovider setting and surface it through the configuration system and Web UI localization metadata.Enhancements:
Tests: