Skip to content

Feat image caption cache#8966

Open
FloranceYeh wants to merge 11 commits into
AstrBotDevs:masterfrom
FloranceYeh:feat-image-caption-cache
Open

Feat image caption cache#8966
FloranceYeh wants to merge 11 commits into
AstrBotDevs:masterfrom
FloranceYeh:feat-image-caption-cache

Conversation

@FloranceYeh

@FloranceYeh FloranceYeh commented Jun 23, 2026

Copy link
Copy Markdown

This PR adds image caption result caching for repeated images and exposes the cache TTL in configuration.

Modifications / 改动点

  • add image caption cache utility

  • reuse cached vision results when the same image is received again within the configured TTL

  • add provider setting for image_caption_cache_ttl

  • add WebUI i18n labels for the new config field in zh-CN, en-US, and ru-RU

  • add tests covering cache behavior and main agent integration

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

PixPin_2026-06-04_23-06-03 PixPin_2026-06-04_23-05-19

Checklist / 检查清单

  • [ ] 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • [x ] 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • [x ] 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • [x ] 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Add configurable caching for image captioning results and integrate it into group chat and main agent image handling.

New Features:

  • Introduce an image caption cache utility with TTL-based reuse of caption results for identical images across providers.
  • Expose an image_caption_cache_ttl provider setting and surface it through the configuration system and Web UI localization metadata.

Enhancements:

  • Update group chat context and main agent logic to route image captioning through the shared cache while improving provider resolution and error messages.
  • Adjust quoted image handling to use cached captions and more stable provider identity keys for cache partitioning.

Tests:

  • Add unit tests for the image caption cache behavior, including keying, locking, TTL handling, and integration with group chat context and main agent image caption workflows.

@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 23, 2026
@vercel

vercel Bot commented Jun 23, 2026

Copy link
Copy Markdown

@FloranceYeh is attempting to deploy a commit to the soulter's projects Team on Vercel.

A member of the Team first needs to authorize it.

@dosubot dosubot Bot added area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. area:webui The bug / feature is about webui(dashboard) of astrbot. labels Jun 23, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In _process_quote_message, the provider identity for caching is built inline from prov.provider_config.get("id", img_cap_prov_id or ""); consider reusing _resolve_provider_cache_identity (or a shared helper) so that cache keys are consistent across all image caption entry points and don’t accidentally merge results across different providers when IDs are missing.
  • The new image-caption cache exceptions and log messages in group_chat_context.py were switched from Chinese to English; if this module is intended to keep user-facing messages localized, it may be better to retain or add localized variants to preserve existing UX consistency.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_process_quote_message`, the provider identity for caching is built inline from `prov.provider_config.get("id", img_cap_prov_id or "")`; consider reusing `_resolve_provider_cache_identity` (or a shared helper) so that cache keys are consistent across all image caption entry points and don’t accidentally merge results across different providers when IDs are missing.
- The new image-caption cache exceptions and log messages in `group_chat_context.py` were switched from Chinese to English; if this module is intended to keep user-facing messages localized, it may be better to retain or add localized variants to preserve existing UX consistency.

## Individual Comments

### Comment 1
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="115-117" />
<code_context>
+            raise Exception(
+                f"Provider type is invalid for image captioning: {type(provider)}."
+            )
+        provider_id = _resolve_provider_cache_identity(
+            provider,
+            configured_provider_id=image_caption_provider_id,
+        )
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Align provider identity resolution with the usage in astr_main_agent to avoid divergent cache keys.

Here this uses `_resolve_provider_cache_identity`, which falls back to provider_config id, type, and model. In `astr_main_agent._request_img_caption_with_provider` the provider id is derived differently (`provider_config.get("id", img_cap_prov_id or "")`) with no type/model fallback. This can give the same provider different ids (including empty string) across call sites, preventing cache reuse for identical images. Please centralize provider-id resolution (e.g., by reusing `_resolve_provider_cache_identity` from both places) so the cache key is consistent system-wide.

Suggested implementation:

```python
        if not isinstance(provider, Provider):
            raise Exception(
                f"Provider type is invalid for image captioning: {type(provider)}."
            )

        # Align provider identity resolution with astr_main_agent._request_img_caption_with_provider
        # so that image caption cache keys are consistent across call sites.
        # astr_main_agent currently uses:
        #   provider_config.get("id", img_cap_prov_id or "")
        # We replicate that logic here.
        provider_config = getattr(provider, "config", {}) or {}
        provider_id = provider_config.get("id", image_caption_provider_id or "")

"""
Group chat context awareness.
            "image_caption": image_caption,

```

To fully implement the centralization and avoid divergent cache keys, you should also:

1. **Remove / replace `_resolve_provider_cache_identity` usage in this file** (if it appears elsewhere in `group_chat_context.py`):
   - Delete any previous `provider_id = _resolve_provider_cache_identity(...)` calls.
   - Ensure any cache key construction for image captioning now uses the `provider_id` defined in the snippet above.

2. **Align `astr_main_agent._request_img_caption_with_provider`**:
   - If you want true centralization, extract the shared logic to a small helper, e.g.:
     ```python
     def resolve_image_caption_provider_id(provider_config: dict, configured_id: str | None) -> str:
         return provider_config.get("id", configured_id or "")
     ```
   - Use this helper both in `astr_main_agent._request_img_caption_with_provider` and here instead of duplicating the logic.
   - If you add this helper to a shared module (e.g. `astrbot/builtin_stars/astrbot/utils.py`), update imports in both files accordingly.

3. **Ensure `provider_config` is consistent**:
   - If `provider.config` is not the right source of `provider_config` in your codebase, replace `getattr(provider, "config", {})` with the actual config object you use in `astr_main_agent._request_img_caption_with_provider`.
   - The key requirement is that both call sites derive `provider_id` from the *same* `provider_config` object using identical logic.
</issue_to_address>

### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="45-54" />
<code_context>
+    async def get_or_create(
</code_context>
<issue_to_address>
**suggestion (performance):** Consider cleaning up expired entries even when caching is disabled for the current call.

Because `ttl_seconds <= 0` short‑circuits to `caption_factory()` without touching the internal maps, entries created while caching was enabled are only cleaned up on later `get_or_create` calls that insert new entries. If caching stays disabled for a long time, expired entries and locks can linger in the dicts. Consider calling `_cleanup_expired_entries()` before the early return so stale entries are pruned even when caching is off for that call.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/builtin_stars/astrbot/group_chat_context.py
Comment thread astrbot/core/utils/image_caption_cache.py

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an image caption caching mechanism (ImageCaptionCache) to reuse image description results based on a configurable TTL, optimizing API usage across group chats and main agent workflows. The changes also include configuration updates, localization strings, and comprehensive unit tests. The review feedback is highly constructive, pointing out critical robustness improvements: resolving a potential memory leak in the cache's lock management, handling potential AttributeErrors when accessing provider configurations or text chat responses, and defensively wrapping local file read operations to prevent crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread astrbot/core/utils/image_caption_cache.py
Comment thread astrbot/core/utils/image_caption_cache.py
Comment thread astrbot/builtin_stars/astrbot/group_chat_context.py Outdated
Comment thread astrbot/core/astr_main_agent.py Outdated
FloranceYeh and others added 3 commits June 25, 2026 15:31
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. area:webui The bug / feature is about webui(dashboard) of astrbot. size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant