[M3 Bug] Token Plan + Cloudflare AI Gateway (BYOK) does not forward SSE — chunks are buffered and returned as a single payload #22

@FireTable

Description

Which inference path did you use?

MiniMax API (Token Plan, via Cloudflare AI Gateway / BYOK)

Inference parameters

  • model: MiniMax-M3
  • stream: true
  • max_tokens: default
  • temperature: default
  • request path: POST https://<gateway>.gateway.ai.cloudflare.com/v1/chat/completions (Cloudflare AI Gateway)
  • upstream: api.minimaxi.com (mainland China Token Plan), configured via BYOK (Bring Your Own Key)
  • client: raw curl -N (also reproduced from @langchain/openai TS / openai-python SDK)

Prompt / input

# A. Via Cloudflare AI Gateway (BYOK)  ← problem path
curl -N -X POST "https://<gateway>.gateway.ai.cloudflare.com/v1/chat/completions" \
  -H "Authorization: Bearer $CF_AIGW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "stream": true,
    "messages": [{"role": "user", "content": "Count from 1 to 10 slowly"}]
  }'

# B. Direct to api.minimaxi.com (Token Plan)  ← control, works normally
curl -N -X POST "https://api.minimaxi.com/v1/chat/completions" \
  -H "Authorization: Bearer $MINIMAX_TOKEN_PLAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{ ... same as above ... }'
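
To put numbers on the difference between paths A and B, the arrival time of each chunk can be recorded. The following is a minimal sketch using only the Python standard library, not part of the original report. The helper names (`arrival_times`, `looks_buffered`, `read_stream`) and the 0.2 s threshold are illustrative assumptions.

```python
import time
import urllib.request
from typing import Iterable, List


def arrival_times(chunks: Iterable[bytes]) -> List[float]:
    """Return seconds-since-start at which each non-empty chunk arrived."""
    start = time.monotonic()
    times = []
    for chunk in chunks:
        if chunk:
            times.append(time.monotonic() - start)
    return times


def looks_buffered(times: List[float], min_spread_s: float = 0.2) -> bool:
    """Heuristic: all chunks landing within a tiny window suggests buffering.

    min_spread_s is an arbitrary illustrative threshold, not a measured value.
    """
    if len(times) < 2:
        return True  # zero or one read: nothing arrived incrementally
    return (times[-1] - times[0]) < min_spread_s


def read_stream(url: str, token: str, body: bytes) -> Iterable[bytes]:
    """Yield response lines from a streaming POST as they are read."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            yield line
```

Calling `looks_buffered(arrival_times(read_stream(url, token, body)))` once with the gateway URL and once with the direct URL, using the same JSON body as the curl commands, would be expected to return `True` for path A and `False` for path B if the behavior described in this issue holds.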

Expected behavior

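With BYOK and stream: true, Cloudflare AI Gateway should forward the upstream SSE transparently:

  • The response carries Content-Type: text/event-stream and Transfer-Encoding: chunked
  • Each upstream data: {...}\n\n event is delivered to the client incrementally
  • TTFB ≈ TTFT (a few hundred ms), not "wait for generation to finish, then return everything at once"

The streaming experience on paths A and B should be identical.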

Actual behavior

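SSE is not passed through by CF. Observations:

  • B (direct to api.minimaxi.com): streams normally; chunks arrive one by one and the frontend can render incrementally ✅
  • A (CF AI Gateway BYOK): all chunks are present and in the correct order, but they are bunched together and returned at once
    • No chunk-by-chunk arrival is visible in curl, the OpenAI SDK, or @langchain/openai
    • The data: {...}\n\n lines arrive as one contiguous block, flushed only after generation ends
    • There is no token-by-token "typing" effect; TTFB is close to the time needed to generate the entire response
    • Any client that depends on incremental rendering (OpenAI SDK stream iterator, LCEL streaming, assistant-ui, custom SSE consumers) is unusable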
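
The observation that path A delivers every chunk in the correct order, only later, can be illustrated with a simplified SSE parser. This is a sketch under stated assumptions: it handles only `data:` fields and blank-line terminators, and the sample payloads (including the OpenAI-style `[DONE]` sentinel) are invented for illustration, not captured from MiniMax or Cloudflare.

```python
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each SSE event from arbitrary byte chunks.

    Simplified: only `data:` fields and blank-line event terminators
    are handled, which matches the framing described in this issue.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # An event is complete once a blank line has been received.
        while b"\n\n" in buffer:
            raw_event, buffer = buffer.split(b"\n\n", 1)
            data_lines = [
                line[len(b"data:"):].strip()
                for line in raw_event.split(b"\n")
                if line.startswith(b"data:")
            ]
            if data_lines:
                yield b"\n".join(data_lines).decode("utf-8")


# Path B: three separate chunks, as they would arrive incrementally.
# (Hypothetical sample payloads.)
streamed = [b'data: {"n": 1}\n\n', b'data: {"n": 2}\n\n', b"data: [DONE]\n\n"]
# Path A: the same bytes delivered in a single read after generation ends.
buffered = [b"".join(streamed)]

events_streamed = list(iter_sse_data(streamed))
events_buffered = list(iter_sse_data(buffered))
```

Both lists contain the same events, so a parser cannot tell the two paths apart from content alone. The difference is purely in when the bytes arrive, which is why the symptom only shows up in latency and incremental rendering.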

Additional context

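  • Suspected root cause: in BYOK mode the CF AI Gateway edge buffers the upstream text/event-stream response (typical symptom: it flushes only at a size threshold or at end-of-stream) instead of passing it through to the client. This is CF-side behavior, but it is filed here because:
    1. It reproduces reliably with MiniMax Token Plan as the upstream
    2. Adding headers such as X-Accel-Buffering: no on the MiniMax side might work around it
    3. Other users who route Token Plan through CF will hit the same problem
  • Model-independent: MiniMax-M2.7 and MiniMax-M3 behave the same on the Token Plan + CF BYOK path
  • Not yet tested: whether the Anthropic-compatible endpoint via CF BYOK has the same problem
  • Direct access to Token Plan works fine; the problem is specific to the CF AI Gateway BYOK hop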

Suggested mitigations

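On the MiniMax (upstream) side, consider:

  • Adding an X-Accel-Buffering: no header to streaming responses
  • Ensuring Content-Type: text/event-stream is set with the first byte, not after buffering completes
  • Documenting the recommended streaming configuration for CF AI Gateway and other reverse-proxy layers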
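
The header-related suggestions above can be checked from the client side by inspecting the response headers on each path. The sketch below is a hypothetical helper (`streaming_header_issues` is not an existing API). Whether Cloudflare AI Gateway honors `X-Accel-Buffering` is not confirmed here, so the check only reports what is present.

```python
from typing import Dict, List


def streaming_header_issues(headers: Dict[str, str]) -> List[str]:
    """Return reasons why response headers look unfriendly to SSE streaming."""
    # Header names are case-insensitive, so normalize before comparing.
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    issues = []
    if not h.get("content-type", "").startswith("text/event-stream"):
        issues.append("Content-Type is not text/event-stream")
    if "content-length" in h:
        # A known length up front hints that the body was assembled first.
        issues.append("Content-Length present: body was likely fully buffered")
    if h.get("x-accel-buffering") != "no":
        issues.append("X-Accel-Buffering: no is missing")
    return issues
```

Feeding it the headers printed by `curl -i` for paths A and B gives a quick side-by-side view of which hop drops or rewrites the streaming hints.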

On the Cloudflare (gateway) side, consider:

  • Skipping response buffering when the upstream is text/event-stream
  • Allowing a BYOK upstream to be marked "streaming-required" so the edge does not buffer
