[M3 Bug] Token Plan + Cloudflare AI Gateway (BYOK) does not forward SSE — chunks are buffered and returned as a single payload

## Which inference path did you use?

MiniMax API (Token Plan, via Cloudflare AI Gateway / BYOK)

## Inference parameters

- model: MiniMax-M3
- stream: true
- max_tokens: default
- temperature: default
- request path: `POST https://<gateway>.gateway.ai.cloudflare.com/v1/chat/completions` (Cloudflare AI Gateway)
- upstream: `api.minimaxi.com` (mainland China Token Plan), configured via BYOK (Bring Your Own Key)
- client: raw `curl -N` (also reproduced from `@langchain/openai` TS / `openai-python` SDK)

## Prompt / input

```bash
# A. Via Cloudflare AI Gateway (BYOK)  ← problem path
curl -N -X POST "https://<gateway>.gateway.ai.cloudflare.com/v1/chat/completions" \
  -H "Authorization: Bearer $CF_AIGW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "stream": true,
    "messages": [{"role": "user", "content": "Count from 1 to 10 slowly"}]
  }'

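# B. Direct to api.minimaxi.com (Token Plan)  ← control, works as expected
# Same request body as A.
curl -N -X POST "https://api.minimaxi.com/v1/chat/completions" \
  -H "Authorization: Bearer $MINIMAX_TOKEN_PLAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "stream": true,
    "messages": [{"role": "user", "content": "Count from 1 to 10 slowly"}]
  }'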
```

## Expected behavior

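With BYOK and `stream: true`, Cloudflare AI Gateway should **transparently forward** the upstream SSE stream:

- The response carries `Content-Type: text/event-stream` and, over HTTP/1.1, `Transfer-Encoding: chunked`
- Each upstream `data: {...}\n\n` event reaches the client incrementally, as it is produced
- TTFB ≈ TTFT (a few hundred ms), rather than waiting for generation to finish and returning everything at once

The streaming experience on paths A and B should be **identical**.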
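
One way to make the TTFB ≈ TTFT expectation checkable is to record when each `data:` line arrives and compare the first arrival with the last. The sketch below is only a heuristic: `record_arrivals`, `looks_buffered`, and the 0.9 ratio are illustrative names and values, not part of any SDK.

```python
import time
from typing import Callable, Iterable, List


def record_arrivals(
    lines: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Return the arrival time (seconds since start) of each SSE `data:` line."""
    start = clock()
    arrivals = []
    for line in lines:
        if line.startswith(b"data:"):
            arrivals.append(clock() - start)
    return arrivals


def looks_buffered(arrivals: List[float], ratio: float = 0.9) -> bool:
    """Heuristic (assumed threshold): treat the stream as buffered when the
    first chunk arrives after `ratio` of the total time has already elapsed."""
    if len(arrivals) < 2:
        return False  # not enough chunks to judge
    return arrivals[0] >= ratio * arrivals[-1]
```

`record_arrivals` accepts any iterable of byte lines, for example the line iterator of an HTTP client that does not buffer on its own. Running it against paths A and B should show the difference if the gateway is buffering.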

## Actual behavior

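Cloudflare does not pass the SSE stream through. Observations:

- B (direct to `api.minimaxi.com`): streams normally; chunks arrive one by one and the front end can render incrementally ✅
- A (CF AI Gateway BYOK): the chunks are **all present and in the correct order**, but they **arrive together in a single burst** ⛔
  - No chunk-by-chunk arrival is visible from curl, the OpenAI SDK, or `@langchain/openai`
  - All `data: {...}\n\n` lines are flushed together as one block, only after generation ends
  - There is no token-by-token effect; TTFB is close to the time needed to generate the whole response
  - Any client that relies on incremental rendering (OpenAI SDK stream iterator, LCEL streaming, `assistant-ui`, custom SSE consumers) is effectively unusable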
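
To confirm that the burst received on path A really contains every chunk in order, the buffered body can be parsed after the fact. This is a minimal sketch that assumes the OpenAI-compatible chunk shape (`choices[].delta.content`) and a `[DONE]` sentinel; `parse_sse_payload` is a hypothetical helper, not an SDK function.

```python
import json
from typing import List


def parse_sse_payload(payload: str) -> List[str]:
    """Split a fully buffered SSE body into per-chunk content deltas."""
    deltas = []
    for event in payload.split("\n\n"):
        for line in event.splitlines():
            if not line.startswith("data:"):
                continue  # ignore comments and other SSE fields
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                continue  # assumed end-of-stream sentinel
            chunk = json.loads(data)
            for choice in chunk.get("choices", []):
                content = (choice.get("delta") or {}).get("content")
                if content:
                    deltas.append(content)
    return deltas
```

If the joined deltas from path A match the text streamed on path B, the gateway is changing only the timing of delivery, not the content.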

## Additional context

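- **Suspected root cause**: in BYOK mode, the CF AI Gateway edge buffers the upstream `text/event-stream` response (typical symptom: it flushes only at a size threshold or at end-of-stream) instead of passing it through to the client. This is Cloudflare-side behavior, but it is reported here because:
  1. It reproduces reliably with MiniMax Token Plan as the upstream
  2. Adding headers such as `X-Accel-Buffering: no` on the MiniMax side might work around it
  3. Other users who route Token Plan through CF will hit the same problem
- Model-independent: `MiniMax-M2.7` and `MiniMax-M3` behave the same on the Token Plan + CF BYOK path
- Not yet tested: whether the Anthropic-compatible endpoint shows the same problem through CF BYOK
- Direct access to Token Plan works fine; **the problem is specific to the CF AI Gateway BYOK hop**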

## Suggested mitigations

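**On the MiniMax (upstream) side, consider:**

- Adding an `X-Accel-Buffering: no` header to streaming responses
- Ensuring `Content-Type: text/event-stream` is set with the first byte of the response, not after buffering completes
- Documenting the recommended streaming configuration for CF AI Gateway and other reverse-proxy layers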
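
As a rough illustration of the first two points, the sketch below shows a toy SSE endpoint that writes its headers before producing any event and flushes after each one. `DemoSSEHandler`, `SSE_HEADERS`, and `format_event` are hypothetical names for this example and do not describe MiniMax's actual server; whether Cloudflare AI Gateway honors `X-Accel-Buffering` is untested here.

```python
import json
from http.server import BaseHTTPRequestHandler

# Headers that tell intermediaries this response is a live event stream.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",  # honored by nginx; other proxies may ignore it
}


def format_event(payload: dict) -> bytes:
    """Frame one JSON payload as an SSE `data:` event."""
    return f"data: {json.dumps(payload)}\n\n".encode("utf-8")


class DemoSSEHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an upstream streaming endpoint."""

    def do_GET(self):
        self.send_response(200)
        for name, value in SSE_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()  # headers are written before any event is generated
        for n in range(1, 4):
            self.wfile.write(format_event({"n": n}))
            self.wfile.flush()  # push each event out immediately
```

To try it locally, the handler can be served with `http.server.HTTPServer(("127.0.0.1", 8000), DemoSSEHandler).serve_forever()` and read with `curl -N http://127.0.0.1:8000/`.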

**On the Cloudflare (gateway) side, consider:**

- Skipping response buffering when the upstream is `text/event-stream`
- Allowing a BYOK upstream to be marked as "streaming-required" so the edge does not buffer it

