Question: Is the missing opening <think> tag expected behavior or an issue in the OpenAI-compatible API? #144

handsomelcx · 2026-06-26T08:56:45Z


Hi,

First of all, thank you for maintaining this excellent fork. I really appreciate all the work you've put into it.

I have a question regarding the reasoning output format when using the OpenAI-compatible server.

I'm testing with Qwen3.6 using the latest JamePeng fork through the OpenAI API. During reasoning, the response looks something like this:

Thinking Process:
...
</think>

Instead of:

<think>
...
</think>

or returning the reasoning separately.

Because the opening tag is missing while the closing tag is still present, clients such as Cherry Studio cannot recognize the reasoning block and therefore cannot fold or collapse it.

I've confirmed that this behavior is the same when using both:

Cherry Studio (latest version)
Open WebUI (latest version)

So it doesn't appear to be a client-specific issue.

My question is:

Is this the expected behavior of the OpenAI-compatible server?
Is the opening <think> tag intentionally removed by llama-cpp-python?
Or is this simply how Qwen3.6's chat template works?
If it's configurable, is there any option to preserve the original <think>...</think> tags in the API response?

I'm mainly trying to understand where this behavior originates before investigating it further.

Thank you!

richardchen874-sys · 2026-06-27T04:15:36Z


This is a good compatibility question.

I would treat this as a workflow-format issue rather than only a client display issue, because Cherry Studio and Open WebUI both depend on fairly consistent reasoning markers to separate visible content from thinking content.

There are probably three layers to check:

1. Model / chat template: Qwen3.6 may emit reasoning in a format controlled by its chat template. If the template strips or rewrites the opening <think> tag but leaves the closing tag, the OpenAI-compatible server may simply be forwarding that behavior.
2. Server-side normalization: The llama-cpp-python OpenAI-compatible server may be applying some post-processing or message formatting before returning the response. It would be useful to compare raw generation output against the API response to see whether the opening tag is removed before or after the server layer.
3. Client-side reasoning parser: Clients like Cherry Studio / Open WebUI usually expect either a complete <think>...</think> block or a separate reasoning field. A missing opening tag can break folding even if the reasoning text is technically present.
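
To illustrate the client-side layer, here is a minimal sketch of a tolerant parser that still separates reasoning from the answer when only the closing tag is present. The names `split_reasoning` and `SplitResult` are hypothetical and are not taken from Cherry Studio, Open WebUI, or llama-cpp-python. It also assumes the reasoning always comes first in the message, which may not hold for every model.

```python
from typing import NamedTuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


class SplitResult(NamedTuple):
    reasoning: str
    answer: str


def split_reasoning(text: str) -> SplitResult:
    """Separate reasoning from the final answer, tolerating a missing opening tag."""
    head, sep, tail = text.partition(CLOSE_TAG)
    if not sep:
        # No closing tag at all: treat the whole message as the visible answer.
        return SplitResult("", text.strip())
    head = head.strip()
    # Drop the opening tag when it is present. When it is absent, everything
    # before the closing tag is still treated as reasoning (an assumption).
    if head.startswith(OPEN_TAG):
        head = head[len(OPEN_TAG):]
    return SplitResult(head.strip(), tail.strip())


# Shape reported in this thread: closing tag only.
print(split_reasoning("Thinking Process:\nstep 1\n</think>\n\nHello"))
```

A client that already has the two parts separated can fold the reasoning or re-wrap it in a complete block before display. This is a workaround on the consumer side and does not answer where the opening tag is lost.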

A few useful debugging checks:

test the same prompt through raw llama.cpp / CLI output
compare raw model output vs /v1/chat/completions response
check whether streaming and non-streaming responses differ
check whether the chat template contains <think> handling
test another reasoning model to see whether the behavior is Qwen-specific
inspect whether the server has an option to preserve or strip reasoning tags
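
The comparison steps above could be scripted roughly as follows. This is only a sketch: `marker_state`, `join_stream_deltas`, `prompt_prefills_open_tag`, and `likely_origin` are hypothetical helper names, and the strings you pass in (raw model output, API response content, rendered prompt) have to be collected by you from your own setup. The one external assumption is that streaming chunks follow the usual OpenAI-style shape, with text under `choices[i].delta.content`.

```python
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def marker_state(text: str) -> str:
    """Classify which reasoning tags appear in a piece of output."""
    has_open = OPEN_TAG in text
    has_close = CLOSE_TAG in text
    if has_open and has_close:
        return "complete"
    if has_close:
        return "missing_open"
    if has_open:
        return "missing_close"
    return "none"


def join_stream_deltas(chunks: list[dict]) -> str:
    """Concatenate delta.content from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            # Role-only or empty deltas carry no content.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


def prompt_prefills_open_tag(rendered_prompt: str) -> bool:
    """True if the rendered prompt already ends with the opening tag."""
    return rendered_prompt.rstrip().endswith(OPEN_TAG)


def likely_origin(raw_output: str, api_content: str, rendered_prompt: str) -> str:
    """Rough guess at which layer drops the opening tag (heuristic only)."""
    if marker_state(api_content) != "missing_open":
        return "not reproduced"
    if marker_state(raw_output) == "complete":
        # The raw generation had both tags, so something after it removed one.
        return "server"
    if prompt_prefills_open_tag(rendered_prompt):
        # The tag is part of the prompt, so the model never generates it.
        return "chat template"
    return "model"
```

Running `marker_state` on both the non-streaming content and the output of `join_stream_deltas` covers the streaming vs non-streaming check. The result of `likely_origin` is a hint for where to look next, not a diagnosis.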

In general, OpenAI-compatible transport is not always enough for reasoning models. Reasoning output also needs a stable representation: either complete tags, a separate reasoning field, or a clearly documented normalization rule.

If the opening tag is removed but the closing tag remains, I would lean toward treating it as a compatibility bug or at least an undocumented formatting behavior, because it makes downstream clients unable to reliably separate reasoning from final output.
