Capability area: Agent harness / Tool use
What does M2.7 fail to do for you?
MiniMax-M2.7 fails to pass validate_design a schema argument that conforms to the tool definition when chaining from design_workflow.
What would "good" look like in M3?
validate_design receives a valid JSON schema that conforms to the tool's parameter definition — populated from the actual design_workflow result, not reconstructed from model reasoning.
Summary
MiniMax-M2.7 and MiniMax-M3 both fail to propagate tool output across a sequential tool call chain. When design_workflow returns a schema object, both models call the next tool (validate_design) with a schema that does not match the returned output: it is reconstructed, incomplete, or structurally invalid against the tool definition.
MiniMax-M2.5 does not reproduce this behavior.
Environment
| Field | Value |
| --- | --- |
| Models affected | MiniMax-M2.7, MiniMax-M3 |
| Model passing | MiniMax-M2.5 |
| Endpoint | OpenRouter |
| Tool calling | Sequential 2-step chain |
| Observed | 2026-06-17 (UTC+8) |
Expected behavior
1. design_workflow is called and returns a schema object
2. The model reads the tool result
3. validate_design is called with schema set to the exact object returned in step 1, conforming to the tool's JSON schema definition
Actual behavior
The model calls validate_design with a schema argument that does not match the design_workflow output. The model appears to reconstruct the schema from internal reasoning rather than reading the tool result, producing a value that violates the tool's parameter schema — missing required fields or structurally incorrect.
Benchmark eval output:
issues: ["Validated design is missing fields: project name, client, start date, deadline, owner, budget"]
Tool call sequence observed
1. design_workflow(...)
→ returns: { name, information: [{name, type, ai_hint}, ...], states: [...], ... }
2. validate_design({ schema: <does not match step 1 output, fails tool schema validation> })
→ missing required information fields
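The mismatch in the sequence above can be checked mechanically. The following is a minimal sketch, not the benchmark's actual evaluator: the function names are hypothetical, the schema shape is taken from the step 1 return value shown above, and the sample values (types, states) are made up for illustration. Only the six field names come from the eval output.

```python
from typing import Any


def information_names(schema: Any) -> list[str]:
    """Collect the `name` of each entry under `information`, tolerating malformed input."""
    if not isinstance(schema, dict):
        return []
    entries = schema.get("information")
    if not isinstance(entries, list):
        return []
    return [e["name"] for e in entries if isinstance(e, dict) and "name" in e]


def compare_forwarded_schema(returned: dict[str, Any], forwarded: Any) -> dict[str, Any]:
    """Compare the design_workflow result with the schema passed to validate_design."""
    sent = set(information_names(forwarded))
    missing = [n for n in information_names(returned) if n not in sent]
    return {"verbatim": returned == forwarded, "missing_fields": missing}


# Hypothetical step 1 result; only the field names come from the eval output.
returned = {
    "name": "Project",
    "information": [
        {"name": n, "type": "text", "ai_hint": ""}
        for n in ["project name", "client", "start date", "deadline", "owner", "budget"]
    ],
    "states": [],
}
# Hypothetical reconstructed argument that dropped the information entries.
forwarded = {"name": "Project", "information": [], "states": []}

report = compare_forwarded_schema(returned, forwarded)
print(report["verbatim"], report["missing_fields"])
```

A passing run would report `verbatim` as true and an empty `missing_fields` list; the failing runs described here would report the dropped field names.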
Steps to reproduce
We have an open-source MCP tool benchmark that reproduces this consistently:
Repo: https://github.com/Inistate/harness-test-bench
- Clone the repo and follow the setup instructions
- Run the smoke_all_tools_2 scenario against minimax/MiniMax-M2.7 or minimax/MiniMax-M3
- Inspect Task 2 (task_2_module_design): validate_design will be called with a schema that does not conform to the tool's parameter definition
- Compare against a passing model (e.g. minimax/MiniMax-M2.5) running the same scenario
Questions
- Is this a known regression from M2.5?
- Does the model read tool results back into context between sequential calls, or is this not guaranteed?
- Is there a recommended pattern to ensure tool output is forwarded as-is rather than reconstructed?
Workaround
We currently exclude M2.7 and M3 from workflows requiring sequential tool output propagation. A fix at the model level would be preferable.
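For harnesses that cannot exclude these models, one possible stopgap is to have the harness overwrite the schema argument with the stored design_workflow result before dispatching validate_design. This is only a sketch under assumptions: `build_validate_design_args` is a hypothetical helper, the model's arguments are assumed to arrive as a JSON string, and it is not a pattern confirmed by MiniMax or used in the benchmark.

```python
import copy
import json
from typing import Any


def build_validate_design_args(design_result: dict[str, Any], model_args_json: str) -> dict[str, Any]:
    """Build validate_design arguments, taking `schema` from the stored tool result.

    Hypothetical helper: anything the model wrote under `schema` is discarded.
    """
    try:
        args = json.loads(model_args_json)
    except json.JSONDecodeError:
        args = {}
    if not isinstance(args, dict):
        args = {}
    # Deep copy so later mutation of the arguments cannot alter the stored result.
    args["schema"] = copy.deepcopy(design_result)
    return args


# Hypothetical stored result from step 1 and a reconstructed argument from the model.
stored = {
    "name": "Project",
    "information": [{"name": "client", "type": "text", "ai_hint": ""}],
    "states": [],
}
args = build_validate_design_args(stored, '{"schema": {"name": "Project"}}')
print(args["schema"] == stored)
```

Because this hides the model's own argument, it is unsuitable for benchmark scoring and does not replace a model-level fix.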
References
Reproduced across multiple runs in an independent MCP tool benchmark. M2.5 passed consistently. M2.7 scored 3/5 and M3 scored 2/5, with the Task 2 (T2) chain failing on every run for both models.