Python: Add AgentLoopMiddleware for re-running agents in a loop by eavanvalkenburg · Pull Request #6174 · microsoft/agent-framework

eavanvalkenburg · 2026-05-29T15:07:53Z

Motivation and Context

Many agent scenarios require re-running an agent until some condition is met (or indefinitely): iterative refinement, working through a todo list, draining background tasks, or judging whether the original question was actually answered. Today each of these requires bespoke orchestration around the agent run loop.

This adds a single, reusable middleware that drives the loop, so users get these patterns out of the box while keeping full control over the per-iteration input and the stopping condition.

Description

Adds AgentLoopMiddleware, an AgentMiddleware that re-runs the wrapped agent by calling call_next() repeatedly. One configurable class covers three common patterns, each with a convenience classmethod factory that forwards to the full constructor:

Ralph loop — AgentLoopMiddleware.ralph(...): no exit criteria, bounded only by an optional max_iterations. Includes feedback tracking inspired by the Ralph technique: record_feedback accumulates a progress log exposed to every callback via the progress kwarg and (by default) injected into the next iteration's input (inject_progress); fresh_context=True restarts each pass from the original task plus the log; is_complete (marker string or callable) stops early when the agent signals completion.
Predicate — AgentLoopMiddleware.with_predicate(should_continue, ...): loops while a callable returns True. Helper factories todos_remaining(provider) and background_tasks_running(provider) cover the TodoProvider / BackgroundAgentsProvider cases.
Judge — AgentLoopMiddleware.with_judge(judge_client, ...): a second chat client decides whether the original request was answered, using a JudgeVerdict structured-output response (with a text fallback for clients that don't honor structured output); loops while the answer is "no".

Additional capabilities (cross-cutting, on the base constructor):

Approval handling — on_approval_request auto-resolves pending user_input_request content (e.g. tools with approval_mode="always_require"), bounded by max_approval_rounds and exempt from max_iterations.
next_message controls the next iteration's input (defaults to a short "continue" nudge); returning None reuses the prior messages.
Supports both streaming and non-streaming runs, handling MiddlewareTermination cleanly.

Exports AgentLoopMiddleware, JudgeVerdict, todos_remaining, and background_tasks_running from agent_framework. Adds unit tests, a sample (samples/02-agents/middleware/agent_loop_middleware.py) demonstrating all four patterns, and documentation.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

moonbox3 · 2026-05-29T15:13:23Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/core/agent_framework/_harness
_loop.py	226	6	97%	460, 468, 538, 608, 651, 726
TOTAL	39366	4478	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
7899	34 💤	0 ❌	0 🔥	2m 2s ⏱️

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 74%

✓ Correctness

The AgentLoopMiddleware implementation is well-designed and handles both streaming and non-streaming paths correctly. The middleware properly interacts with the existing AgentContext/pipeline infrastructure, correctly manages the ResponseStream for lazy streaming execution, and handles MiddlewareTermination cleanly. The approval handling logic, judge condition, and feedback tracking all function as specified by the tests. One minor documentation/behavior mismatch exists: max_approval_rounds is documented as capping 'consecutive' rounds but the counter never resets after non-approval iterations, effectively making it a 'total' cap.
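The difference between a 'consecutive' cap and a 'total' cap comes down to whether the counter resets after a non-approval iteration. The sketch below models only that counter; approvals_handled and the "approval"/"normal" event labels are hypothetical, and it deliberately ignores other details such as when the approval callback is invoked relative to the cap check.

```python
def approvals_handled(events: list[str], max_rounds: int, *, reset_on_normal: bool) -> int:
    """Count approval rounds handled before the cap trips.

    events is a sequence of iteration kinds: "approval" or "normal".
    """
    counter = 0
    handled = 0
    for kind in events:
        if kind == "approval":
            if counter >= max_rounds:
                break  # cap reached, stop resolving approvals
            counter += 1
            handled += 1
        elif reset_on_normal:
            counter = 0  # 'consecutive' semantics: a normal pass resets the streak
    return handled


events = ["approval", "approval", "normal", "approval", "approval"]
total_cap = approvals_handled(events, 2, reset_on_normal=False)
consecutive_cap = approvals_handled(events, 2, reset_on_normal=True)
```

For this event sequence, the non-resetting counter handles two approval rounds, while the resetting counter handles all four.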

✓ Security Reliability

The AgentLoopMiddleware implementation is well-structured with proper input validation (constructor rejects invalid max_iterations/max_approval_rounds), defensive type checks on call_next() results, clean MiddlewareTermination handling in streaming, and appropriate use of copy semantics for the progress log exposed to callbacks. No blocking security or reliability issues found. The approval callback invocation order (callback before cap check) is intentional per the test assertions. The judge fallback parsing is conservative (NOT_ANSWERED takes precedence). Memory growth in unbounded loops is by-design and documented.
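Why NOT_ANSWERED needs precedence in a text fallback: the string "NOT_ANSWERED" contains "ANSWERED", so a naive substring check would read a negative verdict as positive. A minimal sketch of a conservative parser, assuming the judge replies with one of those two tokens (parse_verdict_text is a hypothetical helper, and the real parsing rules may differ):

```python
def parse_verdict_text(text: str) -> bool:
    """Return True only when the judge text clearly says the request was answered."""
    normalized = text.upper().replace(" ", "_")
    # Check the negative token first: it contains the positive one as a substring.
    if "NOT_ANSWERED" in normalized:
        return False
    # Anything without a clear positive token counts as "not answered".
    return "ANSWERED" in normalized


if __name__ == "__main__":
    for sample in ("ANSWERED", "NOT_ANSWERED", "Verdict: not answered", "unclear"):
        print(sample, "->", parse_verdict_text(sample))
```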

✓ Test Coverage

The test suite is comprehensive, covering constructor validation, all three factory patterns, non-streaming and streaming loops, approval handling, provider helpers, and edge cases. However, there are several notable gaps: async callback variants are never tested directly (all tested callbacks are synchronous), provider helper predicates are unit-tested in isolation but never integrated into a full loop, streaming + judge mode combination is untested, and explicitly returning None from record_feedback to skip an entry has no coverage.

✓ Design Approach

The loop middleware is broadly aligned with the existing harness and middleware pipeline, but I found one design-level issue in judge mode: it reduces the original request to plain text before asking the judge, which breaks the stated “judge whether the original request was answered” behavior for multimodal or otherwise non-text inputs that the core message model already supports.

Automated review by eavanvalkenburg's agents

github-actions

Automated Code Review

Reviewers: 5 | Confidence: 84%

✓ Correctness

The AgentLoopMiddleware implementation is well-structured and correct. The core loop logic (non-streaming and streaming), stop evaluation, feedback propagation, progress tracking, session snapshotting for fresh_context, and judge integration are all handled properly and extensively tested. No correctness bugs found.

✓ Security Reliability

The implementation is generally well-structured with proper safety caps (max_iterations defaults), defensive type checks, and progress-list copying to prevent callback mutation. The main reliability concern is an asymmetry in MiddlewareTermination handling between streaming and non-streaming paths: the streaming path catches it cleanly (preserving prior iteration results), while the non-streaming path lets it propagate, losing the aggregated transcript from completed iterations.

✓ Test Coverage

The test suite is thorough overall—covering construction, predicate patterns, judge mode, streaming, feedback tracking, fresh_context, session handling, and provider helpers. However, there are notable coverage gaps: (1) non-streaming MiddlewareTermination test (only streaming is tested), despite different implementation behavior between the two paths; (2) the usage_details accumulation logic (add_usage_details across iterations) is never exercised because RecordingChatClient produces no usage data; (3) async variants of record_feedback and next_message are untested (only should_continue has an async test).

✓ Failure Modes

The implementation is well-structured with proper error propagation. The streaming path catches MiddlewareTermination cleanly, the non-streaming path correctly relies on the framework's contextlib.suppress at the executor level, exceptions from the judge client and user predicates propagate to callers rather than being silently swallowed, and the session snapshot/restore pattern correctly rebuilds from the snapshot each time to avoid aliasing. No high-severity silent failure modes or data-loss paths were identified.

✗ Design Approach

The loop is close, but I found one blocking design bug in the non-streaming path: a downstream MiddlewareTermination can be silently turned into a duplicate iteration because the pipeline suppresses the exception and leaves the previous context.result in place. I also found one smaller design gap in the background_tasks_running helper: it reads persisted task state without using the provider’s own refresh path, so it can loop once more on stale RUNNING status.

Flagged Issues

Non-streaming loop iterations can reuse a stale context.result after suppressed MiddlewareTermination instead of stopping cleanly (python/packages/core/agent_framework/_harness/_loop.py:436-444, python/packages/core/agent_framework/_middleware.py:921-931).

Suggestions

Refresh background task state through BackgroundAgentsProvider before checking for RUNNING tasks, matching the provider's existing pattern (python/packages/core/agent_framework/_harness/_loop.py:767-775, python/packages/core/agent_framework/_harness/_background_agents.py:211-236, 408-428, 517-518).
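The suggestion amounts to "refresh, then check". A toy illustration follows; ToyTaskProvider, its refresh and statuses methods, and the status strings are all hypothetical and do not reflect the BackgroundAgentsProvider interface.

```python
class ToyTaskProvider:
    """Hypothetical provider whose persisted state lags behind the live state."""

    def __init__(self) -> None:
        self._persisted = {"task-1": "RUNNING"}
        self._live = {"task-1": "COMPLETED"}

    def refresh(self) -> None:
        # Pull the live status into the persisted view.
        self._persisted.update(self._live)

    def statuses(self) -> dict[str, str]:
        return dict(self._persisted)


def tasks_running(provider: ToyTaskProvider, *, refresh_first: bool) -> bool:
    """Predicate a loop could use as its continue condition."""
    if refresh_first:
        provider.refresh()
    return any(status == "RUNNING" for status in provider.statuses().values())


stale_answer = tasks_running(ToyTaskProvider(), refresh_first=False)
fresh_answer = tasks_running(ToyTaskProvider(), refresh_first=True)
```

Reading the persisted state alone reports a running task and would trigger one extra iteration; refreshing first reports that nothing is running.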

Automated review by eavanvalkenburg's agents

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 78% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

Add `AgentLoopMiddleware`, an `AgentMiddleware` that re-runs the wrapped agent in a loop. A single configurable class covers three common patterns, each with a convenience classmethod factory:
- Ralph loop (`.ralph(...)`): no exit criteria, with feedback tracking (`record_feedback`/`progress`), progress injection (`inject_progress`), optional fresh context per iteration (`fresh_context`), and an early-stop completion signal (`is_complete`).
- Predicate (`.with_predicate(...)`): loop while a `should_continue` callable returns True (e.g. paired with `todos_remaining`/`background_tasks_running`).
- Judge (`.with_judge(...)`): a second chat client decides whether the original request was answered, using a `JudgeVerdict` structured-output response.

The loop also auto-resolves pending function-approval / user-input requests via an `on_approval_request` callable (bounded by `max_approval_rounds`), and the next iteration's input is controlled by `next_message`. Supports both streaming and non-streaming runs.

Exports `AgentLoopMiddleware`, `JudgeVerdict`, `todos_remaining`, and `background_tasks_running`. Adds tests, a sample, and docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- with_judge: add criteria list with {{criteria}} templating into judge instructions plus an agent-side instruction; add fresh_context, additional judge feedback relay; default judge max_iterations.
- should_continue is now required and positional; supports (bool, str|None) feedback tuples surfaced to next_message/record_feedback via feedback kwarg.
- Judge forwards full multi-modal request and response messages.
- Default max_iterations=10 (explicit None = unbounded); removed is_complete and Ralph terminology; ShouldContinueResult is a real TypeAlias.
- Sample: stream all loops, print iteration counts via injected user-block boundaries (robust to function calling), <role>: content formatting, per-method expected output, and a looping todo sample.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Resolve pyright errors in _loop.py: drop the always-true final_result None check (the while loop always assigns it) and cast finish_reason to the AgentResponse constructor's expected type.
- Apply pyupgrade --py310-plus: import TypeAlias from typing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

pyright infers AgentResponse.finish_reason as including str and rejects the direct assignment, while mypy considers a cast redundant. Drop the cast and suppress only pyright with a targeted reportArgumentType ignore, satisfying both type checkers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add a second AgentLoopMiddleware sample that composes two criteria in one should_continue predicate: a TodoProvider check (evaluated first) and a report-style judge chat client (evaluated once todos are complete) that grades the assembled report against shared requirements. Register it in the middleware samples README. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Rework the todo+judge sample to compose two AgentLoopMiddleware on the agent itself (middleware=[judge_loop, todo_loop]) instead of a single hand-written predicate. The inner todos_remaining loop drafts the report todo-by-todo and the outer with_judge loop re-runs it until an editor chat client judges the report publication-ready, reusing the built-in helpers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
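The nesting described in this commit (an outer judge loop that re-runs an inner todo loop) can be pictured with plain functions. This is a synchronous toy under loud assumptions: make_loop, write_one, and judge_says_ready are hypothetical and say nothing about how middleware=[judge_loop, todo_loop] is wired inside the framework.

```python
from collections.abc import Callable


def make_loop(
    step: Callable[[], None],
    should_continue: Callable[[], bool],
    max_iterations: int = 10,
) -> Callable[[], int]:
    """Build a runner that repeats step while should_continue() is true."""

    def run() -> int:
        passes = 0
        while passes < max_iterations:
            step()
            passes += 1
            if not should_continue():
                break
        return passes

    return run


todos = ["intro", "body"]
report: list[str] = []


def write_one() -> None:
    # Work through the next todo, or revise once the list is empty.
    report.append(todos.pop(0) if todos else "revision")


def judge_says_ready() -> bool:
    # Toy judge: only a revised report counts as publication-ready.
    return "revision" in report


# Inner loop: keep drafting while todos remain.
todo_loop = make_loop(write_one, should_continue=lambda: bool(todos))


def run_inner() -> None:
    todo_loop()


# Outer loop: re-run the whole inner loop until the judge is satisfied.
judge_loop = make_loop(run_inner, should_continue=lambda: not judge_says_ready())
outer_passes = judge_loop()
```

Here the first outer pass drains both todos, the judge rejects the draft, and the second outer pass produces the revision the judge accepts.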

AgentLoopMiddleware.fresh_context previously only reset context.messages, so with an attached session each iteration still reloaded the local transcript or re-threaded the service-side conversation id and the model saw the accumulated history. Snapshot the session once before the loop (via to_dict) and restore it (from_dict + field copy) between iterations, so every pass starts from the pre-loop baseline. The final iteration's pass is persisted (no restore after the terminating iteration), so a subsequent agent.run continues from there. Removed the obsolete warning, updated docstrings and core AGENTS.md, and added tests: a snapshot/restore round-trip, a session-reset streaming x fresh_context x inject_progress x store matrix across multiple runs and loop iterations, and response_format parsing across the loop. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 29, 2026 15:07

Copilot started reviewing on behalf of eavanvalkenburg May 29, 2026 15:08 View session

moonbox3 added the documentation and python labels May 29, 2026

Copilot stopped reviewing on behalf of eavanvalkenburg due to an error May 29, 2026 15:08
An unexpected error occurred. For more details, see the detailed logs in GitHub Actions.

github-actions Bot reviewed May 29, 2026

Comment thread python/packages/core/agent_framework/_harness/_loop.py Outdated

Comment thread python/packages/core/agent_framework/_harness/_loop.py Outdated

eavanvalkenburg force-pushed the agent_loop branch from 8230a86 to 4dd8c6c Compare June 8, 2026 09:26

eavanvalkenburg marked this pull request as ready for review June 9, 2026 09:02

github-actions Bot reviewed Jun 9, 2026

westey-m approved these changes Jun 10, 2026

TaoChenOSU reviewed Jun 10, 2026

Comment thread python/packages/core/agent_framework/_harness/_loop.py Outdated

github-actions Bot reviewed Jun 11, 2026

moonbox3 reviewed Jun 11, 2026

Comment thread python/samples/02-agents/middleware/agent_loop_middleware.py Outdated

Comment thread python/packages/core/agent_framework/_harness/_loop.py Outdated

eavanvalkenburg force-pushed the agent_loop branch from 5f07e95 to c0c1c77 Compare June 11, 2026 11:36

eavanvalkenburg and others added 8 commits June 12, 2026 08:37

Updated samples and docstrings

ee33936

eavanvalkenburg force-pushed the agent_loop branch from c0c1c77 to ee33936 Compare June 12, 2026 06:37

moonbox3 approved these changes Jun 12, 2026

eavanvalkenburg added this pull request to the merge queue Jun 12, 2026

Merged via the queue into microsoft:main with commit 1acd242 Jun 12, 2026
38 checks passed

eavanvalkenburg merged 8 commits into microsoft:main from eavanvalkenburg:agent_loop

4 participants
