
scheduler: only schedule jobs on state transitions #3121

Open
nuclearcat wants to merge 1 commit into kernelci:main from nuclearcat:dup-fix

@nuclearcat

Member

get_configs() matched a scheduler entry whenever the current event values satisfied it (level-triggered). Because a node event is emitted on every update, any update that left the node in an already-matching state (an artifact, timeout or flag change on an "available" node) re-triggered creation of the whole set of child jobs.

A 6-month audit of production confirmed this happens routinely: across 26 sampled days (236,139 job nodes) there were 4,380 duplicate job groups, i.e. jobs with identical parent/name/runtime/platform and the same retry_counter, created seconds apart. The rate rose sharply from late April 2026, reaching ~10% of jobs on some days.
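For reference, the duplicate-group criterion above could be checked with something like the sketch below. This is not the audit script itself: the node field names (`parent`, `name`, `runtime`, `platform`, `retry_counter`, `created`), the helper `find_duplicate_groups` and the 60-second window used for "seconds apart" are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_duplicate_groups(nodes, window=timedelta(seconds=60)):
    """Return {group key: count} for groups that look like duplicates.

    Hypothetical helper: a group is every node sharing the same
    parent/name/runtime/platform/retry_counter. It is reported when it
    holds more than one node and all of them were created within
    `window` of each other (the window size is an assumption).
    """
    groups = defaultdict(list)
    for node in nodes:
        key = (
            node["parent"],
            node["name"],
            node["runtime"],
            node["platform"],
            node["retry_counter"],
        )
        groups[key].append(node["created"])

    duplicates = {}
    for key, created in groups.items():
        created.sort()
        # More than one node with the same key, created close together.
        if len(created) > 1 and created[-1] - created[0] <= window:
            duplicates[key] = len(created)
    return duplicates
```

Including retry_counter in the key means a legitimate retry (which bumps the counter) is never counted as a duplicate of the job it retries.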

Make scheduling edge-triggered: fire only on the transition into the matched condition, using the previous_state/previous_result values now carried in the event. When that information is absent (node creation, retry events, older API), scheduling falls back to the previous level-triggered behaviour, so retries and freshly created nodes are unaffected.
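The idea can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the entry/event dict shapes, the `state`/`result` keys and the helpers `matches` and `should_schedule` are assumptions; only previous_state/previous_result are taken from the description above.

```python
def matches(entry, state, result):
    """Level check: do these state/result values satisfy the entry?

    Hypothetical entry shape: {"state": ..., "result": ...}, where a
    missing or None value means "do not filter on this field".
    """
    if entry.get("state") is not None and entry["state"] != state:
        return False
    if entry.get("result") is not None and entry["result"] != result:
        return False
    return True


def should_schedule(entry, event):
    """Edge-triggered decision with a level-triggered fallback."""
    # The current values must satisfy the entry in every case.
    if not matches(entry, event.get("state"), event.get("result")):
        return False

    previous_state = event.get("previous_state")
    if previous_state is None:
        # No transition information (e.g. node creation, retry event,
        # older API): keep the old level-triggered behaviour.
        return True

    # Fire only if the previous values did NOT already match, i.e. this
    # event is the transition into the matched condition.
    return not matches(entry, previous_state, event.get("previous_result"))
```

With an entry of `{"state": "available"}`, an event going from "running" to "available" schedules jobs, while a later update that keeps the node "available" does not.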

Fixes #2912

Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
@nuclearcat marked this pull request as ready for review on June 14, 2026, 17:35

Development

Successfully merging this pull request may close these issues.

Avoid scheduling multiple identical jobs
