Skip to content

Multithreading for KeY and now it's deterministic too#3842

Open
unp1 wants to merge 20 commits into
mainfrom
bubel/mt-goals
Open

Multithreading for KeY and now it's deterministic too#3842
unp1 wants to merge 20 commits into
mainfrom
bubel/mt-goals

Conversation

@unp1

@unp1 unp1 commented Jun 18, 2026

Copy link
Copy Markdown
Member

Intended Change

This pull request adds a multi-core (parallel) prover to KeY: automatic proof search runs on a worker pool. An open goal is either not yet assigned to a worker or it has exactly one worker, proving independent sibling goals concurrently. There are at most as many workers as configured and the number of workers is capped at the number of processor cores. It is a port and modernisation of the 2018 HacKeYthon multithreading prototype, rebuilt on current main.

The legacy single-threaded prover stays the default and the safe fallback; the multi-core prover is opt-in per the configuration below.

Proof search is deterministic: the multi-core prover constructs the identical proof, term for term, on every run and at every worker count (see Deterministic proof search below).

Default in this PR: to make the feature easy to try, this branch ships with
the multi-core prover enabled at 4 threads. When this PR is accepted for
main, the default will be reverted to single-core
so that the
single-core-only features (proof caching, slicing, merge rule) are active as
before. This is a one-line change to the setting default.

Architecture (conceptual)

The guiding principle is isolation, not locking. Goal.apply is split into a concurrent compute phase (rule selection + matching, the expensive part) and a commit phase (proof-tree mutation) serialized under a single lock. A single-monitor GoalScheduler hands goals to workers depth-first. Non-essential listeners are detached for the run, fresh names come from a per-worker partition of the name allocator, shared matching/cost caches use thread-safe cache utilities, and rules not yet concurrency-safe (the merge rule) disable themselves during a multi-worker run. Only the standard Java profile runs in parallel.

Full conceptual write-up: Multi-Core Proving (key-docs).

Deterministic proof search

TL;DR. Proof search is fully deterministic on the multi-core prover. Across the entire
regression suite (648 proofs), two independent 4-worker runs produce structurally identical
proofs (same rule applications, nodes, branches), and those proofs are structurally identical
to the single-core prover's as well. On the variance set the run-to-run node-count spread is
zero at every worker count, and sequential vs. 1-worker proofs are term-identical (fresh names
included). Getting there removed the last scheduling-dependent inputs to rule selection and moved
fresh-name generation off global counters onto the goal-local proof state (Closes #3851).

A headline property of this PR: proof search is reproducible — also on the multi-core prover. A series of commits (c3f619862db939a7ff58) eliminated every identified source of goal-scheduling-dependent rule selection: a polarity-sharing term-index cache, cache-dependent candidate reporting, an incomplete equal-cost tie-break, and a term-label-sensitive duplicate-application veto — several of them latent on main as well.

Run-to-run node-count spread over 10 repetitions (MtVarianceBenchmark):

example 8 workers, before 4 and 8 workers, now
SimplifiedLinkedList.remove 6752 nodes (11.6 %) 0 — bit-stable
symmArray 469 (3.0 %) 0 — bit-stable
Saddleback_search 88 (0.28 %) 0
gemplusDecimal/add 31 (0.11 %) 0
Simple__generateByteArray 30 (1.6 %) 0
Simple__square 0, but ≠ sequential proof 0, = 1-worker proof

Sequential and 1-worker proofs are term-identical — the same rule at the same position producing the same sequent, fresh names included — at every node, verified by a full tree walk over the example set (ParallelProverDeterminismTest). Proofs are also worker-count-independent: e.g. SimplifiedLinkedList.remove constructs the identical 58987-node proof at 1, 4 and 8 workers, and its run-to-run node-count spread at 8 workers is zero over repeated runs. Two independent sources of scheduling dependence were removed to reach this: the cost-reuse hardening series (merged here) retired the last JVM-global caches that made proofs depend on what ran before them in the same process, and fresh names were moved off global counters (below).

Fresh names are derived from the goal-local branch state, not from global counters (addresses #3851). Previously a per-worker allocator gave skolems worker-dependent tags and several #-indexed program-variable and label names were drawn from proof-global counters advanced as a side effect of unrelated mints — so two otherwise identical proofs (across workers, or across a reload / prune-and-redo) could disagree on names. Every fresh name on the proving path now comes from a smallest-free-index search against the goal-local namespaces: a pure function of the branch, identical no matter which worker (or how many) runs the goal. This was the last input the equal-cost tie-break read that depended on scheduling; with it gone, symmArray — the one proof with a residual run-to-run wobble — is bit-stable. Saved-proof compatibility is preserved: recorded name proposals still win over regeneration, and the added-rule name format (which replay resolves by regeneration) is unchanged. A dedicated naming harness pins the invariants (branch-locality, prune-reset, added-rule locality, addProgramVariable soundness, replay compatibility). This closes the naming brittleness that motivated #3851: Closes #3851. (Three global name counters remain, all off the automode proving path and none affecting determinism: the GenCnt counter used only by the \obtain proof-script command, the init-time classAxiom taclet-name counter, and the Symbolic-Execution / information-flow term-label id counters. They can be migrated with the same pattern as a follow-up if desired.)

Problems encountered, fixed and verified

  • Lost goals / non-closure: sibling goals shared a mutable taclet-index cache key and raced on it → a dropped goal and a nondeterministic non-closure. Fixed with an immutable per-lookup key. Reproduced in ~50% of 8-worker runs before, 0 after; guarded by a CI stress test.
  • Early termination: completing a goal and offering its successors is now a single atomic step under the scheduler monitor.
  • Shared-state hazards: a systematic audit hardened the remaining share proving state; each is a separate, behaviour-preserving commit.
  • Off-EDT view updates: per-goal listeners (attached to each Goal, outside the proof-level suspension) fired on worker threads and touched Swing off the event-dispatch thread. The suspension now covers them too, and the views refresh once from the final proof state after a run.

Single-core-only features

Proof caching, proof slicing and the merge rule are switched off while the multi-core prover is active (GUI greys them out with a tooltip; the status line shows the mode) and restored when switching back. Loading a proof script and running macros use the multi-core prover when active and the profile supports it, with one deliberate exception: TryClose (and the Full Auto pilot that ends in it) closes goals one at a time under a tight per-goal step budget and is pinned to the single-core prover, a single goal offers no parallelism, and several workers exploring its subtree apply rules less step-efficiently and can exhaust the budget before the goal closes. Either way a run returns only once all workers stop, so pruning sees a quiescent proof. These restrictions will be lifted incrementally with the feature owners once the subset is confirmed safe. Because the default is single-core, these features work exactly as before.

Configuration

  • GUI: Settings → Prover (Single / Multi-Core) + a clickable SC / MT N×
    status-line indicator.
  • CLI: --threads N.
  • Tests: pinned single-core; -Dkey.prover.parallel[.threads] overrides.

Benchmarks

Measured on a 16-core machine, median of three runs per proof (run-to-run wall-clock noise ≈2 %), each proof loaded in a clean settings state. seq (s) is the single-threaded wall-clock time; the 2× / 4× / 8× columns give the speedup at that worker count over it. The multi-core prover does not change the single-threaded path, and the deterministic proof search (above) is per-node roughly cost-neutral, a bounded, proof-dependent constant factor (0–8 %) that does not grow with proof size (verified to ~130 k nodes).

Real-world proofs — automode (speedup by worker count).

Proof seq (s)
SimplifiedLinkedList.remove 20.9 1.75× 2.35× 2.35×
gemplusDecimal/add 8.8 1.80× 2.90× 3.67×
Saddleback_search 12.9 1.62× 1.80× 1.07×
symmArray 14.6 1.62× 2.19× 2.42×
ArrayList.remove.1 2.4 1.57× 1.79× 1.83×
ArrayList_concatenate 8.2 2.27× 2.97× 3.86×
arith/median 2.7 1.71× 2.38× 3.30×
ArrayList.remFirst 0.6 1.16× 1.52× 1.49×

Wide, splitting proofs scale furthestArrayList_concatenate reaches 3.86× at 8 workers, add 3.67×, median 3.30×. Moderately wide proofs plateau around 4 workers (~2–2.4×). Narrow proofs gain little and can regress at 8× (see the shape table). (quicksort/sort is omitted — it needs its proof script to close, so it is not a
pure-automode entry.)

Proof Shape Why this speedup
ArrayList_concatenate, add, median wide / splitting many independent sibling goals → scale furthest, ~3.3–3.9× and still improving at 8 workers
symmArray, SLL.remove, ArrayList.remove.1 moderately wide steady scaling, plateauing ~1.8–2.4× by 4–8 workers
Saddleback_search narrow (~3-goal frontier) limited parallel width; ~1.8× at 4×, regresses to 1.07× at 8× — extra workers do speculative work off the critical path with no width to amortise it
ArrayList.remFirst tiny (~2k nodes) too small to parallelise; ~1.1–1.4×

Diagrams. Rendered speedup bar charts are in the dev doc's Measured speedup section.

Plan

  • Parallel engine, scheduler, isolation mechanisms
  • GUI / CLI integration, feature gating
  • Proof-macro support on the multi-core prover
  • Thread-safety audit of shared proving state
  • Equivalence gate + 8-worker stress tests (CI)
  • spotless / NULLNESS / unit tests green (symb_exec & proof_ref excluded, known-broken)
  • Fill in benchmark tables (diagrams generated, to attach)
  • Switch the default back to single-core before merge

Type of pull request

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue) — see the latent-bug commits below
  • There are changes to the (Java) code
  • There are changes to the deployment/CI infrastructure (gradle, github, ...)

Ensuring quality

  • Introduced/changed code is documented (javadoc + inline comments).
  • End-user features are documented (key-docs: Multi-Core Proving).
  • Added equivalence, stress and unit tests for the new functionality.
  • Tested: the equivalence gate (single vs parallel outcome) and 8-worker stress tests pass; full key.core + touched-module unit tests pass.
  • Runtime performance: the single-core path is unchanged (the parallel prover is opt-in); see the benchmark section for multi-core speedups.

Separable, main-relevant fixes (kept as standalone commits for cherry-picking, useful even single-core):

  • Compute NodeInfo's active statement lazily, off the proving path
  • Cache loop-invariant instantiation on the rule app, not a static field (latent cross-proof leak via a static field)
  • Make per-call program transformers stateless (MethodCall, JmlAssert)
  • Robustly read numeric settings stored as Integer or Long — fixes a startup ClassCastException (present on main) that can prevent KeY from launching when loading certain settings files (Cherry-Picked on Fixes for minor regressions on main #3844)
  • Don't crash reloading a recent file with no stored profile — a recent-file entry without a profile serialized null as the string "null", so reloading (Cherry-Picked on Fixes for minor regressions on main #3844) it failed to resolve a profile and showed an error instead of opening the file
  • Show a proof macro's aggregate statistics in the status line — the status line kept a macro's last internal-pass count instead of its aggregate result (Cherry-Picked on Fixes for minor regressions on main #3844)

Additional information and contact(s)

This PR provides a clear opt-in benefit (parallel speedup on splitting proofs)
with a safe fallback to the unchanged single-core prover.

This PR has been done with AI tooling support.

The contributions within this pull request are licensed under GPLv2 (only) for inclusion in KeY.

@unp1 unp1 force-pushed the bubel/mt-goals branch from 1518256 to cd9f12e Compare June 18, 2026 21:32
@unp1 unp1 changed the title Multithreading: a multi-core prover for automatic proof search Multithreading for KeY Jun 18, 2026
@unp1 unp1 self-assigned this Jun 18, 2026
@unp1 unp1 force-pushed the bubel/mt-goals branch from cd9f12e to 5fc660b Compare June 18, 2026 22:29
@unp1 unp1 marked this pull request as ready for review June 18, 2026 23:15
@unp1 unp1 force-pushed the bubel/mt-goals branch from 5fc660b to 7cf6734 Compare June 19, 2026 04:26
@wadoon

wadoon commented Jun 19, 2026

Copy link
Copy Markdown
Member

one thread per open goal

  1. So you are creating 1000 threads if you have 1000 open goals?
    Or do you mean, you create 1000 callable instances, ...

The idea of a worker pool is to have a fix/max amount of threads, and to dispatch work on them.

  1. Are you using/Consider to use the new virtual threads. Somebody would also call them greenlets, or stackless...

@unp1

unp1 commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

one thread per open goal

  1. So you are creating 1000 threads if you have 1000 open goals?
    Or do you mean, you create 1000 callable instances, ...

no that is just the starting idea. Depending on the amounts of threads you configure to use, you have that nr of threads and it is capped at the number of processor cores of your computer. Then there is a worker pool and a lifo queue from which a free worker is assigned the next goal. I'll clarify the description thanks.

The idea of a worker pool is to have a fix/max amount of threads, and to dispatch work on them.

  1. Are you using/Consider to use the new virtual threads. Somebody would also call them greenlets, or stackless...

not yet. The first milestone was to get to a point where we have for some problems a win and KeY is (at least as far as one can test) race free. The problem is, I did not have any race conditions even under high pressure. But the problem is that other OSes with other schedulers might discover some still existing problems, so if someone has time and can just run it once in a while to get feedback, would be appreciated.

For trying out, I recommend the preview integration with the single core improvemnts. It solves some congestions here already.

@wadoon

wadoon commented Jun 19, 2026

Copy link
Copy Markdown
Member

Could you provide the command to be executed? I could try to get a 64-core machine if we would have proofs having such number of goals.

@unp1 unp1 force-pushed the bubel/mt-goals branch from 890f466 to b7bd241 Compare June 25, 2026 06:09
@unp1 unp1 force-pushed the bubel/mt-goals branch 4 times, most recently from 3141113 to 9d8d91c Compare July 1, 2026 20:06
@unp1

unp1 commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

Status: proof-search determinism (goal-order independence)

The last commits (c3f619862d..a234eddf50) close the main sources of goal-scheduling-dependent proof search that made parallel proofs vary run-to-run:

  • c3f619862d — the shared term-index cache conflated positions of different polarity below the top level; the polarity-restricted rewrite taclets made cached app sets first-writer-dependent (missing/spurious candidates depending on which goal indexed a term first).
  • 5a3ebb8e10 — index-cache hits reported a different (larger) app set than the incremental rebuild; reporting is now cache-state-independent.
  • 59949f59ca, a234eddf50 — the equal-cost tie-break was not a consistent total order: assumes instantiations were not compared, the position of taclet apps was never compared (NoPosTacletApp.posInOccurrence() is null — the position lives in the container), and the SV-map walk depended on map build order (a comparator asymmetry that corrupts the whole queue ordering).
  • eafcf6693b — regression test: sequential and 1-worker-parallel proofs must be node-identical on a set of examples.

Measured effect (variance benchmark, 10 reps):

before after
seq vs 1-worker 7/13 examples differ (up to ±4500 nodes) 9/13 node-identical, rest ≤ ±23
run-to-run @ 8 workers, SLL.remove 11.6 % spread 0.25 %
run-to-run @ 8 workers, symmArray 3.0 % 0.09 %
run-to-run @ 8 workers, Simple__square / generateByteArray 1-2 % bit-stable (spread 0)

Note for RAP: like any change to rule-selection order this changes the shape of found proofs (some grow slightly, some shrink — e.g. symmArray −12 %). heap/quicksort/sort.script needed a larger tryclose budget for its side goals (included); the three quicksort entries were verified locally, statistics-sensitive entries may need refreshing if CI flags them.

@unp1

unp1 commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

Update after b8abdb8935 + d229c303c4:

  • b8abdb8935 closes one more tie-break gap (containers for different occurrences of structurally identical subterms share their NoPosTacletApp, so the identity shortcut bypassed the position comparison). 12 of the 13 variance-set examples are now proved node-identically by the sequential and the 1-worker parallel prover; run-to-run at 8 workers (10 reps): add (27k nodes), Saddleback_search (31k), Simple__square, Simple__generateByteArray are bit-stable (spread 0), symmArray 0.08 %, SLL.remove 0.23 % (a clean two-state flip — exactly two proof shapes — i.e. one remaining discrete decision, under investigation).
  • d229c303c4 fixes a pre-existing test-isolation issue (MtStressTest/MtScriptStressTest leaked their examples' proof settings into ProofSettings.DEFAULT_SETTINGS, which could stall unrelated proofs when suites share a JVM).

@unp1

unp1 commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

Determinism: the last run-to-run flip is fixed (0f8645b765, b939a7ff58)

Root cause of the final coin flip: the NonDuplicateApp veto was term-label-sensitive in three places (bucket fingerprint, update-context comparison, instantiation comparison) although its own equality notion explicitly ignores labels. The same logical candidate surfaces with different labels depending on whether a fresh taclet match or another goal's term-index-cache entry reports it — so whether the veto fired depended on cross-goal cache timing. Details in b939a7ff58; as a side effect the veto is now strictly more effective (previously escaped duplicate applications are suppressed).

Run-to-run node-count spread, 10 repetitions (MtVarianceBenchmark):

example 8 workers, before the series 4 and 8 workers, now
SimplifiedLinkedList.remove 6752 nodes (11.6 %) 0 — bit-stable
symmArray 469 (3.0 %) 11 (0.08 %)
Saddleback_search 88 (0.28 %) 0
gemplusDecimal/add 31 (0.11 %) 0
Simple__generateByteArray 30 (1.6 %) 0
Simple__square 0, but ≠ sequential proof 0, = 1-worker proof

Sequential agreement: 12 of 13 variance-set examples are proved node-identically by the sequential and the 1-worker parallel prover (SimplifiedLinkedList.remove Δ4 nodes, down from Δ≈4500 before the series). 14 consecutive 8-worker runs of SimplifiedLinkedList.remove produce the identical proof.

Comparison with current main (bf88d529c4), SimplifiedLinkedList.remove, sequential prover (3 repetitions × 3 separate JVMs each):

main this PR
repetitions within one JVM identical (55300 nodes) identical
across separate JVMs identical (full rule/position listing byte-equal) identical
with the parallel prover — (not available) identical at 1, 4 and 8 workers

So main's sequential prover is already reproducible on this example; the PR carries that reproducibility over to the multi-core prover — the surface this PR adds — and closes several determinism defects that are latent on main as well (polarity-shared index-cache entries, incomplete cost tie-break, label-sensitive duplicate veto).

Known footnotes: proof sizes shift against main (rule-selection order changed; SLL 58987 vs 55300 on a fresh JVM), and on this branch the first proof of a JVM can differ — deterministically — from later ones through a static cache in the cost-reuse layer; the queued port of the costreuse-stable-classification series removes those statics.

@unp1

unp1 commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

Correction to the numbers above: it is 13 of 13 — the reported 4-node difference on SimplifiedLinkedList.remove was an artifact of the variance benchmark's warm-up run (a parallel warm-up leaves state in a static cost-reuse cache that deterministically shifts later proofs in the same JVM; the sequential and 1-worker configurations sat at different positions in that sequence). Measured cleanly — one sequential and one 1-worker proof, full tree walk — the two proofs are identical at every node. The benchmark hardening (per-configuration JVM isolation) and the removal of the static caches (port of the costreuse-stable-classification series) are queued follow-ups.

@unp1

unp1 commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

Follow-ups landed: the costreuse-stable-classification series is merged (4846538ece) — the static-cache footnote from the earlier comment no longer applies (proofs are independent of what ran before them in the same JVM), and as a bonus SimplifiedLinkedList.remove is now worker-count-independent (identical 58987-node proof at 0, 1 and 8 workers). The remaining symmArray 0.08 % run-to-run wobble is root-caused to per-worker fresh-name indices ordering equal-cost decisions differently — i.e. exactly #3851; state-derived naming is the designated fix and is out of scope for this PR.

@unp1

unp1 commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

The last MT wobble is gone — fresh names are now branch-state-derived (#3851, multi-core part).

The per-worker name allocator (__t<w> tags) is retired: fresh-name minting under the multi-core prover now takes exactly the single-threaded route — a smallest-free-index search against the goal-local namespaces (#3848), i.e. a name is a pure function of the branch state, independent of worker scheduling. Sibling branches deliberately reuse names, exactly as in single-threaded proving; the shared namespace stays immutable and untouched.

Result: symmArray, the one remaining run-to-run wobble (±0.08 %), is now bit-stable over 15 consecutive 8-worker runs and closes at the sequential node count. Every example in the variance set is now fully deterministic at every worker count.

Ships with a naming regression harness (TestNamingInvariants): branch-locality + prune-reset of skolem names, goal-local insert_hidden taclets with state-derived names, addProgramVariable locality, NameRecorder priority, a now-permanent seq==8-worker name-equality test, and a frozen saved proof pinning the added-rule name format that replay resolves by regeneration (any format change without a replayer conversion goes red).

Saved-proof compatibility unaffected: the touched channels replay through the NameRecorder; the added-rule format is unchanged. Suites green incl. proveRules 203/203.

@unp1 unp1 changed the title Multithreading for KeY Multithreading for KeY and it's deterministic too Jul 2, 2026
@unp1 unp1 force-pushed the bubel/mt-goals branch from d1e8c38 to b4aeed0 Compare July 2, 2026 22:34
@unp1 unp1 changed the title Multithreading for KeY and it's deterministic too Multithreading for KeY and now it's deterministic too Jul 3, 2026
@unp1 unp1 force-pushed the bubel/mt-goals branch from 6c26131 to a35576a Compare July 3, 2026 11:44
unp1 added 7 commits July 3, 2026 22:31
Compute NodeInfo's active statement lazily, cache the loop-invariant instantiation on the rule app instead of a static field, and make the per-call program transformers stateless, so the proving path carries no shared mutable state before the parallel prover is introduced.
ConcurrentLruCache (exact-LRU, single-lock, drop-in for synchronizedMap over an
LRUCache) for eviction-sensitive caches, StripedLruCache for pure caches, and
WeakValueInterner for identity-preserving interning under concurrency, with
concurrency tests.
Add the parallelProverEnabled / parallelProverThreadCount settings and a
Profile.supportsParallelAutomode() capability: the standard Java profile opts
in, the well-definedness, information-flow and symbolic-execution profiles keep
the single-core fallback.
Add EssentialProofListener and Proof.suspendNonEssentialListeners(), which
detaches every proof-tree and rule-app listener not tagged essential for the
duration of a run and re-attaches them afterwards, so nothing unrelated to
proving fires on a worker thread.
ParallelProver runs automatic proof search on a worker pool, one open goal per
worker, behind the AutoProvers selection seam. A single GoalScheduler monitor
hands out goals depth-first; Goal.apply is split into a concurrent compute phase
and a commit phase serialized on one lock; fresh names come from a partitioned
per-proof allocator (atomic Counter), namespace flushes are deferred, and the
merge rule is disabled during parallel runs. The taclet-index cache key is
immutable so siblings sharing the cache set cannot race on it. Includes the
scheduler unit test and the proof-equivalence gate.
Harden the shared state the workers touch during search: route the matching and
cost-path caches through the thread-safe cache utilities, intern parametric
operators and elementary updates identity-preservingly, give the per-call
instantiation caches and counters safe publication or per-worker copies, and
guard the specification repository's proving-time maps. Behaviour-preserving.
Add a settings pane and a clickable status-line indicator to switch between the
single-core and multi-core prover, a --threads CLI option, and gating of the
single-core-only features: proof caching, slicing and the merge rule are
disabled (with explanatory tooltips, updated live) while the multi-core prover
is active, and restored when switching back.
@unp1 unp1 force-pushed the bubel/mt-goals branch 2 times, most recently from 4677232 to 6ab478c Compare July 4, 2026 07:45
unp1 added 10 commits July 4, 2026 10:50
Replace the hardcoded ApplyStrategy in the strategy-running proof macros and the script auto command with the AutoProvers seam, so macro and script automation honour the selected prover mode. The toolbar's Full Automation action (DefaultAutoMacro) and the proof-script auto command are included -- both are plain automode and must follow the mode setting. TryCloseMacro deliberately stays on the single-threaded prover: it closes one goal at a time under a tight per-goal step budget, where several workers exploring a single goal's subtree apply rules in a less step-efficient order that can exhaust the budget.
Add the parallel-prover speedup/variance benchmarks and the stress tests, with CI wiring: a generous MtStressTest step cap so benign parallel divergence cannot fail it, a corrected MtSpeedupBenchmark sequential baseline (it was secretly parallel), and MtVarianceBenchmark.
Pool the matching cursor per thread instead of behind a global lock, stop the multi-core workers before restoring proof listeners on cancel, and make the SpecificationRepository spec maps thread-safe for the parallel prover.
Pre-create the default execution context once and stop re-registering it, and fail fast on Java type registration during a parallel proof run instead of racing on lazy registration.
Make LexPathOrdering's caches, ModalityCache, the quantifier-heuristic single-slot caches and the block/loop-contract instantiation memo thread-safe/thread-local, split the shared term-index cache by position polarity and report taclet apps cache-independently on formula modification, so cache state cannot change proofs across worker schedules.
Branch labels are a display concern -- nothing in the automatic proof
search reads them. The sole apparent reader, QueryExpandCost's
isStepCaseBranch (which classified step-case subtrees by substring-
matching the label text), was dead code and is removed, so no strategy
code depends on label text.

Yet labels were materialised eagerly during rule application:
NodeInfo.setBranchLabel ran the #sv schema-variable substitution +
ProofSaver.printAnything term pretty-printing, and the contract/observer
rules pre-formatted terms into the string. That work sat on the proving
path (needlessly lengthening it under the parallel prover) purely to
build strings only the GUI/save ever read.

Defer it: NodeInfo stores the raw label (or a Supplier, for labels that
stringify terms) and resolves it lazily and memoized on the first
getBranchLabel() -- i.e. on display/save. The three "Null reference
(term = null)" sites in UseOperationContractRule / ObserverToUpdateRule
capture the term reference and defer the toString via the new Supplier
overload.

Verified: taclet #sv, constant and lazy-Supplier labels all resolve to
the same content, the Supplier is evaluated once (memoized), and the
sampled proofs still close.
Keep the synthetic __Default__ type out of type enumerations so it is not lazily registered during matching under the parallel prover, with a regression test.
Break equal-cost ties by content -- structural position first, then rule, then instantiations and assumes instantiations, comparing container positions before the rule-app identity shortcut, so the rule-app queue is a consistent total order. Position-first mirrors the historical generation-order surfacing of equal-cost candidates and keeps proof sizes unchanged (rule-first ordering doubled the TimSort.binarySort case study by splitting too early). Derive trigger metavariable names from the quantified formula so they do not depend on creation order, and restore the global proof settings in the MT stress tests so a leaked setting cannot skew measurements. Adds a sequential-vs-one-worker determinism regression test.
Fingerprint the duplicate-application veto by shallow, label-free structure, so history-dependent origin labels cannot let a duplicate escape the veto nondeterministically.
Derive fresh names and temporary program-element names/labels from the goal-local branch state on every worker count, add a regression harness, pin saved-proof replay compatibility for added-rule names, and code the term-level proof-identity acceptance criterion (gated).
@unp1 unp1 force-pushed the bubel/mt-goals branch from 6ab478c to 987a2f6 Compare July 4, 2026 08:51
unp1 added 3 commits July 4, 2026 12:10
Give each RunAllProofs test fork its own statistics file, stop the proof search when an essential proof listener fails, default to the single-threaded prover with CI coverage for the parallel one, remove the non-thread-safe LRUCache class, and keep the MT RunAllProofs smoke tests out of the unit-test job.
Derive \obtain's fresh name from the current namespace, move the node serial counter onto Proof, number class-axiom taclets by content (new reusable ContentOrderNumbering), and deprecate Services.getCounter, so names and ids come from the goal-local proof state rather than a proof-global counter (#3851).
Always write RunAllProofs statistics per process (dropping the sysprop switch), and run the parallel-prover benchmarks and probes as named Gradle tasks with -P options instead of a pile of key.mt.* system properties.
@unp1 unp1 force-pushed the bubel/mt-goals branch from 987a2f6 to 16fea2b Compare July 4, 2026 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactoring: Move naming of variables etc. from global counter to proof state

3 participants