Skip to content

Fix unchanged-files cache missing changes induced by changed dependencies#8028

Open
SanderMuller wants to merge 1 commit into
rectorphp:mainfrom
SanderMuller:perf/autoresearch
Open

Fix unchanged-files cache missing changes induced by changed dependencies#8028
SanderMuller wants to merge 1 commit into
rectorphp:mainfrom
SanderMuller:perf/autoresearch

Conversation

@SanderMuller

@SanderMuller SanderMuller commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Problem

The unchanged-files cache only checks each file's own content. A clean file stays skipped on warm runs even when one of its dependencies changed.

Repro on main: a parent class method gains : int, which lets a child file infer its own return type. A fresh run reports the new change in the child file, a warm run reports nothing until --clear-cache.

// Dad.php — the only file that changes on disk
 class Dad
 {
-    public function getValue()
+    public function getValue(): int
     {
         return 1;
     }
 }

// Kid.php — unchanged on disk, but typeDeclarations can now add `: int` to value()
class Kid extends Dad
{
    public function value()
    {
        return $this->getValue();
    }
}

The warm run reprocesses Dad.php (own hash changed) but skips Kid.php (own hash unchanged), so the new Kid::value(): int suggestion never appears. Reproducible with vanilla rector: cache a clean state, apply the Dad.php edit, run again — the Kid.php change only shows up after --clear-cache.

Fix

PHPStanNodeScopeResolver records which files each file depends on during scope resolution, using PHPStan's own DependencyResolver (the engine behind PHPStan's result cache). The set is transitive: an inheritance chain records the grandparent.

Cache entries store the file's own hash plus one hash per dependency, all re-validated on load. Old string entries still load and upgrade themselves on the next write, so existing caches keep working. If capture fails for a file, the file is not cached at all rather than cached with a partial set.

To keep capture cheap, only node types whose resolver branch can actually produce dependencies get resolved. The biggest skip: native function calls, whose first resolution makes PHPStan load its entire function signature map (NativeFunctionCallAnalyzer, with a mutation-verified matrix test over every name form). Two tripwire tests pin the assumptions that live outside this repo: one parses the bundled DependencyResolver and fails if its branch set changes on a PHPStan bump, the other asserts native function reflections stay fileless.

Cost: ~6% on a cold run of a 4,400-file project; warm runs unchanged within noise. Output stays byte-identical to a fresh run.

Gist

One capture call per dependency-relevant node during scope resolution rector already does:

// PHPStanNodeScopeResolver, inside the node callback:
$nodeDependencies = $this->dependencyResolver->resolveDependencies($node, $mutatingScope);
foreach ($nodeDependencies->getFileDependencies() as $dependencyFile) {
    $this->fileDependencyCollector->record($filePath, $dependencyFile);
}

And the cache entry goes from a bare content hash to the hash plus the captured dependency hashes, all re-validated on load:

// before
'abc123'
// after
['hash' => 'abc123', 'deps' => ['/src/Dad.php' => 'def456']]

Scope

Slimmed down from the original PR as discussed — this now contains only the correctness fix. Split out / follow-ups:

Known limitation, same gap the existing cache already has: a newly added file that changes name resolution for an unchanged file is not detected, since no dependency edge existed yet. --clear-cache covers it, same contract as today.

Comment thread src/NodeAnalyzer/NativeFunctionCallAnalyzer.php
return false;
}

$this->internalFunctionNames ??= array_flip(get_defined_functions()['internal']);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsonasik Do you mean this line then? Would static reflection be faster?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this AI generated get_defined_functions() caching usage on StructArmed, it slow, and use of namespaced_name is more than enough there.

If function check exists needed, there is already phpstan reflection function NativeFunctionReflection for that, so way more reliable per static analysis process instead of runtime php.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table is built once per process and memoized, not per call - measured in rector's container: ~0.05ms to build, 10k checks ≈ 1.4ms. Calling get_defined_functions() per node would indeed be slow; this does not.

The namespaced-name comparison answers a different question: "does this resolve to the global namespace", while this guard needs "can this call have a file dependency". Userland functions in the global namespace - app()/config() in Laravel, any composer files-autoloaded helpers - pass that comparison but live in files the cache must track; skipping capture for them would reintroduce the stale-cache bug this PR fixes. The matrix test pins that global userland functions are not treated as native.

NativeFunctionReflection is only known after resolving through the ReflectionProvider (first getFunction() + variants ≈ 24ms, signature map) - the work this guard sits in front of. And the runtime list errs only in the safe direction: a name not in it simply is not skipped, capture proceeds and PHPStan stays authoritative.

The cache only checked each file's own content, so a clean file stayed
skipped on warm runs even when one of its dependencies changed, e.g. a
parent class method gaining a return type that lets a child file infer
its own. A fresh run reports the new change, a warm run misses it.

PHPStanNodeScopeResolver now records each file's dependencies during
scope resolution using PHPStan's own DependencyResolver. Cache entries
store the file's own hash plus one hash per dependency, all re-validated
on load; legacy string entries self-upgrade on the next write. A failed
capture skips caching entirely rather than caching a partial set.

To keep capture cheap, only nodes whose resolver branch can produce
dependencies are resolved, and native function calls are skipped via
NativeFunctionCallAnalyzer, avoiding PHPStan's function signature map
load. Two tripwire tests pin the external assumptions: one parses the
bundled DependencyResolver and fails if its branch set changes on a
PHPStan bump, the other asserts native function reflections stay
fileless.

Selective runs (--only, --only-suffix) bypass the cache write, same
guard as rectorphp#8029.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@SanderMuller SanderMuller changed the title perf: dependency-aware caching, fixes stale cache results and speeds up warm dry-runs 3.5-9x Fix unchanged-files cache missing changes induced by changed dependencies Jun 10, 2026
@SanderMuller

Copy link
Copy Markdown
Contributor Author

Slimmed this down as discussed: it now contains only the dependency-aware unchanged-files cache fix — 626 lines, half of which is tests. The --only cache bug is split out as #8029 (standalone, ~30 lines of src + an e2e). The two perf pieces (dry-run diff replay, long-lived dependency-set store) are ready as follow-up PRs and will open one at a time after this merges.

@samsonasik your NativeFunctionCallAnalyzer comment still applies here — answered in the thread above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants