Skip to content

Skip live PR/MR provider detection during bulk import#263

Merged
alexeyzimarev merged 4 commits into
mainfrom
worktree-issue-232-import-provider-detection-latency
Jul 5, 2026
Merged

Skip live PR/MR provider detection during bulk import#263
alexeyzimarev merged 4 commits into
mainfrom
worktree-issue-232-import-provider-detection-latency

Conversation

@realtonyyoung

Copy link
Copy Markdown
Contributor

What & why

Follow-up to #229. ImportCommand's bulk loop calls RepositoryDetection.DetectRepositoryAsync per session/cwd, which runs the live gh pr view / glab api provider round-trip (up to ~2s best-effort each). The per-host memo in GitProviderRouter stops the gh auth status probe from multiplying, but the detection call still runs per cwd.

Finding (stronger than the issue's "measure first"): every import path discards the PR fields, so that round-trip is provably dead work, not just possibly-slow:

  • Scope-picker resolution and the "Scanning repositories" step use only Owner/RepoName.
  • Per-session ingest builds its repository node manually with user_name/owner/repo_name/branch/remote_urlno pr_* fields.
  • Classification (exclusion key) and the import-source detectors only need owner/repo.

The live/hook path (BuildRepositoryNode via EnrichWithRepositoryInfo) does emit pr_number/pr_title/… — so PR detection matters there, just not in import.

How

  • DetectRepositoryAsync gains detectPullRequest = true (default keeps every live/hook/enrich caller unchanged) and an injectable CommandRunner? run = null (defaults to the real process spawner) so the git + provider spawns are unit-testable.
  • All import-path callers pass detectPullRequest: false: ImportCommand (scope resolution, repo scan, per-session ingest), TranscriptFileClassification, and the Copilot/Pi/Kiro/Cursor import-source default detectors. Base repo info still resolves from git alone.

This is behaviour-preserving — imported sessions never carried PR info, so nothing in the imported payload changes; only the discarded network round-trips are removed. It fully realizes the issue's second remedy ("skip live provider detection during bulk import"), which is cleaner than caching since the result was unused.

Testing

  • New ImportProviderDetectionTests (injected recording runner): the import path (detectPullRequest: false) resolves owner/repo from git and spawns no gh/glab; the default (live) path still runs gh pr view and populates the PR.
  • Full unit suite green (2239). AOT publish of the CLI clean — no IL3050/IL2026 warnings.

No README/CLI-surface change: internal import-path latency optimization.

Closes #232
Linear: AI-1122

Follow-up to #229. ImportCommand's bulk loop calls DetectRepositoryAsync per session/cwd, which
runs the live `gh pr view` / `glab api` provider round-trip (up to ~2s best-effort each). But every
import path discards the PR fields: the scope-picker resolution and "Scanning repositories" step use
only Owner/RepoName, the per-session ingest builds its `repository` node without any pr_* fields, and
classification/import-source detection only need owner/repo. So that per-cwd round-trip is pure
wasted latency — worst on GitLab, whose `glab api` is a network call rather than gh's local resolve.

- DetectRepositoryAsync gains `detectPullRequest = true` (default preserves every live/hook/enrich
  caller) and an injectable `CommandRunner? run = null` (defaults to the real spawner) so the git and
  provider spawns are unit-testable.
- All import-path callers pass `detectPullRequest: false`: ImportCommand (scope resolution, repo
  scan, per-session ingest), TranscriptFileClassification (exclusion key), and the Copilot/Pi/Kiro/
  Cursor import-source default detectors. Base repo info (owner/repo/user/branch/host) still resolves
  from git alone.

This is a behaviour-preserving latency win — imported sessions never carried PR info, so nothing in
the imported payload changes; only the discarded network calls are removed. Adds
ImportProviderDetectionTests (import path spawns no gh/glab; live path still detects the PR).

Closes #232
Linear: AI-1122

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@qodo-code-review

Copy link
Copy Markdown

PR Summary by Qodo

Skip provider PR/MR detection during bulk import

✨ Enhancement 🧪 Tests 🕐 20-40 Minutes

Grey Divider

AI Description

• Add a switch to skip PR/MR detection while still resolving base git repository metadata.
• Update all bulk-import call sites to avoid gh/glab round-trips when PR fields are unused.
• Add unit tests asserting import mode never spawns provider CLIs, while live mode still does.
Diagram

graph TD
  A["Bulk import paths"] --> B["RepositoryDetection"]
  H["Live/hook paths"] --> B
  B --> C["git CLI"]
  B -->|"detectPullRequest=false"| D["Base repo payload"]
  B -->|"detectPullRequest=true"| E["GitProviderRouter"] --> F["PR detectors"] --> G["gh/glab CLI"]
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Cache PR detection per cwd during import
  • ➕ Avoids repeated provider calls when PR fields are needed later in the same run
  • ➕ Less API surface area change than adding a flag
  • ➖ Still performs wasted work here since import discards PR fields
  • ➖ Cache invalidation (branch/PR changes) can introduce correctness issues elsewhere
2. Split API: DetectBaseRepositoryAsync + DetectPullRequestAsync
  • ➕ Cleaner separation of concerns; avoids boolean-flag parameter
  • ➕ Makes call sites explicit about what they need
  • ➖ More invasive refactor across all current callers
  • ➖ Higher churn for a behavior-preserving performance change
3. Asynchronous PR detection with lazy evaluation
  • ➕ Keeps PR detection available without blocking critical path
  • ➕ Could improve perceived latency in interactive flows
  • ➖ Import still discards PR fields, so async work remains unnecessary
  • ➖ Adds concurrency complexity and potential flakiness in tests

Recommendation: Current approach is the best fit: a default-on detectPullRequest flag preserves live/hook behavior while letting import paths explicitly skip provably-unused provider round-trips. The injectable CommandRunner keeps the change testable without broad refactors or correctness risks from caching.

Files changed (8) +80 / -17

Enhancement (7) +29 / -17
CopilotImportSource.csDisable PR/MR provider detection for Copilot import repo resolution +1/-1

Disable PR/MR provider detection for Copilot import repo resolution

• Updates the default repository detector delegate to call RepositoryDetection with detectPullRequest disabled, avoiding provider CLI calls during bulk import processing.

src/Capacitor.Cli/Commands/CopilotImportSource.cs

CursorImportSource.csDisable PR/MR provider detection for Cursor import repo resolution +1/-1

Disable PR/MR provider detection for Cursor import repo resolution

• Switches the default repo detector to skip provider PR/MR detection while still resolving owner/repo via git during Cursor historical imports.

src/Capacitor.Cli/Commands/CursorImportSource.cs

ImportCommand.csSkip provider PR/MR round-trips in bulk import loops +7/-3

Skip provider PR/MR round-trips in bulk import loops

• Updates multiple bulk-import detection points (workspace scanning and per-session ingest) to call RepositoryDetection with detectPullRequest disabled, since import payloads don’t emit PR fields.

src/Capacitor.Cli/Commands/ImportCommand.cs

KiroImportSource.csDisable PR/MR provider detection for Kiro import repo resolution +1/-1

Disable PR/MR provider detection for Kiro import repo resolution

• Changes the import-source default repository detection to skip provider PR/MR lookup for performance during Kiro session import.

src/Capacitor.Cli/Commands/KiroImportSource.cs

PiImportSource.csDisable PR/MR provider detection for Pi import repo resolution +1/-1

Disable PR/MR provider detection for Pi import repo resolution

• Updates the default repo detection delegate to avoid provider CLI calls during Pi historical import, relying on git-only base repo info.

src/Capacitor.Cli/Commands/PiImportSource.cs

TranscriptFileClassification.csSkip PR/MR detection when computing excluded repo keys +2/-1

Skip PR/MR detection when computing excluded repo keys

• Changes classification-time repository detection to disable PR/MR provider detection, since exclusion keys only require owner/repo.

src/Capacitor.Cli/Commands/TranscriptFileClassification.cs

RepositoryDetection.csAdd detectPullRequest toggle and injectable command runner +16/-9

Add detectPullRequest toggle and injectable command runner

• Extends DetectRepositoryAsync with a detectPullRequest flag (default true) and an optional injectable CommandRunner, routing all git/provider invocations through the runner. Provider resolution and PR detection are now conditional on detectPullRequest, enabling import paths to avoid 'gh'/'glab' round-trips.

src/Capacitor.Cli/RepositoryDetection.cs

Tests (1) +51 / -0
ImportProviderDetectionTests.csAdd tests for import-mode provider detection skipping +51/-0

Add tests for import-mode provider detection skipping

• Introduces unit tests using a recording CommandRunner to assert that detectPullRequest=false never spawns 'gh'/'glab' while still resolving owner/repo from git, and that the default/live path still runs 'gh pr view' and populates PR fields.

test/Capacitor.Cli.Tests.Unit/ImportProviderDetectionTests.cs

@qodo-code-review

qodo-code-review Bot commented Jul 3, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 📜 Skill insights (0)

Context used

Grey Divider


Action required

1. Import skips PR detection unmeasured ✓ Resolved 📎 Requirement gap ⚙ Maintainability
Description
The PR introduces a bulk-import optimization by disabling live PR/MR provider detection
(detectPullRequest: false) but does not include any documented latency measurement to justify the
behavior change. This violates the requirement to measure before optimizing and to avoid changes
unless latency impact is observed.
Code

src/Capacitor.Cli/Commands/ImportCommand.cs[R592-593]

+                            // Import only needs owner/repo here — skip the PR/MR provider round-trip.
+                            var repo = await RepositoryDetection.DetectRepositoryAsync(cwd, detectPullRequest: false);
Evidence
Rule 1 requires a documented latency measurement for multi-repo imports. The PR instead changes
import to call RepositoryDetection.DetectRepositoryAsync(..., detectPullRequest: false) and adds
code explicitly describing the change as a latency optimization, which triggers Rule 3’s requirement
to avoid behavior changes absent measured latency evidence.

Measure bulk import latency for multi-repo imports with per-CWD provider detection
Do not change behavior if latency is not observed
src/Capacitor.Cli/Commands/ImportCommand.cs[592-593]
src/Capacitor.Cli/RepositoryDetection.cs[116-120]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Bulk import behavior was changed to skip live PR/MR provider detection, but the required latency measurement for multi-repo imports is not documented, and the change is introduced without measured evidence.

## Issue Context
Compliance requires (1) measuring and documenting latency for per-CWD provider detection during bulk import, and (2) avoiding optimizations/behavior changes unless the measurement shows meaningful latency impact.

## Fix Focus Areas
- src/Capacitor.Cli/Commands/ImportCommand.cs[592-593]
- src/Capacitor.Cli/RepositoryDetection.cs[116-213]
- docs/import-provider-detection-latency.md[1-200]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Cursor import drops PR fields ✓ Resolved 🐞 Bug ≡ Correctness
Description
CursorImportSource now calls DetectRepositoryAsync with detectPullRequest:false, so its sessionStart
payload’s repository node will never include pr_number/pr_title/pr_url/pr_head_ref. Previously,
Cursor import attached RepositoryDetection.BuildRepositoryNode(repo), which emits pr_* fields when
populated, so imported Cursor payload shape changes.
Code

src/Capacitor.Cli/Commands/CursorImportSource.cs[45]

+        _repoDetector        = repoDetector ?? (cwd => RepositoryDetection.DetectRepositoryAsync(cwd, detectPullRequest: false));
Evidence
Cursor import builds and sends a repository JSON node using BuildRepositoryNode, which conditionally
serializes pr_* fields; disabling detectPullRequest prevents DetectRepositoryAsync from ever
populating PrNumber/PrTitle/PrUrl/PrHeadRef, so those fields cannot be emitted in Cursor’s imported
sessionStart payload anymore.

src/Capacitor.Cli/Commands/CursorImportSource.cs[37-46]
src/Capacitor.Cli/Commands/CursorImportSource.cs[339-362]
src/Capacitor.Cli/RepositoryDetection.cs[85-100]
src/Capacitor.Cli/RepositoryDetection.cs[121-221]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`CursorImportSource` uses `RepositoryDetection.BuildRepositoryNode(repo)` for its synthetic `session-start/cursor` payload. After this PR, the default `_repoDetector` disables PR detection (`detectPullRequest:false`), so `repo.Pr*` will always be null and `BuildRepositoryNode` will never emit `pr_*` fields for imported Cursor sessions.

If PR metadata for imported Cursor sessions is still desired, PR detection must remain enabled for this path (or be made configurable). If it is *not* desired, consider adding an explicit comment/test asserting Cursor import intentionally omits PR fields, since this is a payload-shape change.

### Issue Context
- Cursor import attaches `repository` to the sessionStart payload using `BuildRepositoryNode`, which includes PR fields when present.
- `DetectRepositoryAsync` now gates provider PR detection on `detectPullRequest`.

### Fix Focus Areas
- src/Capacitor.Cli/Commands/CursorImportSource.cs[37-46]
- src/Capacitor.Cli/Commands/CursorImportSource.cs[339-348]
- src/Capacitor.Cli/RepositoryDetection.cs[85-100]
- src/Capacitor.Cli/RepositoryDetection.cs[121-221]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread src/Capacitor.Cli/Commands/ImportCommand.cs
Comment thread src/Capacitor.Cli/Commands/CursorImportSource.cs Outdated
@realtonyyoung

Copy link
Copy Markdown
Contributor Author

Review findings from local diff review (origin/main...worktree-issue-232-import-provider-detection-latency):

  • [P2] Cursor import still consumes PR fields, so flipping its default detector is a behavior regression. The default detector now skips PR detection at src/Capacitor.Cli/Commands/CursorImportSource.cs:45, but the import path calls _repoDetector(workspaceFolder) and passes the full result into RepositoryDetection.BuildRepositoryNode(repo) at src/Capacitor.Cli/Commands/CursorImportSource.cs:346-347. That helper emits pr_number, pr_title, pr_url, and pr_head_ref at src/Capacitor.Cli/RepositoryDetection.cs:96-99, and Cursor import attaches that repository node to synthetic session-start/cursor at src/Capacitor.Cli/Commands/CursorImportSource.cs:537-538. Scenario: importing Cursor history from a workspace where gh pr view would resolve an active PR used to send PR metadata; this PR silently drops it.

  • [P3] ImportProviderDetectionTests don’t actually prove the import paths avoid provider CLIs. They call RepositoryDetection.DetectRepositoryAsync(... detectPullRequest: false/true) directly at test/Capacitor.Cli.Tests.Unit/ImportProviderDetectionTests.cs:29 and test/Capacitor.Cli.Tests.Unit/ImportProviderDetectionTests.cs:45, so they pass even though the Cursor import call site above still had a PR-field consumer.

Other checks from the trace: non-import callers still rely on the default detectPullRequest = true, the injectable run seam is threaded through git probes, GitProviderRouter, and both PR detectors, and no DetectRepositoryAsync path still bypasses it with direct RunCommandAsync calls except DefaultRunner.

Verification notes: git diff --check origin/main...HEAD passed. The focused TUnit command hung during build/test and had to be cancelled; dotnet test is unsupported here under .NET 10 MTP. AOT publish reached the native/trimmer phase twice but failed in the local MSBuild task host before IL warnings: MSB4216 ComputeManagedAssemblies / MSB4027, so IL3050/IL2026 verification is inconclusive in this environment.

realtonyyoung and others added 3 commits July 3, 2026 13:57
…s session-start)

Review follow-up: CursorImportSource is the one import source whose synthetic session-start payload
carries a `repository` node (AI-1152) built via RepositoryDetection.BuildRepositoryNode, which emits
pr_number/pr_title/pr_url/pr_head_ref when a PR is detected. The bulk-import latency change flipped
its default repo detector to detectPullRequest:false, which silently stripped those PR fields from
imported Cursor sessions — a payload-shape regression.

Revert only Cursor's default detector to PR-detecting. The other flipped paths (ImportCommand x3,
TranscriptFileClassification, and the Pi/Kiro/Copilot sources) use their detector result only for the
owner/repo exclusion key and never via BuildRepositoryNode, so they keep the latency win. Adds a test
asserting the Cursor session-start payload carries pr_* when the repo has a PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…port-provider-detection-latency

# Conflicts:
#	src/Capacitor.Cli/RepositoryDetection.cs
Records the analysis behind skipping live PR/MR detection during bulk import (results are
discarded on those paths — dead work, not a speculative latency optimization), why no live
large-import benchmark was run, and the Cursor exception that keeps PR detection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@alexeyzimarev alexeyzimarev merged commit 731b1a1 into main Jul 5, 2026
5 checks passed
@alexeyzimarev alexeyzimarev deleted the worktree-issue-232-import-provider-detection-latency branch July 5, 2026 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Watch/optimize import latency from per-cwd provider detection

2 participants