Skip live PR/MR provider detection during bulk import#263
Conversation
Follow-up to #229. ImportCommand's bulk loop calls DetectRepositoryAsync per session/cwd, which runs the live `gh pr view` / `glab api` provider round-trip (up to ~2s best-effort each). But every import path discards the PR fields: the scope-picker resolution and "Scanning repositories" step use only Owner/RepoName, the per-session ingest builds its `repository` node without any pr_* fields, and classification/import-source detection only need owner/repo. So that per-cwd round-trip is pure wasted latency — worst on GitLab, whose `glab api` is a network call rather than gh's local resolve. - DetectRepositoryAsync gains `detectPullRequest = true` (default preserves every live/hook/enrich caller) and an injectable `CommandRunner? run = null` (defaults to the real spawner) so the git and provider spawns are unit-testable. - All import-path callers pass `detectPullRequest: false`: ImportCommand (scope resolution, repo scan, per-session ingest), TranscriptFileClassification (exclusion key), and the Copilot/Pi/Kiro/ Cursor import-source default detectors. Base repo info (owner/repo/user/branch/host) still resolves from git alone. This is a behaviour-preserving latency win — imported sessions never carried PR info, so nothing in the imported payload changes; only the discarded network calls are removed. Adds ImportProviderDetectionTests (import path spawns no gh/glab; live path still detects the PR). Closes #232 Linear: AI-1122 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PR Summary by QodoSkip provider PR/MR detection during bulk import
AI Description
Diagram
High-Level Assessment
Files changed (8)
|
Code Review by Qodo
Context used 1.
|
|
Review findings from local diff review (
Other checks from the trace: non-import callers still rely on the default Verification notes: |
…s session-start) Review follow-up: CursorImportSource is the one import source whose synthetic session-start payload carries a `repository` node (AI-1152) built via RepositoryDetection.BuildRepositoryNode, which emits pr_number/pr_title/pr_url/pr_head_ref when a PR is detected. The bulk-import latency change flipped its default repo detector to detectPullRequest:false, which silently stripped those PR fields from imported Cursor sessions — a payload-shape regression. Revert only Cursor's default detector to PR-detecting. The other flipped paths (ImportCommand x3, TranscriptFileClassification, and the Pi/Kiro/Copilot sources) use their detector result only for the owner/repo exclusion key and never via BuildRepositoryNode, so they keep the latency win. Adds a test asserting the Cursor session-start payload carries pr_* when the repo has a PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…port-provider-detection-latency # Conflicts: # src/Capacitor.Cli/RepositoryDetection.cs
Records the analysis behind skipping live PR/MR detection during bulk import (results are discarded on those paths — dead work, not a speculative latency optimization), why no live large-import benchmark was run, and the Cursor exception that keeps PR detection. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What & why
Follow-up to #229.
ImportCommand's bulk loop callsRepositoryDetection.DetectRepositoryAsyncper session/cwd, which runs the livegh pr view/glab apiprovider round-trip (up to ~2s best-effort each). The per-host memo inGitProviderRouterstops thegh auth statusprobe from multiplying, but the detection call still runs per cwd.Finding (stronger than the issue's "measure first"): every import path discards the PR fields, so that round-trip is provably dead work, not just possibly-slow:
Owner/RepoName.repositorynode manually withuser_name/owner/repo_name/branch/remote_url— nopr_*fields.owner/repo.The live/hook path (
BuildRepositoryNodeviaEnrichWithRepositoryInfo) does emitpr_number/pr_title/… — so PR detection matters there, just not in import.How
DetectRepositoryAsyncgainsdetectPullRequest = true(default keeps every live/hook/enrich caller unchanged) and an injectableCommandRunner? run = null(defaults to the real process spawner) so the git + provider spawns are unit-testable.detectPullRequest: false:ImportCommand(scope resolution, repo scan, per-session ingest),TranscriptFileClassification, and the Copilot/Pi/Kiro/Cursor import-source default detectors. Base repo info still resolves from git alone.This is behaviour-preserving — imported sessions never carried PR info, so nothing in the imported payload changes; only the discarded network round-trips are removed. It fully realizes the issue's second remedy ("skip live provider detection during bulk import"), which is cleaner than caching since the result was unused.
Testing
ImportProviderDetectionTests(injected recording runner): the import path (detectPullRequest: false) resolves owner/repo from git and spawns nogh/glab; the default (live) path still runsgh pr viewand populates the PR.No README/CLI-surface change: internal import-path latency optimization.
Closes #232
Linear: AI-1122