Summary
When using push-to-talk (PTT) voice dictation (hold the trigger key, speak, release), if I start typing before the final transcript lands, my entire dictation is lost. The grey "preview" text disappears and is replaced by whatever I type.
Steps to reproduce
- Place the cursor in the prompt input.
- Hold the voice push-to-talk trigger, speak a sentence (the live transcription appears as grey/dim preview text), then release the key.
- Immediately — before the text turns solid/committed (white) — type any character.
Expected
The pending dictation should be committed first (the grey preview becomes real text), and my typed character should be appended after it. I should not lose what I dictated.
Actual
The grey preview text is cleared and the finalized transcript is dropped entirely. Only the character I typed remains. The whole dictation is lost.
Root cause (from inspecting the bundled app.js, v1.0.63)
The voice reactor hook (ycr) drives a small state machine: idle → recording → finalizing → idle.
- During recording, streaming partial transcripts are shown via the input controller's setPreview(...). This is the grey/dim preview text.
- Releasing the PTT key calls finishSession({ commit: true }) and moves to the finalizing state, while the speech engine asynchronously drains audio and returns the final transcript.
- The keyboard handler only intercepts/commits a keystroke when mode === "dictation" && state === "recording":
    if (Fe && Fe.mode === "dictation" && x.current === "recording" && (he.length > 0 || ...))
      return fe(), true; // commit + swallow the key
  In PTT mode the mode check fails, and during the finalizing window the state check fails. Either way the guard is false, so the keystroke is not intercepted: it falls through into the input box and mutates the buffer.
- When finishSession resolves, the result handler (ae) clears the preview and, for PTT specifically, performs a strict snapshot check against the recording anchor:
    if (he.mode === "ptt") {
      if (Ge.text !== he.anchor.before + he.anchor.after || Ge.cursorPosition !== he.anchor.pos) {
        // logs: "ptt: snapshot diverged after release; dropped transcript"
        return; // <-- DROPS the transcript
      }
      Ge.insertInput(...);
    } else {
      // dictation (toggle) mode: on divergence it still inserts at the current cursor (does NOT drop)
    }
Because the user typed during the finalizing window, the input no longer matches the anchor, so the PTT branch hits the divergence guard and drops the transcript. Dictation (toggle) mode does not drop on divergence — it recovers by inserting at the current cursor. This asymmetry is the bug: PTT loses the dictation, toggle mode does not.
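The asymmetry can be reproduced outside the app with a small model. This is only a sketch of the behavior described above, not the bundled code: makeInput, takeAnchor, applyFinalTranscript and simulate are hypothetical names, and the anchor shape (before, after, pos) is inferred from the snippet.

```javascript
// Hypothetical model of the reported behavior; none of these names come from app.js.

// A minimal stand-in for the input controller.
function makeInput(text = "", cursorPosition = text.length) {
  return {
    text,
    cursorPosition,
    // Insert a string at the current cursor and advance the cursor past it.
    insertInput(s) {
      this.text =
        this.text.slice(0, this.cursorPosition) + s + this.text.slice(this.cursorPosition);
      this.cursorPosition += s.length;
    },
  };
}

// Snapshot of the buffer taken when recording starts (assumed shape).
function takeAnchor(input) {
  return {
    before: input.text.slice(0, input.cursorPosition),
    after: input.text.slice(input.cursorPosition),
    pos: input.cursorPosition,
  };
}

// Mirrors the reported branch: PTT drops on divergence, dictation inserts anyway.
function applyFinalTranscript(session, input, transcript) {
  const diverged =
    input.text !== session.anchor.before + session.anchor.after ||
    input.cursorPosition !== session.anchor.pos;
  if (session.mode === "ptt" && diverged) {
    return "dropped";
  }
  input.insertInput(transcript);
  return "inserted";
}

// The user types "x" during the finalizing window, then the transcript arrives.
function simulate(mode) {
  const input = makeInput("");
  const session = { mode, anchor: takeAnchor(input) };
  input.insertInput("x"); // keystroke falls through to the buffer
  const outcome = applyFinalTranscript(session, input, "hello world");
  return { outcome, text: input.text };
}

console.log(simulate("ptt"));       // { outcome: 'dropped', text: 'x' }
console.log(simulate("dictation")); // { outcome: 'inserted', text: 'xhello world' }
```

In this model the only difference between the two runs is the mode, which matches the observation that the same keystroke loses the dictation in PTT but not in toggle mode.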
Suggested fixes (either would resolve it)
- Queue keystrokes during finalizing: while a session is finalizing, hold incoming keystrokes until the transcript is inserted (mirrors how dictation-while-recording already swallows the first keystroke to commit), then apply them. This commits the grey preview before the typed text.
- Make PTT recover like dictation: on snapshot divergence, insert the committed transcript at the current cursor position instead of dropping it. This is the smaller change and preserves the dictation in every case.
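Both options can be sketched in a few lines. This is a rough illustration under stated assumptions, not a patch against app.js: makeInput, makeQueueingSession and applyFinalTranscriptRecovering are hypothetical names, and the real hook's state handling is likely more involved.

```javascript
// Hypothetical sketch of both suggested fixes; none of these names come from app.js.

// A minimal stand-in for the input controller.
function makeInput(text = "") {
  return {
    text,
    cursorPosition: text.length,
    insertInput(s) {
      this.text =
        this.text.slice(0, this.cursorPosition) + s + this.text.slice(this.cursorPosition);
      this.cursorPosition += s.length;
    },
  };
}

// Option 1: hold keystrokes while finalizing, then replay them after the transcript.
function makeQueueingSession(input) {
  return {
    state: "recording",
    pendingKeys: [],
    release() {
      this.state = "finalizing";
    },
    // Returns true when the key was held back instead of reaching the input.
    handleKey(ch) {
      if (this.state === "finalizing") {
        this.pendingKeys.push(ch);
        return true;
      }
      input.insertInput(ch);
      return false;
    },
    onFinalTranscript(transcript) {
      input.insertInput(transcript); // commit the dictation first
      for (const ch of this.pendingKeys) input.insertInput(ch); // then the typed keys
      this.pendingKeys = [];
      this.state = "idle";
    },
  };
}

// Option 2: on divergence, insert at the current cursor instead of dropping.
function applyFinalTranscriptRecovering(anchor, input, transcript) {
  const diverged =
    input.text !== anchor.before + anchor.after || input.cursorPosition !== anchor.pos;
  input.insertInput(transcript); // same path whether or not the buffer diverged
  return diverged ? "recovered" : "inserted";
}

// Option 1 in use: the "!" typed during finalizing lands after the transcript.
const queuedInput = makeInput("");
const queued = makeQueueingSession(queuedInput);
queued.release();
queued.handleKey("!");
queued.onFinalTranscript("hello world");
console.log(queuedInput.text); // hello world!

// Option 2 in use: the transcript survives, inserted at the current cursor.
const recoveredInput = makeInput("");
const anchor = { before: "", after: "", pos: 0 };
recoveredInput.insertInput("!");
console.log(applyFinalTranscriptRecovering(anchor, recoveredInput, "hello world")); // recovered
console.log(recoveredInput.text); // !hello world
```

In this sketch, queueing reproduces the ordering described under Expected (dictation first, typed text after), while recovery keeps the dictation but places it wherever the cursor is once the transcript arrives. A queueing implementation would presumably also need to flush held keys if finalization fails or never resolves.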
Environment
- GitHub Copilot CLI v1.0.63
- Platform: Linux
- Voice mode: push-to-talk (hold-to-talk)