Voice-to-text dictation overlay. Hold a hotkey, talk, release — your cleaned-up words appear in whatever app has focus. Powered by Aavaaz for streaming transcription and an LLM polish pass for filler removal, punctuation, and per-app tone.
qol (қол) means "voice" in Kazakh — a sibling name to Aavaaz ("voice" in Hindi).
┌──────────────────────────────────────────────────┐
│ qol (Tauri desktop app)                          │
│                                                  │
│ global-shortcut  ──► push-to-talk                │
│ cpal             ──► 16 kHz mono PCM             │
│ tokio-tungstenite──► ws://localhost:9090         │ ── Aavaaz/WhisperLive ──► transcript
│ active-win       ──► focused app context         │
│ Claude API       ──► polish (tone, punctuation)  │
│ enigo            ──► inject text into focused app│
│                                                  │
│ webview          ──► settings UI (Vite + TS)     │
└──────────────────────────────────────────────────┘
| Path | Purpose |
|---|---|
| src-tauri/Cargo.toml | Rust deps (tauri, cpal, tokio-tungstenite, enigo, reqwest) |
| src-tauri/src/main.rs | App entry, hotkey wiring, Tauri commands |
| src-tauri/src/audio.rs | Mic capture → 16 kHz mono f32 frames |
| src-tauri/src/transport.rs | WebSocket session to Aavaaz/WhisperLive |
| src-tauri/src/session.rs | Lifecycle: audio → transport → polish → inject |
| src-tauri/src/inject.rs | Keystroke injection (enigo) + active-window probe |
| src-tauri/src/polish.rs | Claude API call for transcript cleanup |
| src-tauri/src/config.rs | JSON config in ~/.config/qol/config.json |
| index.html + src/ | Vite settings UI |
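
The audio.rs row above mentions 16 kHz mono f32 frames. As a rough illustration of the last two steps before audio leaves the app, here is a minimal sketch that downmixes interleaved samples to mono and serializes them for a binary WebSocket frame. The function names are hypothetical, resampling (handled by rubato in qol) is left out, and the little-endian f32 wire format is an assumption rather than something this README confirms.

```rust
/// Average interleaved multi-channel samples down to mono.
/// Any trailing partial frame is ignored.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Serialize mono f32 samples as little-endian bytes.
/// Assumption: the server accepts raw little-endian f32 PCM.
pub fn samples_to_le_bytes(samples: &[f32]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}
```

Each f32 sample becomes four bytes, so one second of 16 kHz mono audio is 64,000 bytes on the wire under this assumption.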
- Rust 1.77+ (rustup toolchain install stable)
- Node 20+ (pnpm or npm)
- A running Aavaaz instance at ws://localhost:9090
- Optional LLM polish: any OpenAI-compatible endpoint
  - OPENAI_API_KEY env var for OpenAI (default)
  - Or point base_url at Groq, OpenRouter, Together, Cerebras, Mistral, ...
  - Or fully local: Ollama (http://localhost:11434/v1, model qwen2.5:7b-instruct) or llama.cpp's --server (http://localhost:8080/v1); leave the API key env var empty
  - Or skip entirely: with polish disabled, raw transcripts inject fine
sudo dnf install -y \
webkit2gtk4.1-devel \
openssl-devel \
curl wget file \
libappindicator-gtk3-devel \
librsvg2-devel \
gtk3-devel \
alsa-lib-devel \
libxdo-devel

libxdo-devel is needed by enigo on X11.
GNOME Wayland blocks synthetic X11 input, so we automatically detect Wayland
and route injection through ydotool.
sudo dnf install ydotool # or: apt install ydotool

How you set this up depends on whether ydotoold runs as root (a system
service: Fedora/Debian/Ubuntu) or as your user (an Arch user unit).

The system-service setup is the common case. ydotoold runs as root, so it
already has /dev/uinput access; you do not need a udev rule or the input
group. Those only matter when the daemon runs as your user (covered further
down).
The real problem is the socket. Run as root, ydotoold creates
/tmp/.ydotool_socket owned root:root 0600, which (a) your unprivileged
clients can't open, and (b) isn't where the client looks by default
($XDG_RUNTIME_DIR/.ydotool_socket, i.e. /run/user/<uid>/.ydotool_socket).
Two mismatches, both silent.
Fix both with a drop-in that pins a known path, hands ownership to your user,
and sets the socket mode to 0660 (replace 1000:1000 with your own uid:gid, as
printed by id -u and id -g):
sudo systemctl enable ydotool # Fedora unit name; Debian: ydotoold
sudo mkdir -p /etc/systemd/system/ydotool.service.d
sudo tee /etc/systemd/system/ydotool.service.d/socket.conf >/dev/null <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-path=/tmp/.ydotool_socket --socket-perm=0660 --socket-own=1000:1000
EOF
sudo systemctl daemon-reload
sudo systemctl restart ydotool

Then tell clients where the socket is. qol already does this internally
(inject.rs sets YDOTOOL_SOCKET=/tmp/.ydotool_socket unless you override it),
so qol works with no further config. For your own shell, make it permanent:
echo 'YDOTOOL_SOCKET=/tmp/.ydotool_socket' | sudo tee -a /etc/environment

Because the socket is owned by your user (not root:input), this works
without the input group and without a logout. Group membership added
by usermod -aG doesn't reach an already-running GNOME session anyway, which is
the usual reason "it worked after I logged out" turns out false.
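
The two socket locations discussed above can be written down in a few lines. This is a sketch of the lookup rules as this README describes them, not the actual code in inject.rs; both function names are hypothetical.

```rust
use std::path::PathBuf;

/// Path qol would hand to ydotool: an explicit YDOTOOL_SOCKET override
/// wins, otherwise the path pinned by the systemd drop-in.
pub fn qol_socket_path(override_path: Option<&str>) -> PathBuf {
    match override_path {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from("/tmp/.ydotool_socket"),
    }
}

/// Path a stock ydotool client tries when YDOTOOL_SOCKET is unset,
/// per the description above: $XDG_RUNTIME_DIR/.ydotool_socket.
pub fn client_default_socket_path(xdg_runtime_dir: &str) -> PathBuf {
    PathBuf::from(xdg_runtime_dir).join(".ydotool_socket")
}
```

Comparing the two results for your own uid shows the mismatch directly: a root-run daemon listens under /tmp while an unconfigured client looks under /run/user/<uid>.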
With a user unit (the Arch setup), ydotoold runs as you, so it needs
/dev/uinput access via a udev rule and the input group. The socket lands in
$XDG_RUNTIME_DIR, where the client already looks, so no YDOTOOL_SOCKET is
needed:
echo 'KERNEL=="uinput", MODE="0660", GROUP="input"' | \
sudo tee /etc/udev/rules.d/80-uinput.rules
sudo udevadm control --reload && sudo udevadm trigger
sudo usermod -aG input "$USER" # then fully log out + back in (or reboot)
systemctl --user enable --now ydotool

Verify the setup:

systemctl status ydotool --no-pager # active (running)
ls -l "${YDOTOOL_SOCKET:-/tmp/.ydotool_socket}" # socket exists, owned by you
YDOTOOL_SOCKET=/tmp/.ydotool_socket ydotool type "hello" # types into focused window

Common errors:

- failed to connect socket … No such file or directory → daemon isn't running, or the client is looking at the wrong path (set YDOTOOL_SOCKET).
- failed to connect socket … Permission denied → the socket is owned root:input and your session isn't in input; use the --socket-own drop-in above instead of relying on the group.
- failed to open uinput device → only with a user-run daemon: the udev rule or input group hasn't taken effect (reboot to be sure).
qol picks the backend automatically at startup. Look for
selected injection backend = Ydotool in the logs to confirm.
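
For readers curious what "picks the backend automatically" might involve, here is a minimal sketch. It assumes detection is based on the XDG_SESSION_TYPE and WAYLAND_DISPLAY environment variables plus a check that ydotool is installed; the README does not spell out the actual rule, and the names below are hypothetical.

```rust
/// Injection backends named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Enigo,
    Ydotool,
}

/// Decide which backend to use from session hints.
///
/// `session_type` mirrors $XDG_SESSION_TYPE, `wayland_display` mirrors
/// $WAYLAND_DISPLAY, and `ydotool_available` says whether a ydotool
/// binary was found. They are parameters so the rule is easy to test.
pub fn select_backend(
    session_type: Option<&str>,
    wayland_display: Option<&str>,
    ydotool_available: bool,
) -> Backend {
    let on_wayland = session_type.map_or(false, |s| s.eq_ignore_ascii_case("wayland"))
        || wayland_display.map_or(false, |d| !d.is_empty());
    if on_wayland && ydotool_available {
        Backend::Ydotool
    } else {
        // X11, macOS, Windows, or Wayland without ydotool installed.
        Backend::Enigo
    }
}
```

A caller would feed it `std::env::var("XDG_SESSION_TYPE").ok().as_deref()` and the equivalent for WAYLAND_DISPLAY, then log the result once at startup.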
tauri-plugin-global-shortcut can't grab keys under GNOME Wayland (Mutter
refuses the X11-style key grab), and the modern xdg-desktop-portal
GlobalShortcuts interface rejects non-sandboxed apps because the portal
sends an empty app_id and gnome-control-center discards the request:
gnome-control-center-global-shortcuts-provider:
Discarded shortcut bind request from application with an invalid app_id ><.
Workaround: qol always opens a Unix socket at $XDG_RUNTIME_DIR/qol.sock,
and ships a tiny qol-trigger CLI that pokes it. Bind a GNOME Custom
Shortcut to qol-trigger toggle and you get a working hotkey on Wayland:
- Install the CLI on your PATH:
  sudo install -m 755 src-tauri/target/debug/qol-trigger /usr/local/bin/qol-trigger
- Settings → Keyboard → View and Customize Shortcuts → Custom Shortcuts → +
  - Name: qol toggle dictation
  - Command: /usr/local/bin/qol-trigger toggle
  - Shortcut: pick your combo (e.g. Ctrl+Alt+Space; make sure nothing else has it. Super+Space is grabbed by GNOME's input-source switcher and won't reach the command)
- Start qol once so the socket exists, then press your combo. First press starts dictation, second press stops it.
Since GNOME custom keybindings only fire on press (no release event), the hotkey is toggle, not push-to-talk. Aavaaz's VAD finalizes segments naturally during dictation; toggling again ends the session.
Other commands the CLI supports:
qol-trigger status # prints "idle" or "recording"
qol-trigger start # idempotent
qol-trigger stop # idempotent
qol-trigger toggle # default

The trigger socket is enabled on every OS, not just Linux, so the same CLI works from scripts on macOS and X11 too. On X11/macOS/Windows you also still have real push-to-talk through the in-process global-shortcut plugin; pick whichever feels better.
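
The four commands above amount to a two-state machine. The sketch below models it in plain Rust so the idempotence of start and stop is explicit. It is an illustration only: the type and function names are hypothetical, and the real wire format on qol.sock is not documented here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Status,
    Start,
    Stop,
    Toggle,
}

/// Parse the CLI argument; no argument means `toggle` (the default).
pub fn parse_command(arg: Option<&str>) -> Option<Command> {
    match arg {
        None | Some("toggle") => Some(Command::Toggle),
        Some("status") => Some(Command::Status),
        Some("start") => Some(Command::Start),
        Some("stop") => Some(Command::Stop),
        Some(_) => None,
    }
}

/// Apply a command and return the new state. `start` and `stop` are
/// idempotent; `status` never changes state.
pub fn apply(state: State, cmd: Command) -> State {
    match (state, cmd) {
        (_, Command::Start) => State::Recording,
        (_, Command::Stop) => State::Idle,
        (State::Idle, Command::Toggle) => State::Recording,
        (State::Recording, Command::Toggle) => State::Idle,
        (s, Command::Status) => s,
    }
}

/// Text that `status` prints, as listed above.
pub fn status_text(state: State) -> &'static str {
    match state {
        State::Idle => "idle",
        State::Recording => "recording",
    }
}
```

Because start and stop ignore the current state, a script can call them repeatedly without tracking whether dictation is already running.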
# in one terminal — start Aavaaz with a model that fits your GPU
cd ../Aavaaz/aavaaz
source .venv/bin/activate
aavaaz serve --model distil-large-v3
# in another — build and run qol
cd ../../qol
npm install
npm run tauri dev

Then press your hotkey (default Super+Space), speak, and release.
This walks through the end-to-end smoke test from a cold start. Aimed at a
single workstation: Aavaaz running on localhost, qol injecting into the
focused text field.
cd ~/src/qol
npm install
( cd src-tauri && cargo build )

The first build pulls a lot of dependencies (~5 min). Subsequent builds take seconds.
Pick a model that fits your GPU's VRAM. For a 6 GB card (e.g. RTX 3060 Laptop):
cd ~/src/Aavaaz/aavaaz
source .venv/bin/activate
aavaaz serve --model distil-large-v3

You should see something like:
INFO whisper_live - WebSocket server listening on 0.0.0.0:9090
INFO whisper_live - Loaded distil-large-v3 on cuda:0
Sanity-check from another terminal:
ss -tln | grep 9090 # port is listening

We want to isolate STT before mixing in an LLM, so disable polish for the first run. Either toggle it off in the settings window after first launch, or pre-seed the config:
mkdir -p ~/.config/qol
cat > ~/.config/qol/config.json <<'EOF'
{
"aavaaz_url": "ws://localhost:9090",
"model": "distil-large-v3",
"language": "en",
"hotkey": "Super+Space",
"polish": {
"enabled": false,
"base_url": "https://api.openai.com/v1",
"model": "gpt-4o-mini",
"api_key_env": "OPENAI_API_KEY",
"per_app_tone": true
},
"hotwords": [],
"inject_method": "type"
}
EOF

RUST_LOG=qol=debug,warn ~/src/qol/src-tauri/target/debug/qol

You should see roughly:
INFO qol::inject: selected injection backend backend=Enigo
(backend=Ydotool if you're on Wayland with ydotool installed.)
The window stays hidden. Look for the tray icon — see Troubleshooting if it's missing on GNOME.
- Focus any text field (a terminal, gedit, your browser address bar).
- Hold Super+Space, say a sentence, release.
- Watch the qol logs; you should see:
  INFO qol: session started
  DEBUG qol::session: session started app=Some("...")
  INFO qol: session stopped
- The transcript should land in the focused field within ~1 second of release.
Open the settings window from the tray, check Clean up transcripts, and configure:
- OpenAI: https://api.openai.com/v1, model gpt-4o-mini, env var OPENAI_API_KEY
- Groq (very fast): https://api.groq.com/openai/v1, model llama-3.1-8b-instant, env var GROQ_API_KEY
- Local Ollama: http://localhost:11434/v1, model qwen2.5:7b-instruct, env var blank

Export the key in the same shell you launch qol from, then restart qol.
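
The reason the key has to be exported before launch is that the config stores only the name of an environment variable (api_key_env), and the process reads that variable from its own environment. A minimal sketch of such a lookup follows; the function name is hypothetical and the treatment of blank values is an assumption, not qol's confirmed behavior.

```rust
/// Look up the polish API key named by `api_key_env`.
/// A blank variable name (local endpoints such as Ollama) means "no key".
/// `lookup` stands in for the process environment so the rule is testable.
pub fn resolve_api_key<F>(api_key_env: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let name = api_key_env.trim();
    if name.is_empty() {
        return None;
    }
    // Treat an exported-but-empty variable the same as an unset one.
    lookup(name).filter(|value| !value.trim().is_empty())
}
```

In a real process the lookup would be `|k| std::env::var(k).ok()`, which only sees variables that were present when qol started. That is why a key exported afterwards has no effect until restart.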
| Symptom | Likely cause | Fix |
|---|---|---|
| Aavaaz errors libcudnn_ops_infer.so: cannot open | cuDNN not on path | pip install nvidia-cudnn-cu12 inside the Aavaaz venv |
| Aavaaz CUDA out of memory | Model too big for your VRAM | Use distil-large-v3 or medium instead of large-v3 |
| qol logs "no default input device" | Mic not picked by PulseAudio/Pipewire | pactl list sources short; set a default with pactl set-default-source <name> |
| No tray icon on GNOME | GNOME hides AppIndicators by default | sudo dnf install gnome-shell-extension-appindicator, then enable "AppIndicator and KStatusNotifierItem Support" |
| Hotkey does nothing | Already grabbed by another app | Pick a different combo in settings (e.g. Ctrl+Alt+Space) |
| Text doesn't appear in focused app (GNOME Wayland) | Wayland blocks synthetic input | Install ydotool + ydotoold (see Wayland section above); restart qol; verify backend=Ydotool in logs |
| Polish silently produces no text | API key env var unset or wrong name | echo $OPENAI_API_KEY; restart qol after export-ing |
| connect failed: ConnectionRefused | Aavaaz not running | Start it on :9090 first |
End-to-end, on a 6 GB GPU with polish disabled, expect roughly:
- Hotkey press → first PCM frame to Aavaaz: <50 ms
- End of speech → first completed segment from Aavaaz: 300–800 ms (depends on VAD pause threshold)
- First completed segment → text in focused app: <50 ms
- With polish enabled (OpenAI gpt-4o-mini): add ~300–600 ms per segment
If you're seeing multi-second lag, that's almost always Aavaaz model load
or CPU fallback (check nvidia-smi while dictating: Aavaaz should drive the
GPU to ~30% utilization momentarily).
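
To turn the figures above into a single release-to-text estimate, add the stages that happen after you stop speaking. The sketch below does that arithmetic, modeling each "<50 ms" as a 0 to 50 ms range. The type and function names are hypothetical; the numbers are the ones quoted in this section.

```rust
/// A latency estimate in milliseconds, as an inclusive low/high range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RangeMs {
    pub lo: u32,
    pub hi: u32,
}

/// Add up per-stage ranges: lows with lows, highs with highs.
pub fn total(stages: &[RangeMs]) -> RangeMs {
    stages.iter().fold(RangeMs { lo: 0, hi: 0 }, |acc, s| RangeMs {
        lo: acc.lo + s.lo,
        hi: acc.hi + s.hi,
    })
}

/// Release-to-text estimate built from the figures quoted above.
pub fn release_to_text(polish_enabled: bool) -> RangeMs {
    let mut stages = vec![
        RangeMs { lo: 300, hi: 800 }, // end of speech -> first completed segment
        RangeMs { lo: 0, hi: 50 },    // completed segment -> text in focused app
    ];
    if polish_enabled {
        stages.push(RangeMs { lo: 300, hi: 600 }); // polish pass per segment
    }
    total(&stages)
}
```

Under these assumptions the estimate is 300 to 850 ms without polish and 600 to 1450 ms with it, which is the baseline to compare against when diagnosing lag.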
Edit via the settings window (open from system tray), or directly at
~/.config/qol/config.json:
{
"aavaaz_url": "ws://localhost:9090",
"model": "distil-large-v3",
"language": "en",
"hotkey": "Super+Space",
"polish": {
"enabled": true,
"base_url": "https://api.openai.com/v1",
"model": "gpt-4o-mini",
"api_key_env": "OPENAI_API_KEY",
"per_app_tone": true
},
"hotwords": ["Aavaaz", "qol", "WhisperLive"],
"inject_method": "type"
}

cd src-tauri
cargo test # unit tests (config round-trip, hotkey parser)
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check

CI runs the above on Ubuntu, macOS, and Windows for every push and PR; see .github/workflows/ci.yml.
Hardware-bound paths (audio capture, keystroke injection) and network-bound paths (WebSocket session, Claude polish) aren't unit-tested yet; integration tests with a fake Aavaaz endpoint and a virtual audio device are a TODO.
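
Pure functions such as a hotkey parser are the easy part to unit-test. The following is a hypothetical sketch of what parsing a string like Super+Space could look like; it is not qol's actual parser, and the accepted modifier aliases are assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Hotkey {
    pub modifiers: Vec<String>,
    pub key: String,
}

/// Parse "Mod+Mod+Key": every segment but the last is a modifier.
pub fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let parts: Vec<&str> = spec.split('+').map(str::trim).collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err(format!("empty segment in hotkey {spec:?}"));
    }
    // `split` always yields at least one item, so this cannot fail.
    let (key, mods) = parts.split_last().expect("at least one segment");
    let mut modifiers: Vec<String> = Vec::new();
    for m in mods {
        // Normalize case and a few common aliases (assumed set).
        let canon = match m.to_ascii_lowercase().as_str() {
            "ctrl" | "control" => "Ctrl",
            "alt" => "Alt",
            "shift" => "Shift",
            "super" | "meta" | "cmd" => "Super",
            other => return Err(format!("unknown modifier {other:?}")),
        };
        if !modifiers.iter().any(|existing| existing == canon) {
            modifiers.push(canon.to_string());
        }
    }
    Ok(Hotkey {
        modifiers,
        key: key.to_string(),
    })
}
```

A parser of this shape needs no hardware or network access, so it fits the `cargo test` run described above.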
This is a scaffold. Working / stubbed:
- Mic capture with rubato polyphase resampling to 16 kHz
- WebSocket session with Aavaaz/WhisperLive handshake
- LLM polish pass with per-app tone hint
- Streaming injection (each completed segment types as it arrives)
- Voice commands: scratch that, new line, new paragraph, select all
- Linux injection backend selector: enigo on X11, ydotool on Wayland
- Global hotkey via tauri-plugin-global-shortcut
- Tray menu (open settings, pause/resume, quit)
- Settings UI
- Per-app tone profiles configurable in UI
- Tone-rolling-context across segments (consistency in long dictation)
- Local-only polish via llama.cpp
- Integration tests with a fake Aavaaz WS server
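
As an illustration of the voice commands listed above, here is one way a completed segment could be classified before injection. This is a sketch under assumptions: it only recognizes a command when the whole segment matches, it tolerates trailing punctuation from the transcriber, and the names are hypothetical rather than taken from session.rs.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    /// Ordinary dictation: type the text as-is.
    Type(String),
    ScratchThat,
    NewLine,
    NewParagraph,
    SelectAll,
}

/// Map a completed transcript segment to an action.
pub fn classify_segment(segment: &str) -> Action {
    let trimmed = segment.trim();
    // Normalize: drop trailing punctuation the transcriber may add, lowercase.
    let normalized = trimmed
        .trim_end_matches(|c: char| matches!(c, '.' | ',' | '!' | '?'))
        .to_ascii_lowercase();
    match normalized.as_str() {
        "scratch that" => Action::ScratchThat,
        "new line" => Action::NewLine,
        "new paragraph" => Action::NewParagraph,
        "select all" => Action::SelectAll,
        _ => Action::Type(trimmed.to_string()),
    }
}
```

Matching only whole segments keeps a sentence that merely contains the words "select all" from being swallowed as a command.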
MPL-2.0