ScribeAir
Offline voice input for Windows — push-to-talk and voice activation
Speak — text appears at cursor. Hold a hotkey or say “record” to start. Works in any application.
GigaAM achieves 3.3% WER on Russian while running on CPU, outperforming every Whisper model in the benchmarks, including those run on an RTX 4090 GPU. Fully offline after initial model download. Free and open source.
Why ScribeAir?
- Offline — works without internet (Windows Voice Typing, Google, Dragon require cloud)
- Best Russian quality — 3.3% WER vs ~25% Windows, ~10% Google, ~8% Dragon
- Free & open source — MIT license (Dragon costs $300+, Google charges per request)
- Configurable — 3 ASR backends to choose from (GigaAM, Whisper, Vosk)
- Private — 100% local, your speech never leaves your computer
Download
Pre-built Windows executables are available in Releases — no Python required.
Extract the ZIP and run ScribeAir.exe. Models download automatically on first launch.
Features
- Push-to-talk with configurable hotkeys (LShift+RShift, Win+Shift, etc.)
- Wake word activation — say “запись” / “record” to start, “стоп” / “stop” to finish (no hotkey needed)
- 3 ASR backends: Whisper (GPU/CPU), GigaAM (ONNX, Russian-optimized), Vosk (lightweight)
- Progressive transcription — see intermediate results in real-time during recording
- IT term replacement — automatic питон→Python, гугл→Google (81 terms with morphology)
- Real-time overlay showing recording/transcription progress
- Streaming pipeline with Silero VAD for instant voice detection
- Fully offline after initial model download
- Multi-language: Russian, English, auto-detect, mixed RU+EN, RU→EN translation
- T5 text correction for fixing ASR errors in Russian
- System tray UI with full settings menu
- GPU acceleration via NVIDIA CUDA (auto CPU fallback)
- Dual builds — CUDA (~4.9 GB) and CPU-only (~800 MB)
- Custom vocabulary for domain-specific terms
- Windows autostart support
Architecture
Wake Word (OWW) ──→ Recording ──→ Progressive GigaAM (live preview)
   or Hotkey            │           │
                        ▼           ▼ "стоп/stop"
                   Silero VAD → ASR Engine → Term Replacement → Clipboard
                                  ├── GigaAM (Russian, CPU)
                                  ├── Whisper (multilingual, GPU/CPU)
                                  └── Vosk (lightweight, CPU)
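The Term Replacement stage in the diagram maps Cyrillic spellings of IT terms back to their Latin originals. The following is a minimal sketch of that idea, not the code in `term_replacer.py`: it assumes a plain stem dictionary and a regex that swallows any trailing letters, a crude stand-in for real morphology. The names `TERM_STEMS` and `replace_terms` are hypothetical.

```python
import re

# Hypothetical stem table for illustration; the application ships 81 terms.
TERM_STEMS = {
    "питон": "Python",
    "гугл": "Google",
}

# One alternation over all stems, followed by any trailing word characters,
# so inflected forms such as "питоне" or "гугле" match as well.
# Caveat: this also matches unrelated words that merely start with a stem.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, TERM_STEMS)) + r")\w*",
    flags=re.IGNORECASE,
)


def replace_terms(text: str) -> str:
    """Replace Cyrillic IT terms (including inflected forms) with Latin spellings."""
    return _PATTERN.sub(lambda m: TERM_STEMS[m.group(1).lower()], text)


print(replace_terms("Я пишу на питоне и ищу в гугле"))
```

Matching on stems keeps the table small, at the cost of occasional false positives; a production replacer would more likely enumerate the allowed endings per term.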
Recognition Quality
Benchmarked on Russian audiobook data (detailed benchmarks):
Backend   Model               WER     WER reduction vs. baseline   Latency (device)
───────   ─────────────────   ─────   ──────────────────────────   ────────────────
GigaAM    v3-e2e-rnnt         3.3%    90.0%                        0.66s (CPU)
GigaAM    v3-rnnt             3.3%    90.0%                        0.82s (CPU)
GigaAM    v3-e2e-ctc          4.2%    87.2%                        1.08s (CPU)
Whisper   large-v3-turbo      7.9%    75.7%                        0.44s (GPU)
Whisper   large-v3            8.8%    72.9%                        2.30s (GPU)
Whisper   medium              10.7%   67.2%                        1.75s (GPU)
Vosk      small-ru            13.0%   60.0%                        0.75s (CPU)
Whisper   base (baseline)     32.6%   n/a                          0.42s (CPU)
Key finding: GigaAM on CPU (3.3% WER) outperforms Whisper large-v3-turbo on RTX 4090 GPU (7.9% WER) for Russian.
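For readers unfamiliar with the metric, the sketch below shows how a word error rate and a relative reduction against a baseline are commonly computed. It is an illustration under stated assumptions, not the project's benchmark script: it does no text normalization (case, punctuation), and it assumes the reduction figures in the table are relative to the Whisper base row, which is consistent with the published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row dynamic programming over the Levenshtein table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(wer: float, baseline: float) -> float:
    """Relative WER reduction against a baseline, in percent."""
    return (baseline - wer) / baseline * 100


# One substituted word out of three reference words.
print(f"{word_error_rate('включи запись экрана', 'включи запис экрана'):.3f}")
# GigaAM (3.3%) against the Whisper base baseline (32.6%).
print(f"{relative_reduction(3.3, 32.6):.1f}%")
```

With the rounded WER values from the table, the second call gives about 89.9%; the 90.0% shown in the table presumably comes from unrounded measurements.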
Installation
From source
Requirements: Windows 10/11, Python 3.10+, 8 GB RAM, microphone.
git clone https://github.com/borisovai/scribe-air.git
cd scribe-air
python -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
venv\Scripts\python.exe src\main.py
Models download automatically on first run (~240 MB for GigaAM, ~1.6 GB for Whisper).
Usage
- Launch the application — a microphone icon appears in the system tray
- Wait for model loading (icon turns from blue to gray)
- Start recording using either method:
  - Hotkey — hold LShift + RShift (configurable), release to finish
  - Voice — say “запись” / “record” to start, “стоп” / “stop” to finish
- Speak into your microphone
- Text is inserted at the cursor position
Voice Activation (Wake Word)
Enable “Wake Word” in the tray menu for hands-free operation — no hotkey needed, fully voice-controlled.
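The start and stop words behave like a two-state machine: a start word only matters while idle, and a stop word only matters while recording. The sketch below illustrates that logic in isolation. It assumes the wake word detector already yields recognized words as strings; `State` and `next_state` are hypothetical names, not part of `wakeword_listener.py`.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"


START_WORDS = {"запись", "record"}
STOP_WORDS = {"стоп", "stop"}


def next_state(state: State, detected_word: str) -> State:
    """Advance the recorder state for one detected wake word."""
    word = detected_word.strip().lower()
    if state is State.IDLE and word in START_WORDS:
        return State.RECORDING
    if state is State.RECORDING and word in STOP_WORDS:
        return State.IDLE
    return state  # anything else leaves the state unchanged


state = State.IDLE
for word in ["stop", "record", "record", "стоп"]:
    state = next_state(state, word)
    print(f"{word!r} -> {state.value}")
```

Ignoring a stop word while idle and a repeated start word while recording keeps stray detections from toggling the recorder unexpectedly.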
Tray Menu (right-click)
- Language — Auto-detect, Russian, English, RU+EN Mixed, RU→EN Translate
- Model — Tiny, Small, Medium, Large v3, Large v3 Turbo
- ASR Backend — Auto, Whisper, GigaAM, Vosk
- GigaAM Model — v2-ctc, v3-rnnt, v3-e2e-rnnt, etc.
- Hotkey — Key combination for push-to-talk
- Audio Device — Select input microphone
- Wake Word — Toggle voice activation (say “запись”/“record” to start)
- IT Term Replacement — Toggle automatic Cyrillic → Latin term conversion
- Text Correction (T5) — Toggle T5 spell correction
- Start with Windows — Toggle autostart
- Show/Hide Console — Open a debug console with live logs
Configuration
Settings are stored in %APPDATA%\ScribeAir\config.json:
{
"language_mode": "auto",
"model": "large-v3-turbo",
"device": "cuda",
"hotkey": "shift+shift",
"asr_backend": "auto",
"gigaam_model": "v3-e2e-rnnt",
"llm_correction_enabled": true,
"custom_vocabulary": ["Kubernetes", "PostgreSQL"]
}
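A script that needs to read these settings can overlay the file on a set of defaults. The following is a rough sketch only: the keys come from the sample above, but treating the sample values as the application's defaults is an assumption, and `load_config` and `config_path` are hypothetical helpers rather than the API of `config.py`.

```python
import copy
import json
import os
from pathlib import Path
from typing import Optional

# Values copied from the sample config; assumed, not confirmed, defaults.
DEFAULTS = {
    "language_mode": "auto",
    "model": "large-v3-turbo",
    "device": "cuda",
    "hotkey": "shift+shift",
    "asr_backend": "auto",
    "gigaam_model": "v3-e2e-rnnt",
    "llm_correction_enabled": True,
    "custom_vocabulary": [],
}


def config_path() -> Path:
    """Locate ScribeAir's config.json under %APPDATA% (home dir as fallback)."""
    base = os.environ.get("APPDATA") or str(Path.home())
    return Path(base) / "ScribeAir" / "config.json"


def load_config(path: Optional[Path] = None) -> dict:
    """Read the JSON file and overlay it on the defaults."""
    path = path or config_path()
    config = copy.deepcopy(DEFAULTS)
    try:
        config.update(json.loads(path.read_text(encoding="utf-8")))
    except FileNotFoundError:
        pass  # no file yet (for example, before first launch): keep defaults
    return config


print(load_config()["asr_backend"])
```

Overlaying on defaults means a hand-edited file only has to list the keys that differ, and a missing file is not an error.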
ASR Backend Modes
Mode      Use case                                             WER        Latency
───────   ──────────────────────────────────────────────────   ────────   ───────────
auto      Default: GigaAM for Russian on CPU, Whisper on GPU   3.3–7.9%   0.4–0.8s
whisper   English or GPU-accelerated transcription             7.9%       0.44s (GPU)
gigaam    Russian on CPU, best quality                         3.3%       0.66s
vosk      Ultra-low latency, short phrases                     13%        0.7s
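One possible reading of the `auto` mode is sketched below. The README does not spell out the exact rule, so the decision logic here is an assumption (GigaAM when the language is Russian and no GPU is available, Whisper otherwise), and the function name, the `"ru"` language code, and the `cuda_available` flag are hypothetical.

```python
EXPLICIT_BACKENDS = ("whisper", "gigaam", "vosk")


def choose_backend(mode: str, language: str, cuda_available: bool) -> str:
    """Pick an ASR backend name from the configured asr_backend mode."""
    if mode in EXPLICIT_BACKENDS:
        return mode  # an explicit choice always wins
    if mode != "auto":
        raise ValueError(f"unknown asr_backend: {mode!r}")
    # Assumed interpretation of 'auto'; the real rule may differ.
    if language == "ru" and not cuda_available:
        return "gigaam"
    return "whisper"


print(choose_backend("auto", "ru", cuda_available=False))
print(choose_backend("auto", "en", cuda_available=True))
print(choose_backend("vosk", "ru", cuda_available=True))
```

Given the benchmark results, preferring GigaAM for Russian even when a GPU is present would be an equally defensible rule; the sketch only shows where such a decision would live.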
Building EXE
# Build both variants
venv\Scripts\python.exe build.py
# Build only CPU or CUDA
venv\Scripts\python.exe build.py cpu
venv\Scripts\python.exe build.py cuda
- CUDA (~4.9 GB) — Full GPU support, includes NVIDIA DLLs
- CPU (~800 MB) — CPU-only, no CUDA dependencies
AI models are not bundled — they download automatically on first launch and cache in %APPDATA%\ScribeAir\models\.
Project Structure
scribe-air/
├── src/
│ ├── main.py # Application entry point
│ ├── config.py # Configuration management
│ ├── hotkey.py # Global hotkey detection
│ ├── recorder.py # Audio recording (sounddevice)
│ ├── transcriber.py # Whisper transcription
│ ├── gigaam_transcriber.py # GigaAM ONNX transcription
│ ├── vosk_transcriber.py # Vosk transcription
│ ├── streaming_pipeline.py # Streaming VAD + transcription
│ ├── wakeword_listener.py # Wake word detection (openWakeWord)
│ ├── term_replacer.py # IT term replacement (Cyrillic → Latin)
│ ├── audio_processor.py # Audio preprocessing
│ ├── text_corrector_t5.py # T5 text correction
│ ├── model_downloader.py # Model download with mirror fallback
│ ├── inserter.py # Text insertion via clipboard
│ ├── overlay.py # Floating transcription window
│ ├── tray.py # System tray UI
│ └── autostart.py # Windows autostart
├── wakeword_data/models/ # Wake word ONNX models (zapis, stop)
├── tests/ # Pytest test suite
├── docs/guides/ # User and developer guides
├── assets/icon.ico # Application icon
├── requirements.txt # Dependencies
├── voice_app.spec # PyInstaller config
└── build.py # Build script
Troubleshooting
- Model not loading — Check internet connection. First download is ~240 MB–1.6 GB from HuggingFace. Models cache in %APPDATA%\ScribeAir\models\
- No audio — Verify microphone in Windows Sound Settings. Select device via Audio Device in tray menu
- GPU not used — Install CUDA 12.x, update NVIDIA drivers. Set "device": "cuda" in config. Use CUDA build
- Text not inserted — Ensure cursor is in a text field. Test with Notepad first
- Debug logs — Right-click tray icon → “Show Console” for live logs. Log file: %APPDATA%\ScribeAir\scribe_air.log
Testing
pip install -r requirements-dev.txt
pytest tests/ -v
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
License
MIT License. See LICENSE.
Acknowledgements
- GigaAM — Russian ASR model (Sberbank, trained on 700K hours)
- faster-whisper — Optimized Whisper implementation
- onnx-asr — ONNX ASR inference
- Vosk — Lightweight offline speech recognition
- Silero VAD — Neural voice activity detection
- bond005/ruT5-ASR-large — T5 Russian ASR correction