ScribeAir

Offline voice input for Windows — push-to-talk and voice activation

Speak, and the text appears at the cursor. Hold a hotkey or say “record” to start. Works in any application.


GigaAM achieves 3.3% WER on Russian while running on CPU, ahead of every Whisper model in the benchmark table, including large-v3-turbo on an RTX 4090 GPU (7.9% WER). Fully offline after the initial model download. Free and open source.

Why ScribeAir?

Offline — works without internet (Windows Voice Typing, Google, Dragon require cloud)
Best Russian quality — 3.3% WER vs ~25% Windows, ~10% Google, ~8% Dragon
Free & open source — MIT license (Dragon costs $300+, Google charges per request)
Configurable — 3 ASR backends to choose from (GigaAM, Whisper, Vosk)
Private — 100% local, your speech never leaves your computer

Download

Pre-built Windows executables are available in Releases — no Python required.

Extract the ZIP and run ScribeAir.exe. Models download automatically on first launch.

Features

Push-to-talk with configurable hotkeys (LShift+RShift, Win+Shift, etc.)
Wake word activation — say “запись” / “record” to start, “стоп” / “stop” to finish (no hotkey needed)
3 ASR backends: Whisper (GPU/CPU), GigaAM (ONNX, Russian-optimized), Vosk (lightweight)
Progressive transcription — see intermediate results in real-time during recording
IT term replacement — automatic питон→Python, гугл→Google (81 terms with morphology)
Real-time overlay showing recording/transcription progress
Streaming pipeline with Silero VAD for instant voice detection
Fully offline after initial model download
Multi-language: Russian, English, auto-detect, mixed RU+EN, RU→EN translation
T5 text correction for fixing ASR errors in Russian
System tray UI with full settings menu
GPU acceleration via NVIDIA CUDA (auto CPU fallback)
Dual builds — CUDA (~4.9 GB) and CPU-only (~800 MB)
Custom vocabulary for domain-specific terms
Windows autostart support
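
The IT term replacement feature listed above can be illustrated with a small regex table. This is only a sketch: the two terms come from the feature list, while the pattern layout, the handful of case endings, and the name `replace_terms` are assumptions, not the contents of `term_replacer.py`.

```python
import re

# Hypothetical sample of the term table: a stem plus a few Russian case endings,
# mapped to the Latin spelling. The project itself lists 81 terms.
TERM_PATTERNS = [
    (re.compile(r"\bпитон(?:а|у|ом|е)?\b", re.IGNORECASE), "Python"),
    (re.compile(r"\bгугл(?:а|у|ом|е)?\b", re.IGNORECASE), "Google"),
]


def replace_terms(text: str) -> str:
    """Replace Cyrillic spellings of IT terms with their Latin originals."""
    for pattern, latin in TERM_PATTERNS:
        text = pattern.sub(latin, text)
    return text


print(replace_terms("Я пишу на питоне и ищу в гугле"))
# prints: Я пишу на Python и ищу в Google
```

The word boundaries keep derived words such as "питонист" untouched, which is one simple way to approximate morphology handling without a full analyzer.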

Architecture

Wake Word (OWW)  ──→  Recording  ──→  Progressive GigaAM (live preview)
    or Hotkey              │                    │
                           ▼                    ▼ "стоп/stop"
                     Silero VAD  →  ASR Engine  →  Term Replacement  →  Clipboard
                                    ├── GigaAM  (Russian, CPU)
                                    ├── Whisper  (multilingual, GPU/CPU)
                                    └── Vosk     (lightweight, CPU)
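
The stage order in the diagram can be sketched as a chain of injected callables. All names here (`Pipeline`, `detect_speech`, `transcribe`, and so on) are hypothetical stand-ins, and the stubs in the usage lines replace the real Silero VAD, ASR engine, and clipboard code.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]  # mono samples; the real audio format is not specified here


@dataclass
class Pipeline:
    """Chains the four stages from the diagram; every stage is injected."""
    detect_speech: Callable[[Audio], Audio]  # stand-in for Silero VAD: keeps speech samples
    transcribe: Callable[[Audio], str]       # stand-in for GigaAM / Whisper / Vosk
    replace_terms: Callable[[str], str]      # IT term replacement
    insert_text: Callable[[str], None]       # clipboard-based insertion

    def run(self, audio: Audio) -> str:
        speech = self.detect_speech(audio)
        if len(speech) == 0:
            return ""  # no voice detected, so the ASR stage is skipped
        text = self.replace_terms(self.transcribe(speech))
        if text:
            self.insert_text(text)
        return text


# Usage with stub stages (illustration only).
inserted: List[str] = []
pipeline = Pipeline(
    detect_speech=lambda audio: [s for s in audio if abs(s) > 0.01],
    transcribe=lambda speech: "привет питон",
    replace_terms=lambda text: text.replace("питон", "Python"),
    insert_text=inserted.append,
)
result = pipeline.run([0.0, 0.2, -0.3])
```

Injecting the stages keeps the ordering testable without a microphone or a loaded model.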

Recognition Quality

Benchmarked on Russian audiobook data (detailed benchmarks). “Reduction” is the relative WER decrease against the Whisper base baseline:

Backend    Model              WER     Reduction   Latency
─────────  ─────────────────  ──────  ──────────  ──────────
GigaAM     v3-e2e-rnnt        3.3%   90.0%       0.66s (CPU)
GigaAM     v3-rnnt            3.3%   90.0%       0.82s (CPU)
GigaAM     v3-e2e-ctc         4.2%   87.2%       1.08s (CPU)
Whisper    large-v3-turbo     7.9%   75.7%       0.44s (GPU)
Whisper    large-v3           8.8%   72.9%       2.30s (GPU)
Whisper    medium            10.7%   67.2%       1.75s (GPU)
Vosk       small-ru          13.0%   60.0%       0.75s (CPU)
Whisper    base (baseline)   32.6%   —           0.42s (CPU)

Key finding: GigaAM on CPU (3.3% WER) outperforms Whisper large-v3-turbo on RTX 4090 GPU (7.9% WER) for Russian.
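
For readers who want to check the numbers, here is a minimal sketch of how WER and the Reduction column are conventionally computed. It uses the standard word-level edit distance; the project's own benchmark script may normalize text differently, and recomputing from the rounded WER values in the table can differ from the printed Reduction by about 0.1 point.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, model_wer: float) -> float:
    """Percentage by which model_wer improves on baseline_wer."""
    return (baseline_wer - model_wer) / baseline_wer * 100


print(wer("a b c d", "a x c d"))                 # prints: 0.25
print(round(relative_reduction(32.6, 10.7), 1))  # prints: 67.2
```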

Installation

From source

Requirements: Windows 10/11, Python 3.10+, 8 GB RAM, microphone.

git clone https://github.com/borisovai/scribe-air.git
cd scribe-air

python -m venv venv
venv\Scripts\Activate.ps1

pip install -r requirements.txt
venv\Scripts\python.exe src\main.py

Models download automatically on first run (~240 MB for GigaAM, ~1.6 GB for Whisper).

Usage

Launch the application — a microphone icon appears in the system tray
Wait for model loading (icon turns from blue to gray)
Start recording using either method:
- Hotkey — hold LShift + RShift (configurable), release to finish
- Voice — say “запись” / “record” to start, “стоп” / “stop” to finish
Speak into your microphone
Text is inserted at the cursor position

Voice Activation (Wake Word)

Enable “Wake Word” in the tray menu for hands-free operation — no hotkey needed, fully voice-controlled.
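
The two activation methods can be thought of as one small state machine. The sketch below is an assumption about the behavior described above, not the code in `hotkey.py` or `wakeword_listener.py`; the class name and the idea of passing the detected word as a string are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


START_WORDS = {"запись", "record"}
STOP_WORDS = {"стоп", "stop"}


class RecordingController:
    """Tracks whether recording is active, driven by hotkey or wake-word events."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def on_hotkey(self, pressed: bool) -> None:
        # Push-to-talk: hold to record, release to finish.
        self.state = State.RECORDING if pressed else State.IDLE

    def on_wake_word(self, word: str) -> None:
        word = word.lower()
        if self.state is State.IDLE and word in START_WORDS:
            self.state = State.RECORDING
        elif self.state is State.RECORDING and word in STOP_WORDS:
            self.state = State.IDLE
        # Any other word, or a start/stop word in the wrong state, is ignored.
```

In this simplified version a hotkey release also ends a voice-started recording; a real implementation would likely track which method started the session.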

Tray Menu (right-click)

Language — Auto-detect, Russian, English, RU+EN Mixed, RU→EN Translate
Model — Tiny, Small, Medium, Large v3, Large v3 Turbo
ASR Backend — Auto, Whisper, GigaAM, Vosk
GigaAM Model — v2-ctc, v3-rnnt, v3-e2e-rnnt, etc.
Hotkey — Key combination for push-to-talk
Audio Device — Select input microphone
Wake Word — Toggle voice activation (say “запись”/“record” to start)
IT Term Replacement — Toggle automatic Cyrillic → Latin term conversion
Text Correction (T5) — Toggle T5 spell correction
Start with Windows — Toggle autostart
Show/Hide Console — Open a debug console with live logs

Configuration

Settings are stored in %APPDATA%\ScribeAir\config.json:

{
  "language_mode": "auto",
  "model": "large-v3-turbo",
  "device": "cuda",
  "hotkey": "shift+shift",
  "asr_backend": "auto",
  "gigaam_model": "v3-e2e-rnnt",
  "llm_correction_enabled": true,
  "custom_vocabulary": ["Kubernetes", "PostgreSQL"]
}
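
A minimal sketch of reading this file from a script, assuming missing keys should fall back to defaults. The `DEFAULTS` values are copied from the example above (with an empty vocabulary) and may not match the application's real defaults; `config_path` and `load_config` are hypothetical helper names.

```python
import copy
import json
import os
from pathlib import Path

# Assumed defaults, taken from the example config; the real defaults may differ.
DEFAULTS = {
    "language_mode": "auto",
    "model": "large-v3-turbo",
    "device": "cuda",
    "hotkey": "shift+shift",
    "asr_backend": "auto",
    "gigaam_model": "v3-e2e-rnnt",
    "llm_correction_enabled": True,
    "custom_vocabulary": [],
}


def config_path() -> Path:
    """Locate config.json under the APPDATA folder (home directory if APPDATA is unset)."""
    base = os.environ.get("APPDATA", str(Path.home()))
    return Path(base) / "ScribeAir" / "config.json"


def load_config(path: Path) -> dict:
    """Read the JSON file and fill in any missing keys from DEFAULTS."""
    config = copy.deepcopy(DEFAULTS)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```

Calling `load_config(config_path())` returns the merged settings; a file containing only `{"device": "cpu"}` would override that single key and leave the rest at the assumed defaults.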

ASR Backend Modes

auto — Default: GigaAM for Russian on CPU, Whisper on GPU (3.3–7.9% WER, 0.4–0.8s)
whisper — English or GPU-accelerated transcription (7.9% WER, 0.44s on GPU)
gigaam — Russian on CPU, best quality (3.3% WER, 0.66s)
vosk — Ultra-low latency, short phrases (13% WER, 0.7s)
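
The `auto` rule above can be read as a simple selection on the device. This sketch assumes the `device` setting is either `"cuda"` or `"cpu"` and ignores the language setting entirely, which the real selection logic may also take into account; `resolve_backend` is a hypothetical name.

```python
VALID_BACKENDS = {"whisper", "gigaam", "vosk"}


def resolve_backend(asr_backend: str, device: str) -> str:
    """Turn the configured asr_backend into a concrete engine name."""
    if asr_backend == "auto":
        # Per the list above: Whisper when a GPU is used, GigaAM on CPU.
        return "whisper" if device == "cuda" else "gigaam"
    if asr_backend not in VALID_BACKENDS:
        raise ValueError(f"unknown asr_backend: {asr_backend!r}")
    return asr_backend


print(resolve_backend("auto", "cpu"))   # prints: gigaam
print(resolve_backend("auto", "cuda"))  # prints: whisper
```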

Building EXE

# Build both variants
venv\Scripts\python.exe build.py

# Build only CPU or CUDA
venv\Scripts\python.exe build.py cpu
venv\Scripts\python.exe build.py cuda

CUDA (~4.9 GB) — Full GPU support, includes NVIDIA DLLs
CPU (~800 MB) — CPU-only, no CUDA dependencies

AI models are not bundled — they download automatically on first launch and cache in %APPDATA%\ScribeAir\models\.
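
To see what has been cached so far, a short script can total the size of each entry in the models folder. The directory layout inside `models\` is an assumption here (one file or folder per model), and `cached_model_sizes` is a hypothetical helper.

```python
import os
from pathlib import Path


def cached_model_sizes(models_dir: Path) -> dict[str, int]:
    """Map each top-level entry in models_dir to its total size in bytes."""
    sizes: dict[str, int] = {}
    if not models_dir.is_dir():
        return sizes  # nothing downloaded yet, or wrong path
    for entry in sorted(models_dir.iterdir()):
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
        else:
            sizes[entry.name] = entry.stat().st_size
    return sizes


# Assumed location, following the path given above.
models_dir = Path(os.environ.get("APPDATA", str(Path.home()))) / "ScribeAir" / "models"
for name, size in cached_model_sizes(models_dir).items():
    print(f"{name}: {size / 1_000_000:.1f} MB")
```

An empty result after first launch suggests the download did not complete.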

Project Structure

scribe-air/
├── src/
│   ├── main.py                 # Application entry point
│   ├── config.py               # Configuration management
│   ├── hotkey.py               # Global hotkey detection
│   ├── recorder.py             # Audio recording (sounddevice)
│   ├── transcriber.py          # Whisper transcription
│   ├── gigaam_transcriber.py   # GigaAM ONNX transcription
│   ├── vosk_transcriber.py     # Vosk transcription
│   ├── streaming_pipeline.py   # Streaming VAD + transcription
│   ├── wakeword_listener.py    # Wake word detection (openWakeWord)
│   ├── term_replacer.py        # IT term replacement (Cyrillic → Latin)
│   ├── audio_processor.py      # Audio preprocessing
│   ├── text_corrector_t5.py    # T5 text correction
│   ├── model_downloader.py     # Model download with mirror fallback
│   ├── inserter.py             # Text insertion via clipboard
│   ├── overlay.py              # Floating transcription window
│   ├── tray.py                 # System tray UI
│   └── autostart.py            # Windows autostart
├── wakeword_data/models/       # Wake word ONNX models (zapis, stop)
├── tests/                      # Pytest test suite
├── docs/guides/                # User and developer guides
├── assets/icon.ico             # Application icon
├── requirements.txt            # Dependencies
├── voice_app.spec              # PyInstaller config
└── build.py                    # Build script

Troubleshooting

Model not loading — Check your internet connection. The first download is ~240 MB–1.6 GB from Hugging Face. Models are cached in %APPDATA%\ScribeAir\models\.
No audio — Verify the microphone in Windows Sound Settings, then select it under Audio Device in the tray menu.
GPU not used — Install CUDA 12.x and update the NVIDIA drivers. Set "device": "cuda" in config.json and make sure you are running the CUDA build.
Text not inserted — Ensure the cursor is in a text field. Test with Notepad first.
Debug logs — Right-click the tray icon → “Show Console” for live logs. Log file: %APPDATA%\ScribeAir\scribe_air.log

Testing

pip install -r requirements-dev.txt
pytest tests/ -v

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE.

Acknowledgements

GigaAM — Russian ASR model (Sberbank, trained on 700K hours)
faster-whisper — Optimized Whisper implementation
onnx-asr — ONNX ASR inference
Vosk — Lightweight offline speech recognition
Silero VAD — Neural voice activity detection
bond005/ruT5-ASR-large — T5 Russian ASR correction