
ScribeAir

Offline voice input for Windows — push-to-talk and voice activation

Speak, and the text appears at the cursor. Hold a hotkey or say “record” to start. Works in any application.

License: MIT | Platform: Windows | Python 3.10+

Download | Features | Benchmarks | Installation | Русский (Russian)


GigaAM achieves 3.3% WER on Russian running on CPU, outperforming every Whisper model in our benchmarks, including large-v3-turbo on an RTX 4090 GPU. Fully offline after the initial model download. Free and open source.

Why ScribeAir?

  • Offline — works without internet (Windows Voice Typing and Google require the cloud)
  • Best Russian quality — 3.3% WER vs ~25% Windows, ~10% Google, ~8% Dragon
  • Free & open source — MIT license (Dragon costs $300+, Google charges per request)
  • Configurable — 3 ASR backends to choose from (GigaAM, Whisper, Vosk)
  • Private — 100% local, your speech never leaves your computer

Download

Pre-built Windows executables are available in Releases — no Python required.

Extract the ZIP and run ScribeAir.exe. Models download automatically on first launch.

Features

  • Push-to-talk with configurable hotkeys (LShift+RShift, Win+Shift, etc.)
  • Wake word activation — say “запись” / “record” to start, “стоп” / “stop” to finish (no hotkey needed)
  • 3 ASR backends: Whisper (GPU/CPU), GigaAM (ONNX, Russian-optimized), Vosk (lightweight)
  • Progressive transcription — see intermediate results in real time during recording
  • IT term replacement — automatic питон→Python, гугл→Google (81 terms, including their inflected forms)
  • Real-time overlay showing recording/transcription progress
  • Streaming pipeline with Silero VAD for low-latency speech detection
  • Fully offline after initial model download
  • Multi-language: Russian, English, auto-detect, mixed RU+EN, RU→EN translation
  • T5 text correction for fixing ASR errors in Russian
  • System tray UI with full settings menu
  • GPU acceleration via NVIDIA CUDA (auto CPU fallback)
  • Dual builds — CUDA (~4.9 GB) and CPU-only (~800 MB)
  • Custom vocabulary for domain-specific terms
  • Windows autostart support
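The IT term replacement feature has to match inflected Russian forms (питоне, питоном), not just the dictionary form. The sketch below shows one way to do that. It assumes a fixed list of common noun endings instead of a real morphological analyzer. The `TERMS` table, the `ENDINGS` pattern and `replace_terms` are hypothetical; this is not the code in `src/term_replacer.py`.

```python
import re

# Hypothetical excerpt of a term table: Russian stem -> canonical spelling.
TERMS = {
    "питон": "Python",
    "гугл": "Google",
}

# Assumed fixed list of common Russian noun endings (the empty ending is allowed too).
ENDINGS = r"(?:а|у|ом|е|ы|ов|ам|ами|ах)?"

# \b is Unicode-aware for str patterns, so word boundaries work on Cyrillic text.
PATTERNS = [
    (re.compile(rf"\b{re.escape(stem)}{ENDINGS}\b", re.IGNORECASE), target)
    for stem, target in TERMS.items()
]


def replace_terms(text: str) -> str:
    """Replace Cyrillic spellings of IT terms, including inflected forms."""
    for pattern, target in PATTERNS:
        text = pattern.sub(target, text)
    return text


print(replace_terms("Пишу на питоне и ищу ответы в гугле"))
# Пишу на Python и ищу ответы в Google
```

The trailing `\b` keeps longer words such as "питонист" untouched. A suffix list is cheap and predictable, but it will miss irregular forms that a full morphological analyzer would catch.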

Architecture

Wake Word (OWW)  ──→  Recording  ──→  Progressive GigaAM (live preview)
    or Hotkey              │                    │
                           ▼                    ▼ "стоп/stop"
                     Silero VAD  →  ASR Engine  →  Term Replacement  →  Clipboard
                                    ├── GigaAM  (Russian, CPU)
                                    ├── Whisper  (multilingual, GPU/CPU)
                                    └── Vosk     (lightweight, CPU)
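The Silero VAD stage decides which stretches of audio contain speech before they reach the ASR engine. Silero is a neural model. The sketch below substitutes a much simpler RMS-energy threshold, purely to show how a VAD stage turns a stream of frames into speech segments. The `threshold` and `hangover` values and both function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def segment_speech(frames: List[List[float]], threshold: float = 0.02,
                   hangover: int = 2) -> List[Tuple[int, int]]:
    """Group voiced frames into (start, end) frame-index segments, end exclusive.

    A segment is closed only once more than `hangover` consecutive silent
    frames follow it, so short pauses between words do not split an utterance.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i          # first voiced frame opens a segment
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover:
                # last voiced frame was i - silent_run; end index is exclusive
                segments.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:          # stream ended while a segment was open
        segments.append((start, len(frames) - silent_run))
    return segments
```

Each returned segment is what would be handed to the ASR engine. The hangover trades a little end-of-utterance latency for fewer split phrases.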

Recognition Quality

Benchmarked on Russian audiobook data (detailed benchmarks):

Backend    Model              WER     Rel. WER reduction  Latency
─────────  ─────────────────  ──────  ──────────────────  ───────────
GigaAM     v3-e2e-rnnt        3.3%    90.0%               0.66s (CPU)
GigaAM     v3-rnnt            3.3%    90.0%               0.82s (CPU)
GigaAM     v3-e2e-ctc         4.2%    87.2%               1.08s (CPU)
Whisper    large-v3-turbo     7.9%    75.7%               0.44s (GPU)
Whisper    large-v3           8.8%    72.9%               2.30s (GPU)
Whisper    medium             10.7%   67.2%               1.75s (GPU)
Vosk       small-ru           13.0%   60.0%               0.75s (CPU)
Whisper    base (baseline)    32.6%   n/a                 0.42s (CPU)

Key finding: GigaAM on CPU (3.3% WER) outperforms Whisper large-v3-turbo on an RTX 4090 GPU (7.9% WER) for Russian.
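The Reduction column agrees, to within about 0.1 percentage point, with the relative WER reduction against the Whisper base baseline (32.6%). That is the reading assumed here. The sketch below computes both quantities. WER is computed as word-level Levenshtein distance divided by the number of reference words. The function names are illustrative and are not taken from the benchmark scripts.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, model_wer: float) -> float:
    """Relative WER reduction versus a baseline, in percent."""
    return (baseline_wer - model_wer) / baseline_wer * 100
```

For example, `word_error_rate("я пишу на питоне", "я пишу на python")` is 0.25: one substitution in four reference words. This is also why term replacement can affect measured WER, depending on how the reference transcripts spell IT terms.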

Installation

From source

Requirements: Windows 10/11, Python 3.10+, 8 GB RAM, microphone.

git clone https://github.com/borisovai/scribe-air.git
cd scribe-air

python -m venv venv
venv\Scripts\Activate.ps1

pip install -r requirements.txt
venv\Scripts\python.exe src\main.py

Models download automatically on first run (~240 MB for GigaAM, ~1.6 GB for Whisper).

Usage

  1. Launch the application — a microphone icon appears in the system tray
  2. Wait for model loading (icon turns from blue to gray)
  3. Start recording using either method:
    • Hotkey — hold LShift + RShift (configurable), release to finish
    • Voice — say “запись” / “record” to start, “стоп” / “stop” to finish
  4. Speak into your microphone
  5. Text is inserted at the cursor position
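In push-to-talk mode, recording starts when the whole key combination is held and finishes on release. The sketch below shows that chord logic over a stream of key events, assuming LShift + RShift as the chord. The key names, the event tuple format and `chord_sessions` are hypothetical. The real app reads OS-level events in `src/hotkey.py`, whose API is not shown here.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical key names for the LShift + RShift chord.
CHORD = frozenset({"left_shift", "right_shift"})


def chord_sessions(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn ("down" | "up", key) events into "start"/"stop" recording actions.

    Recording starts once every key in CHORD is held and stops as soon as
    any of them is released.
    """
    held: Set[str] = set()
    recording = False
    actions: List[str] = []
    for kind, key in events:
        if kind == "down":
            held.add(key)
        elif kind == "up":
            held.discard(key)
        chord_held = CHORD <= held  # subset test: all chord keys are down
        if chord_held and not recording:
            actions.append("start")
            recording = True
        elif not chord_held and recording:
            actions.append("stop")
            recording = False
    return actions
```

This sketch tracks the set of held keys rather than reacting to each key-down event. As a result, OS key auto-repeat (repeated "down" events for a key already held) does not restart recording.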

Voice Activation (Wake Word)

Enable “Wake Word” in the tray menu for hands-free operation: say “запись” / “record” to start recording and “стоп” / “stop” to finish, with no hotkey needed.
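Wake word detectors such as openWakeWord report a confidence score per model for each audio frame. The app ships two wake word models (zapis, stop). The sketch below shows how such scores could drive start/stop, assuming a fixed threshold and a short cooldown. The threshold, cooldown, dict keys and `wake_word_actions` are hypothetical; they are not values taken from `src/wakeword_listener.py`.

```python
from typing import Iterable, List, Mapping


def wake_word_actions(scores: Iterable[Mapping[str, float]],
                      threshold: float = 0.5,
                      cooldown: int = 3) -> List[str]:
    """Map per-frame detector scores to "start"/"stop" actions.

    `scores` yields one dict per audio frame, e.g. {"zapis": 0.1, "stop": 0.0}.
    "zapis" only counts while idle and "stop" only while recording; after any
    trigger, the next `cooldown` frames are ignored so one utterance fires once.
    """
    recording = False
    skip = 0
    actions: List[str] = []
    for frame in scores:
        if skip:
            skip -= 1
            continue
        if not recording and frame.get("zapis", 0.0) >= threshold:
            actions.append("start")
            recording, skip = True, cooldown
        elif recording and frame.get("stop", 0.0) >= threshold:
            actions.append("stop")
            recording, skip = False, cooldown
    return actions
```

Gating each model on the current state means a false "stop" detection while idle, or a repeated "zapis" while recording, has no effect.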

Tray Menu (right-click)

  • Language — Auto-detect, Russian, English, RU+EN Mixed, RU→EN Translate
  • Model — Whisper model size: Tiny, Small, Medium, Large v3, Large v3 Turbo
  • ASR Backend — Auto, Whisper, GigaAM, Vosk
  • GigaAM Model — v2-ctc, v3-rnnt, v3-e2e-rnnt, etc.
  • Hotkey — Key combination for push-to-talk
  • Audio Device — Select input microphone
  • Wake Word — Toggle voice activation (say “запись”/“record” to start)
  • IT Term Replacement — Toggle automatic Cyrillic → Latin term conversion
  • Text Correction (T5) — Toggle T5 spell correction
  • Start with Windows — Toggle autostart
  • Show/Hide Console — Open a debug console with live logs

Configuration

Settings are stored in %APPDATA%\ScribeAir\config.json:

{
  "language_mode": "auto",
  "model": "large-v3-turbo",
  "device": "cuda",
  "hotkey": "shift+shift",
  "asr_backend": "auto",
  "gigaam_model": "v3-e2e-rnnt",
  "llm_correction_enabled": true,
  "custom_vocabulary": ["Kubernetes", "PostgreSQL"]
}
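Any key missing from config.json has to fall back to a default. The sketch below loads the file that way. It assumes the defaults equal the example values above; the app's actual defaults in `src/config.py` may differ, for example for `device`. `DEFAULTS`, `config_path` and `load_config` are hypothetical names.

```python
import copy
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical defaults, copied from the example config above.
DEFAULTS: Dict[str, Any] = {
    "language_mode": "auto",
    "model": "large-v3-turbo",
    "device": "cuda",
    "hotkey": "shift+shift",
    "asr_backend": "auto",
    "gigaam_model": "v3-e2e-rnnt",
    "llm_correction_enabled": True,
    "custom_vocabulary": [],
}


def config_path() -> Path:
    # %APPDATA%\ScribeAir\config.json on Windows; home directory as a fallback elsewhere.
    base = os.environ.get("APPDATA") or str(Path.home())
    return Path(base) / "ScribeAir" / "config.json"


def load_config(path: Optional[Path] = None) -> Dict[str, Any]:
    """Return DEFAULTS overlaid with whatever keys the JSON file defines."""
    path = path or config_path()
    config = copy.deepcopy(DEFAULTS)  # deep copy so callers cannot mutate DEFAULTS' list
    if path.exists():
        with path.open(encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```

Overlaying the file on a copy of the defaults means an older config.json keeps working when new settings are added.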

ASR Backend Modes

  • auto — Default: GigaAM for Russian on CPU, Whisper on GPU (3.3–7.9% WER, 0.4–0.8s)
  • whisper — English or GPU-accelerated transcription (large-v3-turbo: 7.9% WER on Russian, 0.44s on GPU)
  • gigaam — Russian on CPU, best quality (3.3% WER, 0.66s)
  • vosk — Lightweight, suited to short phrases (13.0% WER, 0.75s)
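The function below writes out one possible reading of the `auto` rule. These assumptions are not confirmed by the source:

  • `auto` picks GigaAM only when the language is Russian and no CUDA GPU is available.
  • Every other case goes to Whisper.
  • An explicit `asr_backend` value always wins.

`resolve_backend` and the `"ru"` language code are hypothetical.

```python
def resolve_backend(asr_backend: str, language_mode: str, cuda_available: bool) -> str:
    """Map the asr_backend setting to a concrete engine (hypothetical rule)."""
    if asr_backend != "auto":
        return asr_backend  # an explicit whisper/gigaam/vosk choice is used as-is
    if language_mode == "ru" and not cuda_available:
        return "gigaam"     # Russian on CPU: lowest WER in the benchmarks
    return "whisper"        # GPU available, or a language GigaAM does not target
```

For example, `resolve_backend("auto", "ru", cuda_available=False)` selects `"gigaam"`. The same call with a GPU present falls through to `"whisper"`.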

Building EXE

# Build both variants
venv\Scripts\python.exe build.py

# Build only CPU or CUDA
venv\Scripts\python.exe build.py cpu
venv\Scripts\python.exe build.py cuda

  • CUDA (~4.9 GB) — Full GPU support, includes NVIDIA DLLs
  • CPU (~800 MB) — CPU-only, no CUDA dependencies

AI models are not bundled — they download automatically on first launch and cache in %APPDATA%\ScribeAir\models\.

Project Structure

scribe-air/
├── src/
│   ├── main.py                 # Application entry point
│   ├── config.py               # Configuration management
│   ├── hotkey.py               # Global hotkey detection
│   ├── recorder.py             # Audio recording (sounddevice)
│   ├── transcriber.py          # Whisper transcription
│   ├── gigaam_transcriber.py   # GigaAM ONNX transcription
│   ├── vosk_transcriber.py     # Vosk transcription
│   ├── streaming_pipeline.py   # Streaming VAD + transcription
│   ├── wakeword_listener.py    # Wake word detection (openWakeWord)
│   ├── term_replacer.py        # IT term replacement (Cyrillic → Latin)
│   ├── audio_processor.py      # Audio preprocessing
│   ├── text_corrector_t5.py    # T5 text correction
│   ├── model_downloader.py     # Model download with mirror fallback
│   ├── inserter.py             # Text insertion via clipboard
│   ├── overlay.py              # Floating transcription window
│   ├── tray.py                 # System tray UI
│   └── autostart.py            # Windows autostart
├── wakeword_data/models/       # Wake word ONNX models (zapis, stop)
├── tests/                      # Pytest test suite
├── docs/guides/                # User and developer guides
├── assets/icon.ico             # Application icon
├── requirements.txt            # Dependencies
├── voice_app.spec              # PyInstaller config
└── build.py                    # Build script

Troubleshooting

  • Model not loading — Check your internet connection. The first download is ~240 MB–1.6 GB from Hugging Face. Models are cached in %APPDATA%\ScribeAir\models\
  • No audio — Verify the microphone in Windows Sound Settings, then select it via Audio Device in the tray menu
  • GPU not used — Install CUDA 12.x and update the NVIDIA drivers. Set "device": "cuda" in the config and use the CUDA build (the CPU build has no CUDA dependencies)
  • Text not inserted — Ensure cursor is in a text field. Test with Notepad first
  • Debug logs — Right-click tray icon → “Show Console” for live logs. Log file: %APPDATA%\ScribeAir\scribe_air.log
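The helper below reads the most recent log entries without opening the console. It assumes the log file is UTF-8 text. `tail_log` is a hypothetical name, not part of ScribeAir.

```python
import os
from collections import deque
from pathlib import Path
from typing import List, Optional


def tail_log(lines: int = 50, path: Optional[Path] = None) -> List[str]:
    """Return the last `lines` lines of the ScribeAir log file."""
    if path is None:
        path = Path(os.environ.get("APPDATA", "")) / "ScribeAir" / "scribe_air.log"
    # errors="replace" keeps a partially written or mis-encoded line from raising.
    with path.open(encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the newest lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=lines)]
```

For example, `print("\n".join(tail_log(100)))` prints the last 100 lines. The deque approach streams the file once and never holds more than `lines` lines in memory.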

Testing

pip install -r requirements-dev.txt
pytest tests/ -v

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE.

Acknowledgements
