# Transcription Deployment

## Quick start — same machine (CPU)

```bash
# docker/.env
TRANSCRIPTION_LIVE_ENABLED=true
TRANSCRIPTION_POST_ENABLED=true
WHISPER_MODEL=base
WHISPER_DEVICE=cpu
WHISPER_COMPUTE_TYPE=int8   # 2–3x faster than float32 on CPU

# Start STT profiles alongside the core stack
docker compose --profile stt-post --profile stt-live up -d
```

Model weights (~150 MB for `base`) are downloaded on first start and cached in the `whisper_models` Docker volume.

## Whisper model selection

| Model | Size | CPU speed | GPU speed | Recommended for |
| --- | --- | --- | --- | --- |
| `tiny` | 75 MB | Fastest | | Dev/testing only |
| `base` | 150 MB | Fast | | Default — CPU deployments |
| `small` | 250 MB | Moderate | Fast | Better accuracy on CPU |
| `medium` | 770 MB | Slow | Very fast | GPU deployments |
| `large-v3` | 1.5 GB | Very slow | Excellent | High-accuracy GPU |

Use the `.en` variants (`base.en`, `small.en`) for English-only audio — about 10% faster and more accurate.

Set in `docker/.env`:

```dotenv
WHISPER_MODEL=base.en
```

## Live transcription latency

Perceived delay = `PAUSE_THRESHOLD_SECONDS` + Whisper processing time.

| Hardware | Model | Whisper processing | Total perceived delay |
| --- | --- | --- | --- |
| CPU, 2–4 cores | `tiny.en` | ~2–3s | 4–5s |
| CPU, 4–8 cores | `base.en` | ~1–2s | 3–4s |
| CPU, 8+ cores | `small.en` | ~1.5–2.5s | 3–4s |
| GPU, 6–8 GB (RTX 3060, T4) | `medium` | ~0.3–0.7s | 2–2.5s |
| GPU, 16–24 GB (RTX 3090/4090, A10) | `large-v3` | ~0.2–0.5s | 2s |
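The delay formula can be checked with a quick shell helper — illustrative only, not part of the stack:

```bash
# Perceived delay = PAUSE_THRESHOLD_SECONDS + Whisper processing time
perceived_delay() {
  # $1 = pause threshold in seconds, $2 = measured Whisper processing time
  awk -v p="$1" -v w="$2" 'BEGIN { printf "%.1f", p + w }'
}

perceived_delay 1.5 1.0   # base.en on a 4–8 core CPU → 2.5
```

Shrinking `PAUSE_THRESHOLD_SECONDS` lowers latency but risks flushing mid-sentence; faster hardware attacks the other term.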

> **TIP**
>
> `PAUSE_THRESHOLD_SECONDS=1.5` means the agent waits for 1.5s of silence before sending audio to Whisper. The speaking indicator appears immediately (before Whisper is involved), so the UI never feels frozen even with longer processing times.

## GPU acceleration (same machine)

Requirements: an NVIDIA GPU with current drivers and the NVIDIA Container Toolkit installed on the Docker host.

```bash
# docker/.env
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

# Start with GPU override file
docker compose \
  -f docker-compose.yml \
  -f docker-compose.stt.gpu.yml \
  --profile stt-post --profile stt-live \
  up -d
```
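For orientation, a Compose GPU override file like `docker-compose.stt.gpu.yml` typically reserves the GPU through the `deploy` block. The sketch below is illustrative only — the service name `whisper` and the exact structure are assumptions; check the file shipped in the repository for its actual contents:

```yaml
# Illustrative GPU override — service name and layout are assumptions
services:
  whisper:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```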

## Dedicated GPU machine

Run Whisper and the STT workers on a separate machine — useful when your main server is CPU-only but you have a GPU workstation or cloud instance available.

### On the STT machine

```bash
cd docker
cp .env.stt.example .env.stt
# Edit .env.stt — set REDIS_URL, MINIO_ENDPOINT, LIVEKIT_URL, credentials
nano .env.stt

# CPU
docker compose -f docker-compose.stt.yml --env-file .env.stt up -d

# GPU
docker compose \
  -f docker-compose.stt.yml \
  -f docker-compose.stt.gpu.yml \
  --env-file .env.stt up -d
```
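The values in `.env.stt` point back at the main machine. For example — all addresses and ports below are hypothetical placeholders, not defaults:

```dotenv
# docker/.env.stt — illustrative values; use your main machine's
# private address and the ports your stack actually exposes
REDIS_URL=redis://10.0.0.5:6379
MINIO_ENDPOINT=http://10.0.0.5:9000
LIVEKIT_URL=ws://10.0.0.5:7880
```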

### On the main machine

```dotenv
# docker/.env
WHISPER_URL=http://STT_MACHINE_IP:3281
TRANSCRIPTION_POST_ENABLED=true
TRANSCRIPTION_LIVE_ENABLED=true
```

```bash
docker compose up -d api
```

The `stt-worker` and `stt-live-agent` services run on the STT machine; the main API simply points its `WHISPER_URL` at the remote service.

### Secure the connection

Redis and MinIO must never be exposed on a public IP without encryption. Use one of:

- **WireGuard or Tailscale** (recommended) — private encrypted tunnel
- **SSH tunnel** — `ssh -L 6379:localhost:6379 user@main-host`
- **Cloud VPC** — security groups allowing only the STT machine's private IP
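When forwarding over an SSH tunnel, the STT machine's `.env.stt` points at the local tunnel endpoints instead of public IPs. A minimal sketch, assuming Redis and MinIO are forwarded on their conventional ports (6379 and 9000):

```dotenv
# .env.stt with services reached through a local SSH tunnel (illustrative)
REDIS_URL=redis://localhost:6379
MINIO_ENDPOINT=http://localhost:9000
```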

## Environment variables reference

| Variable | Default | Description |
| --- | --- | --- |
| `TRANSCRIPTION_LIVE_ENABLED` | `false` | Enable live transcription API endpoints |
| `TRANSCRIPTION_POST_ENABLED` | `false` | Enable post-call transcription API endpoints |
| `WHISPER_MODEL` | `base` | Model: `tiny`, `base`, `small`, `medium`, `large-v3` |
| `WHISPER_DEVICE` | `cpu` | `cpu` or `cuda` |
| `WHISPER_COMPUTE_TYPE` | `int8` | `int8` (CPU) or `float16` (GPU) |
| `STT_LANGUAGE` | `en` | ISO 639-1 code, or `auto` for detection |
| `PAUSE_THRESHOLD_SECONDS` | `1.5` | Silence before flushing audio to Whisper |
| `SHORT_PAUSE_SECONDS` | `0.3` | Short pause — flush only if the last chunk ended with punctuation |
| `MAX_CHUNK_SECONDS` | `30.0` | Force-flush audio after this duration |
| `SPEECH_RMS_THRESHOLD` | `200` | RMS energy threshold to detect speech |
| `WHISPER_MAX_CONCURRENT` | `2` | Max parallel Whisper requests per room |
| `WHISPER_URL` | `http://whisper:8080` | URL of the Whisper REST service |
| `PORT_WHISPER` | `3281` | Host port for the Whisper REST API |
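To make the interplay of the three timing variables concrete, here is an illustrative decision sketch — not the agent's actual code, just the policy the table describes, with the default thresholds hard-coded:

```bash
# Illustrative sketch of the live agent's flush policy (defaults hard-coded)
should_flush() {
  # $1 = seconds of silence so far, $2 = buffered chunk duration in seconds,
  # $3 = last character of the previously transcribed text
  local silence=$1 duration=$2 last=$3
  # MAX_CHUNK_SECONDS: force-flush long chunks regardless of silence
  if awk -v d="$duration" 'BEGIN { exit !(d >= 30.0) }'; then echo flush; return; fi
  # PAUSE_THRESHOLD_SECONDS: a full pause always flushes
  if awk -v s="$silence" 'BEGIN { exit !(s >= 1.5) }'; then echo flush; return; fi
  # SHORT_PAUSE_SECONDS: a short pause flushes only after sentence-ending punctuation
  if awk -v s="$silence" 'BEGIN { exit !(s >= 0.3) }'; then
    case $last in [.!?]) echo flush; return ;; esac
  fi
  echo hold
}

should_flush 0.4 5 "."   # short pause after a sentence → flush
should_flush 0.4 5 "a"   # short pause mid-sentence → hold
```

The same shape applies to the real agent: lowering `PAUSE_THRESHOLD_SECONDS` trades accuracy of sentence boundaries for responsiveness, while `MAX_CHUNK_SECONDS` caps worst-case buffering for continuous speech.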