Transcription Overview

Live and post-call transcription are core features of RTCstack, not an external API or a paid add-on. Whisper runs inside your Docker stack alongside everything else; no audio ever leaves your server.

Two modes

|                     | Live                                        | Post-call                             |
|---------------------|---------------------------------------------|---------------------------------------|
| When                | During the call                             | After the recording ends              |
| Latency             | 1–3 seconds per utterance                   | Minutes (whole file at once)          |
| Output              | SDK events → UI in real time                | Redis segments + plain text           |
| Trigger             | API call to start/stop                      | API call on a recording ID            |
| Speaker attribution | Yes (per audio track)                       | Yes (via diarization or speaker tags) |
| Use case            | Live captions, meeting notes, accessibility | Archives, searchable transcripts      |

Both modes share the same Whisper service and are independently enabled via environment variables.
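As a minimal illustration, the live-mode toggle is the variable listed under Requirements below (the post-call mode has its own variable, not shown here; check the setup guide for its exact name):

```shell
# docker/.env (fragment)
TRANSCRIPTION_LIVE_ENABLED=true
```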

How live transcription works

Participant speaks
       ↓
stt-live-agent detects audio energy (RMS)
       ↓ (immediately)
→ speakingStarted event in SDK  →  "···" in UI
       ↓
1.5s silence → audio chunk flushed to Whisper
       ↓ (~0.5–2s)
Whisper returns text
       ↓
→ transcriptReceived event in SDK  →  final text in UI

The speaking indicator fires before Whisper is involved; it's just RMS energy detection. The real-time feel comes from showing the indicator immediately, then replacing it with transcribed text once Whisper responds.
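The RMS check itself is cheap enough to run on every audio frame. As a rough sketch (the threshold value here is an illustrative assumption, not RTCstack's actual tuning):

```typescript
// Root-mean-square energy of one frame of audio samples in [-1, 1].
function rms(samples: Float32Array): number {
  let sum = 0
  for (const s of samples) sum += s * s
  return Math.sqrt(sum / samples.length)
}

// Assumed tuning value; real deployments would calibrate this per mic/codec.
const SPEAKING_THRESHOLD = 0.02

// True when the frame's energy is above the silence floor.
function isSpeaking(samples: Float32Array): boolean {
  return rms(samples) > SPEAKING_THRESHOLD
}
```

In practice the agent would also debounce this signal (the 1.5 s silence window above) before flushing a chunk to Whisper.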

SDK events

```typescript
// Speaking indicator: fires immediately when mic energy is detected
call.on('speakingStarted', (speakerId: string, speakerName: string) => {
  showTypingIndicator(speakerName)
})

// Fired when a transcript segment arrives (also clears the speaking indicator)
call.on('transcriptReceived', (segment: TranscriptSegment) => {
  appendTranscript(segment.speaker, segment.text)
})

// Speaking indicator cleared
call.on('speakingStopped', (speakerId: string) => {
  removeTypingIndicator(speakerId)
})
```
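The handlers above reference a `TranscriptSegment` type. One plausible shape, inferred from the fields used on this page (the exact SDK type may differ, and the timestamp unit is an assumption):

```typescript
// Sketch of the segment payload, not the canonical SDK definition.
interface TranscriptSegment {
  speakerId: string
  speaker: string   // display name
  text: string
  timestamp: number // assumed: milliseconds since epoch
}
```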

Built-in UI components

React

```tsx
import { TranscriptPanel } from '@rtcstack/ui-react'

// Standalone
<TranscriptPanel maxItems={100} showSpeakerName />

// Or built into VideoConference with one prop
<VideoConference call={call} showTranscript />
```

Vue 3

```vue
<TranscriptPanel :max-items="100" :show-speaker-name="true" />

<!-- Or -->
<VideoConference :call="call" :show-transcript="true" />
```

Vanilla JS

```typescript
mountVideoConference(el, {
  call,
  showTranscript: true,  // default: true
})
```

Custom transcript rendering (any framework)

Use the SDK events directly and build your own UI:

```typescript
call.on('speakingStarted', (speakerId, name) => {
  // Create a "typing" bubble for this speaker
})

call.on('transcriptReceived', ({ speaker, speakerId, text, timestamp }) => {
  // Replace the bubble with real text, or append to existing
})
```

Requirements

  • Docker Compose with the stt-live profile enabled
  • Whisper model weights (~150 MB for base), downloaded automatically on first start
  • TRANSCRIPTION_LIVE_ENABLED=true in docker/.env
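Assuming a standard Compose setup, the profile from the first requirement is enabled at startup with the stock Docker Compose CLI (service names and any extra flags depend on your stack):

```shell
# Bring the stack up with the live-transcription services included
docker compose --profile stt-live up -d
```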

See Live Transcription Setup for the full configuration guide.