# Transcription Overview
Live and post-call transcription are core features of RTCstack, not an external API or a paid add-on. Whisper runs inside your Docker stack alongside everything else. No audio ever leaves your server.
## Two modes
| | Live | Post-call |
|---|---|---|
| When | During the call | After the recording ends |
| Latency | 1–3 seconds per utterance | Minutes (whole file at once) |
| Output | SDK events → UI in real time | Redis segments + plain text |
| Trigger | API call to start/stop | API call on a recording ID |
| Speaker attribution | Yes (per audio track) | Yes (via diarization or speaker tags) |
| Use case | Live captions, meeting notes, accessibility | Archives, searchable transcripts |
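Both trigger styles in the table are plain API calls. As a rough sketch, they could be modeled as the request builders below; the endpoint paths and payload fields here are illustrative assumptions, not RTCstack's documented API:

```typescript
// Hypothetical request shapes for the two trigger styles.
// Paths and field names are assumptions for illustration only.

interface TranscriptionRequest {
  method: 'POST'
  path: string
  body: Record<string, string>
}

// Live mode: start/stop transcription for an active call
function buildLiveToggle(callId: string, action: 'start' | 'stop'): TranscriptionRequest {
  return { method: 'POST', path: `/api/transcription/live/${action}`, body: { callId } }
}

// Post-call mode: transcribe a finished recording by its ID
function buildPostCall(recordingId: string): TranscriptionRequest {
  return { method: 'POST', path: '/api/transcription/post-call', body: { recordingId } }
}
```

Consult the Live Transcription Setup guide for the actual endpoints.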
Both modes share the same Whisper service and are independently enabled via environment variables.
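As a sketch, a `docker/.env` fragment enabling both modes might look like the following. `TRANSCRIPTION_LIVE_ENABLED` is the variable named in this document; the post-call variable name is a hypothetical placeholder:

```shell
# docker/.env
TRANSCRIPTION_LIVE_ENABLED=true       # documented under Requirements below
TRANSCRIPTION_POSTCALL_ENABLED=true   # hypothetical name for the post-call toggle
```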
## How live transcription works
```
Participant speaks
  ↓
stt-live-agent detects audio energy (RMS)
  ↓ (immediately)
  → speakingStarted event in SDK → "···" in UI
  ↓
1.5 s of silence → audio chunk flushed to Whisper
  ↓ (~0.5–2 s)
Whisper returns text
  ↓
  → transcriptReceived event in SDK → final text in UI
```

The speaking indicator fires before Whisper is involved; it is just RMS energy detection. The real-time feel comes from showing the indicator immediately, then replacing it with transcribed text once Whisper responds.
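The RMS detection and 1.5 s silence flush described above can be sketched as follows. The energy threshold and frame size are illustrative assumptions, not the agent's actual tuning:

```typescript
// Minimal RMS-based speaking detector, in the spirit of stt-live-agent.
// RMS_THRESHOLD is an assumed energy floor; the real agent's value may differ.

const RMS_THRESHOLD = 0.02       // assumed "speaking" energy floor
const SILENCE_FLUSH_MS = 1500    // matches the 1.5 s flush in the diagram

function rms(samples: Float32Array): number {
  let sum = 0
  for (const s of samples) sum += s * s
  return Math.sqrt(sum / samples.length)
}

// Feed fixed-size audio frames; returns 'speaking', 'silent', or 'flush'.
// 'flush' means 1.5 s of silence elapsed after speech, so the buffered
// chunk should be handed to Whisper.
function makeDetector(frameMs: number) {
  let silentMs = 0
  let wasSpeaking = false
  return (frame: Float32Array): 'speaking' | 'silent' | 'flush' => {
    if (rms(frame) >= RMS_THRESHOLD) {
      silentMs = 0
      wasSpeaking = true
      return 'speaking'
    }
    silentMs += frameMs
    if (wasSpeaking && silentMs >= SILENCE_FLUSH_MS) {
      wasSpeaking = false
      return 'flush'
    }
    return 'silent'
  }
}
```

In this model, `speakingStarted` would fire on the first `'speaking'` result, and the Whisper request on `'flush'`.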
## SDK events
```typescript
// Speaking indicator: fires immediately when mic energy is detected
call.on('speakingStarted', (speakerId: string, speakerName: string) => {
  showTypingIndicator(speakerName)
})

// Fired when a transcript segment arrives (also clears the speaking indicator)
call.on('transcriptReceived', (segment: TranscriptSegment) => {
  appendTranscript(segment.speaker, segment.text)
})

// Speaking indicator cleared
call.on('speakingStopped', (speakerId: string) => {
  removeTypingIndicator(speakerId)
})
```

## Built-in UI components
### React
```tsx
import { TranscriptPanel } from '@rtcstack/ui-react'

// Standalone
<TranscriptPanel maxItems={100} showSpeakerName />

// Or built into VideoConference with one prop
<VideoConference call={call} showTranscript />
```

### Vue 3
```vue
<TranscriptPanel :max-items="100" :show-speaker-name="true" />
<!-- Or -->
<VideoConference :call="call" :show-transcript="true" />
```

### Vanilla JS
```typescript
mountVideoConference(el, {
  call,
  showTranscript: true, // default: true
})
```

### Custom transcript rendering (any framework)
Use the SDK events directly and build your own UI:
```typescript
call.on('speakingStarted', (speakerId, name) => {
  // Create a "typing" bubble for this speaker
})

call.on('transcriptReceived', ({ speaker, speakerId, text, timestamp }) => {
  // Replace the bubble with real text, or append to existing
})
```

## Requirements
- Docker Compose with the `stt-live` profile enabled
- Whisper model weights (~150 MB for `base`), downloaded automatically on first start
- `TRANSCRIPTION_LIVE_ENABLED=true` in `docker/.env`
See Live Transcription Setup for the full configuration guide.

