Introduction
RTCstack is a self-hosted, modular, developer-first real-time communication platform built on top of LiveKit.
What makes it different
Most WebRTC solutions give you video and audio. RTCstack adds live transcription as a first-class feature, not an add-on and not an external API call. Whisper runs in the same Docker stack. Transcripts appear in the UI automatically.
| Capability | RTCstack | Bare LiveKit | Jitsi | SaaS SDKs |
|---|---|---|---|---|
| Self-hosted | β | β | β | β |
| TypeScript SDK | β | Partial | β | β |
| Live transcription (local) | β | β | β | β |
| Post-call transcription | β | β | β | Paid |
| Composable UI kit | β | β | β | β |
| Docker one-command deploy | β | Partial | β | β |
Core features
Live transcription
Per-speaker transcription happens in real time during the call:
- A speaking indicator (`···`) appears the moment someone's mic becomes active
- Whisper transcribes the audio and the final text replaces the indicator (typically 1-3 seconds after speaking)
- Speaker attribution is automatic: each participant gets their own labelled bubble
- All results are exposed via SDK events (`transcriptReceived`, `speakingStarted`, `speakingStopped`) and pre-built UI components (`TranscriptPanel`)
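The indicator-then-final-text behaviour described above can be modelled as a small state reducer. This is a minimal sketch, not the SDK's implementation: only the three event names come from the list above, while the payload fields (`participantId`, `text`), the `Bubble` type, and `applyEvent` are assumptions made for illustration.

```typescript
// Hypothetical event payloads. Only the event names come from the docs;
// the field names are assumptions and may differ in the real SDK.
type TranscriptEvent =
  | { type: "speakingStarted"; participantId: string }
  | { type: "speakingStopped"; participantId: string }
  | { type: "transcriptReceived"; participantId: string; text: string };

// One labelled bubble per utterance; `pending` means the indicator is showing.
type Bubble = { participantId: string; text: string; pending: boolean };

const INDICATOR = "···";

function applyEvent(bubbles: Bubble[], event: TranscriptEvent): Bubble[] {
  switch (event.type) {
    case "speakingStarted": {
      // Add an indicator bubble unless this speaker already has one pending.
      const alreadyPending = bubbles.some(
        (b) => b.pending && b.participantId === event.participantId,
      );
      if (alreadyPending) return bubbles;
      return [
        ...bubbles,
        { participantId: event.participantId, text: INDICATOR, pending: true },
      ];
    }
    case "speakingStopped":
      // Keep the indicator in place: the final text arrives a little later.
      return bubbles;
    case "transcriptReceived": {
      const finalBubble: Bubble = {
        participantId: event.participantId,
        text: event.text,
        pending: false,
      };
      const index = bubbles.findIndex(
        (b) => b.pending && b.participantId === event.participantId,
      );
      // Replace the speaker's pending indicator, or append if none exists.
      if (index === -1) return [...bubbles, finalBubble];
      return bubbles.map((b, i) => (i === index ? finalBubble : b));
    }
  }
}
```

In an application, each SDK event handler would feed its payload through `applyEvent` and re-render from the returned array. Keeping the reducer pure makes the replacement logic easy to unit-test without a live room.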
Post-call transcription
Trigger transcription on any recording after the call ends. Results are timestamped, speaker-attributed, and stored in Redis, where they can be retrieved via the API.
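As an illustration of working with timestamped, speaker-attributed results, the sketch below renders a list of segments as a readable transcript. The `TranscriptSegment` shape (`speaker`, `startSec`, `text`) is an assumption for this example; the actual response format of the API is not specified here and may differ.

```typescript
// Hypothetical segment shape; the real API response may use other field names.
interface TranscriptSegment {
  speaker: string;
  startSec: number; // offset from the start of the recording, in seconds
  text: string;
}

// Format a second offset as HH:MM:SS, dropping any fractional part.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.floor(totalSeconds);
  const hh = String(Math.floor(s / 3600)).padStart(2, "0");
  const mm = String(Math.floor((s % 3600) / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `${hh}:${mm}:${ss}`;
}

// Sort segments chronologically and render one line per segment.
function renderTranscript(segments: TranscriptSegment[]): string {
  return [...segments]
    .sort((a, b) => a.startSec - b.startSec)
    .map((seg) => `[${formatTimestamp(seg.startSec)}] ${seg.speaker}: ${seg.text}`)
    .join("\n");
}
```

Sorting a copy rather than the input keeps the function free of side effects, which matters if the same segment array is also used to drive a UI.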
Video conferencing
Full multi-party video/audio with screen sharing, chat, reactions, participant management, layout switching, and device selection.
Recording
Egress-based recording (via LiveKit Egress) to MinIO/S3. Start and stop via API or SDK.
Design Principles
- Transcription-first: STT is built into the stack, not bolted on afterward
- SDK-first, UI optional: the SDK works standalone; UI kits are additive
- Composable UI: use `<VideoConference showTranscript />` or compose from atomic components
- Thin backend abstraction: the API is stateless and stays out of the media path
- No data leaves your server: Whisper runs locally; zero external API dependencies
Architecture Overview
Your Frontend (SDK + UI Kit)
        │
Your App Backend
        │ (HMAC-signed)
RTCstack API ──POST /v1/token──► LiveKit JWT
        │
LiveKit SFU + coturn TURN
        │
        ├── Egress ──► MinIO (recordings)
        │                  │
        │             stt-worker ──► Whisper
        │
        └── stt-live-agent ──► Whisper
                    │
            publish_data() → SDK → UI

Transcription path:
| Mode | Who triggers | Data path |
|---|---|---|
| Live | API call during the room | LiveKit audio → stt-live-agent → Whisper → LiveKit data channel → SDK event |
| Post-call | API call after the recording | MinIO recording → stt-worker → Whisper → Redis → API poll |
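The architecture overview marks the hop from your app backend to the RTCstack API as HMAC-signed. The sketch below shows one common way such signing works, using Node's built-in `crypto` module. The scheme itself (HMAC-SHA256 over `timestamp.body`, hex-encoded) and the function names are assumptions for illustration; the source does not specify the exact algorithm, message layout, or header names RTCstack expects.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256 over "<timestamp>.<body>", hex-encoded.
// Including a timestamp lets the receiver reject stale (replayed) requests.
function signBody(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${body}`)
    .digest("hex");
}

// Recompute the signature on the receiving side and compare in constant time.
function verifySignature(
  secret: string,
  timestamp: number,
  body: string,
  signature: string,
): boolean {
  const expected = Buffer.from(signBody(secret, timestamp, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A backend would sign the JSON body of its `POST /v1/token` request with a shared secret and send the signature alongside it. Because the secret stays on the server, the frontend can receive a LiveKit JWT without ever being able to mint one itself.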
Roles
| Role | Can publish | Can mute others | Room admin |
|---|---|---|---|
| host | β | β | β |
| moderator | β | β | β |
| participant | β | β | β |
| viewer | β | β | β |

