Introduction

RTCstack is a self-hosted, modular, developer-first real-time communication platform built on top of LiveKit.

What makes it different

Most WebRTC solutions give you video and audio. RTCstack adds live transcription as a first-class feature, not an add-on and not an external API call. Whisper runs in the same Docker stack, and transcripts appear in the UI automatically.

| Capability | RTCstack | Bare LiveKit | Jitsi | SaaS SDKs |
| --- | --- | --- | --- | --- |
| Self-hosted | ✅ | ✅ | ✅ | ❌ |
| TypeScript SDK | ✅ | Partial | ❌ | ✅ |
| Live transcription (local) | ✅ | ❌ | ❌ | ❌ |
| Post-call transcription | ✅ | ❌ | ❌ | Paid |
| Composable UI kit | ✅ | ❌ | ❌ | ✅ |
| Docker one-command deploy | ✅ | Partial | ✅ | N/A |

Core features

Live transcription

Per-speaker transcription happens in real time during the call:

  • A speaking indicator (···) appears the moment someone's mic becomes active
  • Whisper transcribes the audio and the final text replaces the indicator (typically 1–3 seconds after speaking)
  • Speaker attribution is automatic: each participant gets their own labelled bubble
  • All results are exposed via SDK events (transcriptReceived, speakingStarted, speakingStopped) and pre-built UI components (TranscriptPanel)
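
As a rough illustration of how the three events could drive a transcript view, the sketch below keeps one bubble per speaker turn: speakingStarted adds a placeholder and transcriptReceived swaps in the final text. Only the event names come from the list above. The payload fields (participantId, text), the Bubble type, and the applyEvent helper are assumptions made for this example, not the SDK's documented types.

```typescript
// Hypothetical payload shapes; only the event names come from the docs above.
type TranscriptEvent =
  | { type: "speakingStarted"; participantId: string }
  | { type: "speakingStopped"; participantId: string }
  | { type: "transcriptReceived"; participantId: string; text: string };

interface Bubble {
  participantId: string;
  text: string; // holds the placeholder while transcription is pending
  pending: boolean;
}

const PLACEHOLDER = "···";

// Pure reducer: returns a new bubble list for each incoming event.
function applyEvent(bubbles: Bubble[], event: TranscriptEvent): Bubble[] {
  switch (event.type) {
    case "speakingStarted": {
      // Avoid stacking placeholders if this speaker already has a pending bubble.
      const hasPending = bubbles.some(
        (b) => b.pending && b.participantId === event.participantId,
      );
      if (hasPending) return bubbles;
      return [
        ...bubbles,
        { participantId: event.participantId, text: PLACEHOLDER, pending: true },
      ];
    }
    case "speakingStopped":
      // Keep the placeholder: the final text usually arrives slightly later.
      return bubbles;
    case "transcriptReceived": {
      const index = bubbles.findIndex(
        (b) => b.pending && b.participantId === event.participantId,
      );
      const done: Bubble = {
        participantId: event.participantId,
        text: event.text,
        pending: false,
      };
      // No placeholder was seen for this speaker, so append a new bubble.
      if (index === -1) return [...bubbles, done];
      return bubbles.map((b, i) => (i === index ? done : b));
    }
  }
}
```

Keeping the update logic in a pure function means it can be wired to whatever event subscription the SDK provides, and tested without a live room.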

Post-call transcription

Trigger transcription on any recording after the call ends. Results are timestamped, speaker-attributed, and stored in Redis, where they are retrievable via the API.

Video conferencing

Full multi-party video/audio with screen sharing, chat, reactions, participant management, layout switching, and device selection.

Recording

Recordings are captured with LiveKit Egress and written to MinIO or S3. Start and stop them via the API or the SDK.

Design Principles

  • Transcription-first: STT is built into the stack, not bolted on afterward
  • SDK-first, UI optional: the SDK works standalone; UI kits are additive
  • Composable UI: use <VideoConference showTranscript /> or compose from atomic components
  • Thin backend abstraction: the API is stateless and stays out of the media path
  • No data leaves your server: Whisper runs locally; zero external API dependencies

Architecture Overview

Your Frontend (SDK + UI Kit)
         ↓
Your App Backend
         ↓  (HMAC-signed)
RTCstack API  ──POST /v1/token──►  LiveKit JWT
         ↓
LiveKit SFU + coturn TURN
         │
         ├── Egress ──► MinIO (recordings)
         │                   │
         │              stt-worker ──► Whisper
         │
         └── stt-live-agent ──► Whisper
                    │
              publish_data() → SDK → UI
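
The diagram marks the hop from your app backend to the RTCstack API as HMAC-signed, but this section does not spell out the signing scheme. The sketch below shows a generic HMAC-SHA256 approach using Node's built-in crypto module. The canonical string ("<timestamp>.<body>"), the function names, and the example payload fields are all assumptions for illustration; the actual RTCstack scheme may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed canonical form: "<timestamp>.<raw JSON body>". Not taken from the RTCstack docs.
function signRequest(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Recomputes the signature and compares it in constant time.
function verifyRequest(
  secret: string,
  timestamp: number,
  body: string,
  signature: string,
): boolean {
  const expected = Buffer.from(signRequest(secret, timestamp, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Example: sign a hypothetical token request body before sending POST /v1/token.
const exampleBody = JSON.stringify({ room: "demo", role: "host" });
const exampleTimestamp = 1700000000;
const exampleSignature = signRequest("dev-secret", exampleTimestamp, exampleBody);
```

Including a timestamp in the signed string lets the receiving side reject stale requests, which is a common way to limit replay of captured signatures.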

Transcription paths:

| Mode | Trigger | Data path |
| --- | --- | --- |
| Live | API call while the room is active | LiveKit audio → stt-live-agent → Whisper → LiveKit data channel → SDK event |
| Post-call | API call after the recording completes | MinIO recording → stt-worker → Whisper → Redis → API poll |
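
The post-call path ends with the client polling the API for a result stored in Redis. A minimal polling helper might look like the sketch below. The JobState shape, its status values, and the pollUntilDone name are assumptions, and the fetch function is injected because the actual endpoint and response format are not described here.

```typescript
// Assumed response shape for a post-call transcription job (not from the docs).
type JobState<T> =
  | { status: "pending" }
  | { status: "done"; result: T }
  | { status: "failed"; error: string };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fetchState` until the job finishes, fails, or the attempts run out.
async function pollUntilDone<T>(
  fetchState: () => Promise<JobState<T>>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const state = await fetchState();
    if (state.status === "done") return state.result;
    if (state.status === "failed") {
      throw new Error(`transcription failed: ${state.error}`);
    }
    // Wait between attempts, but not after the last one.
    if (attempt < options.maxAttempts) await sleep(options.intervalMs);
  }
  throw new Error(`transcript not ready after ${options.maxAttempts} attempts`);
}
```

In practice, fetchState would wrap an HTTP request to whichever endpoint returns the transcript status. Capping the number of attempts keeps a stuck job from polling forever.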

Roles

| Role | Can publish | Can mute others | Room admin |
| --- | --- | --- | --- |
| host | ✅ | ✅ | ✅ |
| moderator | ✅ | ✅ | ❌ |
| participant | ✅ | ❌ | ❌ |
| viewer | ❌ | ❌ | ❌ |
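
For application code that needs to gate UI controls by role, the roles table can be mirrored as a typed lookup. This is only a sketch: the four role names come from the table, while the Permissions property names and the can helper are illustrative and not part of the SDK.

```typescript
type Role = "host" | "moderator" | "participant" | "viewer";

interface Permissions {
  canPublish: boolean;
  canMuteOthers: boolean;
  roomAdmin: boolean;
}

// Mirrors the roles table; property names are illustrative, not SDK identifiers.
const ROLE_PERMISSIONS: Record<Role, Permissions> = {
  host: { canPublish: true, canMuteOthers: true, roomAdmin: true },
  moderator: { canPublish: true, canMuteOthers: true, roomAdmin: false },
  participant: { canPublish: true, canMuteOthers: false, roomAdmin: false },
  viewer: { canPublish: false, canMuteOthers: false, roomAdmin: false },
};

// Returns whether the given role holds the given permission.
function can(role: Role, permission: keyof Permissions): boolean {
  return ROLE_PERMISSIONS[role][permission];
}
```

A check such as can("moderator", "canMuteOthers") could then decide whether to render a mute button. Client-side checks like this are a convenience only, and enforcement still belongs on the server.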