Introduction

RTCstack is a self-hosted, modular, developer-first real-time communication platform built on top of LiveKit.

What makes it different

Most WebRTC solutions give you video and audio. RTCstack adds live transcription as a first-class feature — not an add-on, not an external API call. Whisper runs in the same Docker stack. Transcripts appear in the UI automatically.

Capability	RTCstack	Bare LiveKit	Jitsi	SaaS SDKs
Self-hosted	✅	✅	✅	❌
TypeScript SDK	✅	Partial	❌	✅
Live transcription (local)	✅	❌	❌	❌
Post-call transcription	✅	❌	❌	Paid
Composable UI kit	✅	❌	❌	✅
Docker one-command deploy	✅	Partial	✅	—

Core features

Live transcription

Per-speaker transcription happens in real time during the call:

A speaking indicator (···) appears the moment someone's mic becomes active
Whisper transcribes the audio and the final text replaces the indicator (typically 1–3 seconds after speaking)
Speaker attribution is automatic — each participant gets their own labelled bubble
All results are exposed via SDK events (transcriptReceived, speakingStarted, speakingStopped) and pre-built UI components (TranscriptPanel)
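
The event flow above can be sketched as a small state reducer. The three event names come from the list above; the payload fields (participantId, text) and the TranscriptBubble type are assumptions made for illustration, and the real SDK payloads may look different.

```typescript
// Hypothetical payload shapes; only the event names come from the docs.
type TranscriptEvent =
  | { type: "speakingStarted"; participantId: string }
  | { type: "speakingStopped"; participantId: string }
  | { type: "transcriptReceived"; participantId: string; text: string };

interface TranscriptBubble {
  participantId: string;
  text: string; // the indicator while waiting for Whisper, then the final text
  pending: boolean; // true until the final text arrives
}

const INDICATOR = "···";

// Pure reducer: returns the next list of bubbles for one incoming event.
function reduceTranscript(
  bubbles: readonly TranscriptBubble[],
  event: TranscriptEvent,
): TranscriptBubble[] {
  // A speaker has at most one pending bubble at a time.
  const pendingIndex = bubbles.findIndex(
    (b) => b.pending && b.participantId === event.participantId,
  );
  switch (event.type) {
    case "speakingStarted":
      // Show the indicator once per speaker; ignore repeated start events.
      if (pendingIndex !== -1) return [...bubbles];
      return [
        ...bubbles,
        { participantId: event.participantId, text: INDICATOR, pending: true },
      ];
    case "speakingStopped":
      // Keep the indicator: the final text is expected shortly afterwards.
      return [...bubbles];
    case "transcriptReceived": {
      const finalBubble: TranscriptBubble = {
        participantId: event.participantId,
        text: event.text,
        pending: false,
      };
      // Replace the speaker's indicator, or append if none was shown.
      if (pendingIndex === -1) return [...bubbles, finalBubble];
      return bubbles.map((b, i) => (i === pendingIndex ? finalBubble : b));
    }
  }
}
```

Keeping the logic in a pure function means the same reducer can back a custom UI or a test harness without a live room. The pre-built TranscriptPanel presumably handles this internally.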

Post-call transcription

Trigger transcription on any recording after the call ends. Results are timestamped, speaker-attributed, and stored in Redis — retrievable via API.
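
Retrieving a post-call transcript might look like the polling loop below. This is a minimal sketch: the JobStatus states, the TranscriptSegment fields, and the pollTranscript helper are all hypothetical, and the actual endpoint call is left to a caller-supplied fetchStatus function because this page does not name the endpoint.

```typescript
// Hypothetical result shape: the docs only say results are timestamped
// and speaker-attributed.
interface TranscriptSegment {
  speaker: string;
  startSec: number;
  endSec: number;
  text: string;
}

// Hypothetical job states reported by the API.
type JobStatus =
  | { state: "pending" }
  | { state: "done"; segments: TranscriptSegment[] }
  | { state: "failed"; error: string };

// Calls fetchStatus until the job is done or failed, or attempts run out.
async function pollTranscript(
  fetchStatus: () => Promise<JobStatus>,
  opts: { intervalMs: number; maxAttempts: number },
): Promise<TranscriptSegment[]> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status.state === "done") return status.segments;
    if (status.state === "failed") {
      throw new Error(`transcription failed: ${status.error}`);
    }
    // Still pending: wait before the next attempt (skip the wait on the last one).
    if (attempt < opts.maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
    }
  }
  throw new Error(`transcript not ready after ${opts.maxAttempts} attempts`);
}
```

In a real integration, fetchStatus would wrap an authenticated call from your app backend to the RTCstack API.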

Video conferencing

Full multi-party video/audio with screen sharing, chat, reactions, participant management, layout switching, and device selection.

Recording

Recordings are captured with LiveKit Egress and written to MinIO or S3. Start and stop them via the API or the SDK.

Design principles

Transcription-first — STT is built into the stack, not bolted on afterward
SDK-first, UI optional — the SDK works standalone; UI kits are additive
Composable UI — use <VideoConference showTranscript /> or compose from atomic components
Thin backend abstraction — the API is stateless and stays out of the media path
No data leaves your server — Whisper runs locally; zero external API dependencies

Architecture overview

Your Frontend (SDK + UI Kit)
         ↓
Your App Backend
         ↓  (HMAC-signed)
RTCstack API  ──POST /v1/token──►  LiveKit JWT
         ↓
LiveKit SFU + coturn TURN
         │
         ├── Egress ──► MinIO (recordings)
         │                   │
         │              stt-worker ──► Whisper
         │
         └── stt-live-agent ──► Whisper
                    │
              publish_data() → SDK → UI
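
The HMAC-signed hop between your app backend and the RTCstack API could be sketched as follows. Only POST /v1/token and the use of HMAC signing come from the diagram; the signing scheme (HMAC-SHA256 over timestamp and body), the header names, and the payload fields are assumptions for illustration, so check the API reference for the real contract.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256 over "<timestamp>.<body>", hex-encoded.
function signRequest(secret: string, timestampMs: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

// Hypothetical payload fields and header names.
interface TokenPayload {
  room: string;
  identity: string;
  role: "host" | "moderator" | "participant" | "viewer";
}

function buildTokenRequest(secret: string, payload: TokenPayload, nowMs: number) {
  const body = JSON.stringify(payload);
  return {
    method: "POST" as const,
    path: "/v1/token",
    headers: {
      "content-type": "application/json",
      "x-timestamp": String(nowMs), // hypothetical header name
      "x-signature": signRequest(secret, nowMs, body), // hypothetical header name
    },
    body,
  };
}

// Server-side check: reject stale timestamps, then compare in constant time.
function verifyRequest(
  secret: string,
  timestampMs: number,
  body: string,
  signatureHex: string,
  nowMs: number,
  maxSkewMs = 300_000,
): boolean {
  if (Math.abs(nowMs - timestampMs) > maxSkewMs) return false;
  const expected = Buffer.from(signRequest(secret, timestampMs, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Signing the timestamp together with the body limits replay of captured requests, and the constant-time comparison avoids leaking how many signature bytes matched.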

Transcription paths:

Mode	Triggered by	Data path
Live	API call while the room is active	LiveKit audio → stt-live-agent → Whisper → LiveKit data channel → SDK event
Post-call	API call after the recording is complete	MinIO recording → stt-worker → Whisper → Redis → API poll

Roles

Role	Can publish	Can mute others	Room admin
host	✅	✅	✅
moderator	✅	✅	❌
participant	✅	❌	❌
viewer	❌	❌	❌
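
The role matrix above can be expressed as a typed lookup table. The role names and permission values mirror the table; the identifiers (Role, Permissions, ROLE_PERMISSIONS, can) are illustrative and are not taken from the RTCstack SDK.

```typescript
type Role = "host" | "moderator" | "participant" | "viewer";

interface Permissions {
  canPublish: boolean;
  canMuteOthers: boolean;
  roomAdmin: boolean;
}

// Mirrors the roles table row by row.
const ROLE_PERMISSIONS: Record<Role, Permissions> = {
  host: { canPublish: true, canMuteOthers: true, roomAdmin: true },
  moderator: { canPublish: true, canMuteOthers: true, roomAdmin: false },
  participant: { canPublish: true, canMuteOthers: false, roomAdmin: false },
  viewer: { canPublish: false, canMuteOthers: false, roomAdmin: false },
};

// Returns whether the given role holds the given permission.
function can(role: Role, permission: keyof Permissions): boolean {
  return ROLE_PERMISSIONS[role][permission];
}
```

For example, can("moderator", "canMuteOthers") is true while can("moderator", "roomAdmin") is false, matching the moderator row.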
