RTCstack

🎙

Live Transcription — built in

Per-speaker, real-time captions powered by Whisper running inside your own stack. A speaking indicator appears the moment someone's mic is active. Transcribed text replaces it 1–3 seconds later. No API keys, no external service, no data leaves your server.

🎥

HD Video & Audio

Multi-party video conferencing with adaptive bitrate, simulcast, and selective forwarding — all via LiveKit's battle-tested SFU. Supports thousands of participants with the right hardware.

🖥

Screen Sharing

Any participant can share their screen alongside the video call. Screen share tiles appear automatically in the grid. One-click toggle from the control bar.

💬

In-call Chat

Real-time text chat alongside the video — send public messages or DMs to specific participants. Chat history persists for the duration of the session.

⚡

Emoji Reactions

Participants can send emoji reactions that float across the screen. Lightweight, fun, and delivered entirely over LiveKit's data channel.

🔴

Recording

Start and stop recording via API or SDK. Recordings go straight to your MinIO/S3 bucket via LiveKit Egress. Combine with post-call transcription for fully indexed archives.

📄

Post-call Transcription

Trigger transcription on any recording after the call ends. Timestamped, speaker-attributed segments are stored in Redis and retrievable via API. Whisper runs on CPU or GPU.

👥

Participant Management

Role-based access (host, moderator, participant, viewer). Hosts can mute participants, remove them from the room, or demote them. Room metadata and webhook events for every state change.

🎛

Device Selection

Users can switch microphone, speaker, and camera mid-call. Devices are enumerated from the browser and swapped without dropping the connection.

🧩

Composable UI Kits

Drop in `<VideoConference />` for a complete experience, or compose from atomic pieces. Available for React, Vue 3, and Vanilla JS.

⚙

SDK-first

A framework-agnostic TypeScript SDK that wraps LiveKit. Every feature emits an event: `transcriptReceived`, `speakingStarted`, `messageReceived`, `recordingStarted`. Under 80 kB gzipped.

🔐

Secure by default

HMAC-signed API requests, short-lived JWTs, role-scoped token grants, replay-attack prevention. No plaintext secrets. Everything self-hosted.
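As an illustration of how HMAC signing and replay prevention usually fit together, here is a minimal sketch. It is not RTCstack's actual wire format: the canonical string layout, the `timestamp` and `nonce` fields, the 300-second window, and the function names `signRequest` and `verifyRequest` are all assumptions made for this example.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical request shape; the real field names and canonical string may differ.
interface SignedRequest {
  method: string
  path: string
  body: string
  timestamp: number // Unix seconds at signing time
  nonce: string     // unique per request
  signature: string // hex-encoded HMAC-SHA256
}

const MAX_SKEW_SECONDS = 300 // assumed freshness window

// Join every signed field into one string so none can be altered independently.
function canonicalString(method: string, path: string, body: string, timestamp: number, nonce: string): string {
  return [method.toUpperCase(), path, String(timestamp), nonce, body].join('\n')
}

function signRequest(secret: string, method: string, path: string, body: string, timestamp: number, nonce: string): SignedRequest {
  const signature = createHmac('sha256', secret)
    .update(canonicalString(method, path, body, timestamp, nonce))
    .digest('hex')
  return { method, path, body, timestamp, nonce, signature }
}

function verifyRequest(secret: string, req: SignedRequest, nowSeconds: number, seenNonces: Set<string>): boolean {
  // Reject stale or far-future requests.
  if (Math.abs(nowSeconds - req.timestamp) > MAX_SKEW_SECONDS) return false
  // Reject a nonce that has already been accepted (replay).
  if (seenNonces.has(req.nonce)) return false

  const expected = createHmac('sha256', secret)
    .update(canonicalString(req.method, req.path, req.body, req.timestamp, req.nonce))
    .digest()
  const given = Buffer.from(req.signature, 'hex')
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false

  seenNonces.add(req.nonce)
  return true
}
```

In this sketch the nonce set only needs to remember entries for the length of the freshness window, since older requests are already rejected by the timestamp check.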

What is RTCstack?

RTCstack is a self-hosted real-time communication platform that gives you everything you'd normally pay a SaaS provider for — without sending a byte of audio, video, or transcript to anyone else.

It wraps LiveKit (an open-source WebRTC SFU) with:

- A thin REST API for token signing and for room, participant, recording, and transcription management
- A TypeScript SDK (`createCall()` → `connect()` → events) that abstracts the entire session
- Live and post-call transcription via faster-whisper, included rather than optional
- UI kits for React, Vue 3, and Vanilla JS, with every component usable standalone

```bash
# Start the full stack including live transcription
cd docker && docker compose --profile stt-live up

# Request a token (an HTTP request, not a shell command)
# POST /v1/token  →  { token, url }
```

Connect and display everything in React:

```tsx
import { createCall } from '@rtcstack/sdk'
import { VideoConference } from '@rtcstack/ui-react'

// token and url come from the POST /v1/token response
const call = createCall({ token, url })
await call.connect()

// Full conference: video, controls, chat, screen share, reactions, transcription
<VideoConference call={call} showTranscript />

// Or wire up every feature from raw SDK events
call.on('participantJoined',    (p) => console.log(p.name, 'joined'))
call.on('messageReceived',      (m) => appendChat(m.fromName, m.text))
call.on('speakingStarted',      (id, name) => showIndicator(name))
call.on('transcriptReceived',   (seg) => appendTranscript(seg.speaker, seg.text))
call.on('recordingStarted',     () => showRecordingBadge())
call.on('screenShareStarted',   (p) => showScreenTile(p))
call.on('activeSpeakerChanged', (speakers) => highlightSpeaker(speakers[0]))
```
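The live-caption behavior described under Live Transcription (a speaking indicator first, replaced by text once the transcript arrives) can be modeled with a small piece of state driven by the `speakingStarted` and `transcriptReceived` events. The sketch below is only an illustration: `CaptionBoard` and its method names are hypothetical, and it assumes the `name` passed to `speakingStarted` and `seg.speaker` identify a participant with the same string.

```typescript
// One caption slot per speaker: either the indicator or the latest text.
type Caption = { kind: 'speaking' } | { kind: 'text'; text: string }

// Hypothetical helper, not part of the SDK.
class CaptionBoard {
  private readonly captions = new Map<string, Caption>()

  // Show the indicator as soon as the speaker's mic is active.
  onSpeakingStarted(speaker: string): void {
    this.captions.set(speaker, { kind: 'speaking' })
  }

  // Replace the indicator (or older text) with the transcribed segment.
  onTranscript(speaker: string, text: string): void {
    this.captions.set(speaker, { kind: 'text', text })
  }

  // Produce one display line per speaker, in first-seen order.
  render(): string[] {
    return Array.from(this.captions.entries()).map(([speaker, caption]) =>
      caption.kind === 'speaking' ? `${speaker} is speaking...` : `${speaker}: ${caption.text}`
    )
  }
}

// Possible wiring, using the event signatures from the example above:
//   const board = new CaptionBoard()
//   call.on('speakingStarted',    (id, name) => board.onSpeakingStarted(name))
//   call.on('transcriptReceived', (seg) => board.onTranscript(seg.speaker, seg.text))
```

Keeping the state in a plain class like this makes the indicator-to-text handoff testable without a live call.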

Why not just use LiveKit directly?

LiveKit is excellent but low-level. You still need to:

- Write a token server (and secure it)
- Handle room/participant metadata and webhooks
- Build media controls, chat, screen share, device selection, reactions
- Add transcription (LiveKit has no STT; that's entirely on you)
- Build and style all the UI

RTCstack does all of that. You focus on your product.
