WebRTC has a reputation as a peer-to-peer technology — browsers connect directly, no server needed. That reputation is misleading.
Every real-world WebRTC session touches at least one WebRTC server. Most touch three or four. The phrase “peer-to-peer” describes how media flows after a connection is established, not what it takes to get there.
If you’re building a video call app, a live event platform, or any real-time communication feature with WebRTC, you need to understand WebRTC servers: what they are, what each type does, and how to choose the right architecture for your use case.
WebRTC achieves sub-500ms latency — far below streaming protocols like HLS or DASH. But getting there requires server infrastructure that most tutorials skip over entirely.
This guide covers all four types of WebRTC servers, how they interact, and how to decide which ones your project actually needs.
What Is a WebRTC Server?
A WebRTC server is any server-side component that supports WebRTC sessions — handling session negotiation, NAT traversal, media routing, or recording. Real WebRTC deployments don’t use a single monolithic server; they combine multiple specialized servers, each responsible for a different layer of the connection.
The four main types are:
- Signaling server — coordinates session setup between peers
- STUN server — helps peers identify their public IP addresses
- TURN server — relays media when direct connections fail
- Media server — routes, processes, or mixes streams for multi-party sessions
Each plays a distinct role in the connection lifecycle, from the initial handshake through media delivery.
Does WebRTC Really Need a Server?
Two peers on the same local network can technically connect without any server at all. In practice, that almost never describes your users.
Most users are behind NAT (Network Address Translation) — the mechanism home routers use to share a single public IP among multiple devices. NAT hides each device’s real address from the outside world, which blocks direct peer-to-peer connections.
STUN servers solve NAT for about 80% of connections by helping peers determine their public-facing addresses. The remaining 20% — typically users behind symmetric NAT, which is common in enterprise networks and some mobile connections — require a TURN relay server to get media through at all.
Before any media can flow, peers also need to exchange session descriptions, negotiate codecs, and share connectivity candidates. None of that happens automatically — that’s what a signaling server handles.
So: yes, WebRTC needs servers. The question is which servers and how many.
The Four Types of WebRTC Servers
Here’s a quick reference before covering each type in depth:
| Server Type | Role | Required? | Self-Host Complexity |
|---|---|---|---|
| Signaling server | Session negotiation and ICE candidate exchange | Always | Low–Medium |
| STUN server | Public IP/port discovery for NAT traversal | Always | Very low |
| TURN server | Relay fallback when direct P2P fails | ~20% of connections | Medium |
| Media server (SFU/MCU) | Multi-party routing, recording, transcoding | For 3+ participants or broadcasting | High |
Signaling Server
A signaling server is a WebRTC server that coordinates the initial session setup between peers. It carries SDP (Session Description Protocol) offers and answers — documents that describe each peer’s media capabilities, codecs, and network addresses — and relays ICE candidates: the potential network paths each peer can receive data on.
WebRTC deliberately does not define a signaling protocol. You can use WebSockets, HTTP long-polling, SIP, or any transport that works for your architecture. WebSockets are the most common choice because they keep a persistent bidirectional connection open, which makes real-time candidate exchange fast and avoids polling overhead.
A minimal signaling exchange in JavaScript looks like this:
```javascript
// Peer A creates an offer and sends it via the signaling channel
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
signalingSocket.send(JSON.stringify({ type: 'offer', sdp: offer }));

// Peer B receives the offer and responds with an answer
signalingSocket.onmessage = async (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'offer') {
    await peerConnection.setRemoteDescription(message.sdp);
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    signalingSocket.send(JSON.stringify({ type: 'answer', sdp: answer }));
  }
};

// Both peers send ICE candidates as they are gathered (Trickle ICE)
peerConnection.onicecandidate = (event) => {
  if (event.candidate) {
    signalingSocket.send(JSON.stringify({ type: 'candidate', candidate: event.candidate }));
  }
};
```
Once the SDP offer/answer exchange completes and ICE candidates are shared, the signaling server’s job is done. Media flows directly between peers (or through a TURN relay) — not through the signaling server.
For production, your signaling server needs to handle authentication, room management, reconnections, and participant presence. Node.js with Socket.io is the most common custom stack. Hosted WebSocket services (AWS API Gateway, Ably, Pusher) can carry signaling traffic without running your own server process.
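The room-management piece of a production signaling server can stay transport-agnostic. Here's a minimal sketch of an in-memory room registry, with the send function injected so it works with any WebSocket library; names like `RoomRegistry` and `relay` are illustrative, not an established API:

```javascript
// Minimal in-memory room registry for a signaling server.
// Transport-agnostic: each peer's `send` function is injected,
// so this works with ws, Socket.io, or a hosted WebSocket service.
class RoomRegistry {
  constructor() {
    this.rooms = new Map(); // roomId -> Map<peerId, send function>
  }

  join(roomId, peerId, send) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId).set(peerId, send);
  }

  leave(roomId, peerId) {
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(peerId);
    if (room.size === 0) this.rooms.delete(roomId);
  }

  // Relay a signaling message (offer/answer/candidate) to every other
  // peer in the room. The server never inspects the SDP payload.
  relay(roomId, fromPeerId, message) {
    const room = this.rooms.get(roomId);
    if (!room) return 0;
    let delivered = 0;
    for (const [peerId, send] of room) {
      if (peerId !== fromPeerId) {
        send(JSON.stringify({ from: fromPeerId, ...message }));
        delivered++;
      }
    }
    return delivered;
  }
}
```

A real deployment would add authentication on `join` and cleanup on socket disconnect, but the relay logic itself stays this small.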
STUN Server
A STUN server (Session Traversal Utilities for NAT) is a WebRTC server that tells each peer what its public IP address and port look like from outside the local network.
When a peer is behind NAT, its private IP (like 192.168.1.x) is invisible to the outside world. The STUN server receives the peer’s request and reads the source IP and port from the incoming packet headers — which reflect the public address assigned by the NAT device. It returns those values to the peer, which then includes this “server reflexive” candidate in the ICE exchange.
STUN is stateless and lightweight — the server does not stay in the media path after this lookup.
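The "server reflexive" address comes back in a STUN Binding response as an XOR-MAPPED-ADDRESS attribute, with the port and IP XOR'd against the protocol's magic cookie (RFC 5389). A rough sketch of decoding that attribute value, for illustration only (real applications should rely on the browser's ICE stack or a STUN library rather than hand-parsing packets):

```javascript
// Decode the XOR-MAPPED-ADDRESS attribute *value* from a STUN
// Binding response (RFC 5389). IPv4 only, illustrative sketch.
const MAGIC_COOKIE = 0x2112a442;

function decodeXorMappedAddress(value) {
  // value: Buffer holding the 8-byte IPv4 attribute value
  const family = value.readUInt8(1);
  if (family !== 0x01) throw new Error('only IPv4 handled in this sketch');
  // Port is XOR'd with the top 16 bits of the magic cookie
  const port = value.readUInt16BE(2) ^ (MAGIC_COOKIE >>> 16);
  // Address is XOR'd with the full 32-bit magic cookie
  const addr = (value.readUInt32BE(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [24, 16, 8, 0].map((s) => (addr >>> s) & 0xff).join('.');
  return { ip, port };
}
```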
Several free public STUN servers exist, including Google's widely used stun.l.google.com:19302.
STUN works for full cone, restricted cone, and port-restricted cone NAT types. It fails for symmetric NAT, where the NAT assigns a different external port for each destination address. That’s where TURN comes in.
For most small-to-medium applications, using a free public STUN server is perfectly fine. Running your own STUN server only makes sense when you have strict privacy requirements, need guaranteed uptime SLAs, or want to avoid dependency on third-party infrastructure.
TURN Server
A TURN server (Traversal Using Relays Around NAT) is a WebRTC server that relays media between peers when a direct connection is impossible.
Unlike STUN, the TURN server stays in the media path permanently. Both peers send their audio and video to the TURN server, which forwards it to the other side. This adds latency and consumes bandwidth on the server — making TURN more expensive to run than STUN.
TURN handles every NAT type, including symmetric NAT, which is why it’s the fallback of last resort when everything else fails.
Port requirements for a TURN server:
- 3478 UDP (and TCP for TCP relay)
- 443 TCP (TURN over TLS)
- 10000–20000 UDP (media relay port range)
Coturn is the dominant open-source TURN server implementation. It’s mature, well-documented, and runs on any Linux server. A small VPS ($20–40/month) handles hundreds of concurrent relay sessions at typical video bitrates.
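A Coturn configuration matching the port list above might look like the following sketch. The realm, credentials, and certificate paths are placeholders, and a production config would need hardening beyond this:

```ini
# /etc/turnserver.conf -- illustrative sketch, not a hardened config
listening-port=3478          # STUN/TURN over UDP and TCP
tls-listening-port=443       # TURN over TLS (turns:)
min-port=10000               # media relay port range
max-port=20000
realm=example.com            # placeholder realm
fingerprint
lt-cred-mech                 # long-term credential mechanism
user=webrtc:changeme         # placeholder credentials
cert=/etc/ssl/turn_cert.pem  # placeholder certificate paths
pkey=/etc/ssl/turn_key.pem
```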
How ICE decides when to use TURN:
ICE tries connection candidates in priority order: direct host connections first, then STUN-resolved server reflexive addresses, then TURN relay addresses. The TURN relay is only used when the higher-priority candidates fail. Your users don’t notice the difference — ICE selects the best working path automatically.
ICE: How STUN and TURN Work Together
ICE (Interactive Connectivity Establishment) is the framework defined in RFC 8445 that coordinates STUN, TURN, and direct connections to find the best available path between peers.
Here’s how it works step by step:
- Each peer gathers connection “candidates” — possible network addresses it can receive data on
- Candidates come in three types: host (local IP/port), server reflexive (returned by STUN), and relay (allocated on a TURN server)
- Candidates are sent to the remote peer via the signaling server
- Both peers run connectivity checks on all candidate pairs
- ICE selects the highest-priority working pair and the connection is established
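The priority that orders those candidate pairs comes from a fixed formula in RFC 8445 section 5.1.2, which is why host candidates always outrank server reflexive ones, which outrank relay. A sketch using the RFC's recommended type preferences (the default `localPref` here is an assumption for single-interface hosts):

```javascript
// ICE candidate priority per RFC 8445 section 5.1.2:
// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
// Recommended type preferences: host 126, prflx 110, srflx 100, relay 0.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

function candidatePriority(type, localPref = 65535, componentId = 1) {
  return (
    TYPE_PREFERENCE[type] * 2 ** 24 +
    localPref * 2 ** 8 +
    (256 - componentId)
  );
}
```

This is why the relay path is tried last: a relay candidate's priority is lower than any direct or STUN-derived candidate's by construction.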
Trickle ICE sends candidates to the remote peer as they’re gathered rather than waiting for all gathering to complete. This cuts connection setup time from several seconds to under a second in most cases — critical for any real-time communication experience.
Your RTCPeerConnection configuration tells the browser where to find STUN and TURN:
```javascript
const iceServers = [
  { urls: 'stun:stun.l.google.com:19302' },
  {
    urls: 'turn:your-turn-server.example.com:3478',
    username: 'user',
    credential: 'secret'
  }
];

const peerConnection = new RTCPeerConnection({ iceServers });
```
ICE also monitors the connection throughout the session. If the selected candidate pair degrades — for example, a user switches from Wi-Fi to cellular — ICE can switch to an alternate working pair without dropping the session.
WebRTC Media Server Architectures: Mesh, MCU, and SFU
For two-party calls, direct P2P with STUN/TURN is enough. For three or more participants — group calls, webinars, live broadcasts — you need a WebRTC media server to manage multiple streams.
There are three main architectures:
Mesh (P2P)
Every participant connects directly to every other participant. With N peers, each one maintains N−1 connections and uploads N−1 separate video streams.
This works for 2–4 participants on good connections but collapses quickly at scale. A 6-person call means each participant uploads 5 video streams — which drains mobile battery, consumes upload bandwidth, and pins CPU on the encoding step. Mesh has no server-side media cost but has hard practical limits.
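The collapse is easy to quantify. Assuming a typical 1,500 kbps per video stream (an illustrative figure; actual bitrates vary with resolution and codec):

```javascript
// Back-of-envelope mesh cost: with N peers, each uploads N-1 streams.
// bitrateKbps is an assumed per-stream video bitrate.
function meshUploadKbps(participants, bitrateKbps = 1500) {
  return (participants - 1) * bitrateKbps;
}

// Total pairwise connections in the mesh: N * (N-1) / 2
function meshConnections(participants) {
  return (participants * (participants - 1)) / 2;
}
```

At 6 participants that's 7.5 Mbps of sustained upload per client and 15 simultaneous connections across the call, which is beyond many residential upload links.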
Best for: 1:1 video calls, small 3–4 person calls on reliable connections.
MCU (Multipoint Control Unit)
All peers send their media to the MCU, which decodes every stream, mixes them into a single composite stream, and sends that composite to each participant. Each participant receives one stream regardless of how many others are in the call.
The upside: low client-side bandwidth and CPU. The downside: the MCU does all that decoding and re-encoding in real time, which is CPU-intensive on the server. MCU infrastructure costs at least 10× more than SFU at comparable participant counts. It also introduces additional latency from the transcoding step.
MCU is largely a legacy architecture, most commonly seen in interoperability scenarios with older SIP or H.323 systems.
Best for: Legacy teleconferencing interop, strict single-stream client requirements.
SFU (Selective Forwarding Unit)
All peers send their media to the SFU, which forwards individual streams to each subscriber — without decoding or re-encoding. The SFU selects which streams to send to which participant based on subscriber preferences and bandwidth (hence “selective forwarding”).
SFU is the dominant modern architecture for group WebRTC because:
- Server CPU usage is low — no transcoding step
- Supports simulcast — clients send multiple quality renditions at once; the SFU picks the right one per subscriber based on their connection
- Scales to 50+ participants in a grid layout on average connections
- Compatible with end-to-end encryption, since the server never decodes streams
An SFU can also record streams (capturing them before forwarding) and handle video transcoding for HLS output, making it the backbone of many live broadcasting architectures.
Best for: Group calls, webinars, virtual events, live streaming to audiences.
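The "selective" part of selective forwarding can be sketched as a per-subscriber decision: given the simulcast renditions a publisher sends and a subscriber's estimated bandwidth, pick the highest rendition that fits. The rendition names and bitrate thresholds below are illustrative assumptions, not any real SFU's API:

```javascript
// Sketch of an SFU's per-subscriber rendition choice under simulcast.
// Renditions mirror common simulcast layers (full/half/quarter resolution);
// the bitrates are assumed values, not a standard.
const RENDITIONS = [
  { rid: 'f', kbps: 1500 }, // full resolution
  { rid: 'h', kbps: 600 },  // half resolution
  { rid: 'q', kbps: 200 },  // quarter resolution
];

function selectRendition(subscriberKbps, renditions = RENDITIONS) {
  // Highest-bitrate rendition within the subscriber's budget,
  // falling back to the lowest so video never drops entirely.
  const fit = renditions.filter((r) => r.kbps <= subscriberKbps);
  if (fit.length === 0) return renditions[renditions.length - 1];
  return fit.reduce((a, b) => (a.kbps >= b.kbps ? a : b));
}
```

A real SFU re-evaluates this continuously from congestion-control feedback, but the forwarding decision never requires decoding the media.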
Architecture Comparison
| Architecture | Server CPU | Client Upload | Max Participants | Transcoding | Best Use Case |
|---|---|---|---|---|---|
| Mesh (P2P) | None | High (N−1 streams) | 2–4 | No | 1:1 calls, small groups |
| MCU | Very high | Low (1 stream received) | ~30 | Yes | Legacy SIP/H.323 interop |
| SFU | Low | Medium | 50+ | Optional | Group calls, webinars, broadcasting |
Popular Open-Source WebRTC Server Options
| Server | Type | Language | Best For |
|---|---|---|---|
| Coturn | STUN/TURN | C | NAT traversal relay for any WebRTC app |
| LiveKit | SFU | Go | Modern group calls, AI voice agents |
| Mediasoup | SFU library | Node.js / Rust | Custom media server builds with fine control |
| Janus | Media gateway | C | Flexible plugin-based deployments |
| Jitsi Videobridge (JVB) | SFU | Java | Full conferencing stack with built-in UI |
| Kurento | Media server | C/C++ | Rich media pipelines, recording, transcoding |
Coturn is the go-to for STUN/TURN relay. It’s the most mature open-source implementation, widely deployed, and well-documented. Nearly every self-hosted WebRTC stack uses it.
LiveKit has grown fastest in recent years — its GitHub repository became the most-starred WebRTC SFU project in 2024–2025. It ships SDKs for web, iOS, Android, and Unity, and has strong support for AI real-time communication use cases (voice agents, real-time transcription).
Mediasoup is a library, not a standalone server. You build your media server logic on top of it using Node.js. It’s extremely high-performance and popular for teams that need precise control over media routing, but requires significantly more code than a batteries-included SFU.
Janus is a general-purpose WebRTC gateway with a plugin architecture. Plugins handle videoroom, streaming, recording, SIP gateway, and other features. More configuration overhead than LiveKit, but highly adaptable.
Jitsi is the oldest full-stack option. Jitsi Meet is a complete video conferencing application; Jitsi Videobridge is the SFU underneath. Good when you need a turnkey deployment with an existing web UI.
Which WebRTC Server Do You Actually Need?
Use this framework based on your specific use case:
1:1 video calls (2 participants only)
→ Signaling server + STUN server (required)
→ TURN server (strongly recommended — ~20% of sessions will need it)
→ No media server needed
Small group calls (3–6 participants)
→ Signaling + STUN + TURN + SFU
→ Mesh is an option if participants are on fast, reliable connections and you want zero server media cost
Large group calls or webinars (7+ participants)
→ Signaling + STUN + TURN + SFU
→ Configure simulcast to manage per-subscriber bandwidth adaptively
Live broadcasting (one-to-many)
→ Signaling + STUN + TURN + SFU with HLS output for viewer delivery
→ Review WebRTC vs. HLS tradeoffs — for large audiences, HLS delivery scales more cheaply than WebRTC
→ Consider whether RTMP or SRT gives your broadcasters better encoder compatibility than WebRTC ingest
Recording required
→ An SFU that supports recording (LiveKit, Janus), or a dedicated recording peer that subscribes as a participant and writes tracks to disk or cloud storage
Global audience
→ Geographically distributed TURN servers and SFU nodes — a single-region deployment adds 200–400ms of avoidable latency for remote users
Quick Decision Table
| Scenario | Signaling | STUN | TURN | Media Server |
|---|---|---|---|---|
| 1:1 video call | ✅ | ✅ | Recommended | ❌ |
| Group call (3–6) | ✅ | ✅ | ✅ | ✅ SFU |
| Webinar / conference | ✅ | ✅ | ✅ | ✅ SFU |
| Live broadcast to viewers | ✅ | ✅ | ✅ | ✅ SFU + HLS |
| Recording | ✅ | ✅ | ✅ | ✅ SFU with recording |
WebRTC Server Security
WebRTC mandates encryption. You cannot send unencrypted media through a WebRTC session — the specification forbids it.
DTLS (Datagram Transport Layer Security) handles key exchange. Before any media flows, peers complete a DTLS handshake over the connection established by ICE. This authenticates both sides and derives the encryption keys used to protect media packets.
SRTP (Secure Real-time Transport Protocol) carries the encrypted media. Every audio and video packet is protected using the keys negotiated in the DTLS handshake.
The minimum required DTLS version is 1.2. The required cipher suite is TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 with the P-256 curve — as mandated by the WebRTC security specification.
TURN servers do not break DTLS encryption. When media flows through a TURN relay, the server forwards encrypted SRTP packets without decrypting them. The DTLS session exists directly between the two WebRTC peers — the TURN server is a transparent relay that never sees unencrypted content.
For your signaling server, always use HTTPS and WSS (WebSocket Secure). Sending SDP descriptions or ICE candidates over an unencrypted channel exposes session setup to interception and manipulation.
Security Checklist
- [ ] DTLS 1.2+ enforced (modern browsers handle this automatically; verify in SDP negotiation logs)
- [ ] Signaling server runs over WSS, not plain WS
- [ ] TURN server configured with long-term credentials — never run an open relay
- [ ] TURN credentials rotated regularly (or use short-lived tokens)
- [ ] ICE candidate filtering enabled if user IP privacy is a requirement
- [ ] Firewall rules restrict TURN server access to the required port ranges only
Self-Hosted vs. Managed WebRTC Infrastructure
Building your own WebRTC server stack is doable — but it’s more work than most teams expect.
A minimal production-ready setup requires:
- A signaling server (Node.js + WebSocket, or a hosted WebSocket service)
- A STUN server (Google’s free public server works for most cases)
- A TURN server (coturn on a Linux VPS)
- An SFU (LiveKit, Mediasoup, Janus — depending on your requirements)
- Load balancing, TLS certificates, monitoring, and alerting for each component
At small scale, hardware costs are low: $20–120/month for TURN and signaling on a VPS. But the engineering time to build, configure, and maintain the full stack — plus handle edge cases like ICE failures, codec negotiation bugs, simulcast tuning, and traffic spikes — adds up considerably.
For teams building live streaming platforms rather than real-time conferencing tools, WebRTC may not be the right ingest protocol. Compare WebRTC vs. RTMP for broadcaster workflows: RTMP and SRT work with off-the-shelf hardware and software encoders (OBS, vMix, hardware encoders), while WebRTC ingest requires a browser or a custom SDK on the broadcaster side.
Many production live streaming pipelines use RTMP or SRT for ingest, convert to HLS for delivery, and skip the WebRTC media server entirely. This is a well-established pattern that scales to millions of concurrent viewers without the complexity of running an SFU at scale.
LiveAPI’s live streaming API handles this full pipeline: RTMP and SRT ingest, adaptive bitrate encoding, HLS output, and global delivery via CDN partners including Akamai, Cloudflare, and Fastly. Teams that need to launch a live video streaming platform can build on LiveAPI instead of provisioning and maintaining individual STUN, TURN, signaling, and SFU components — up to 4K streaming quality, with a pay-as-you-grow pricing model.
If your use case is real-time two-way communication (sub-500ms latency, bidirectional), a WebRTC stack is the right tool. If it’s one-to-many broadcasting where 2–10 seconds of latency is acceptable, an RTMP/SRT + HLS pipeline is simpler, cheaper, and more compatible with the hardware encoder ecosystem.
WebRTC Server FAQ
What is a WebRTC server?
A WebRTC server is any server-side component that supports WebRTC communication — signaling servers for session setup, STUN servers for NAT discovery, TURN servers for relay fallback, and media servers for multi-party routing or recording. Most real deployments use three or four types working together.
Does WebRTC work without a server?
Only on the same local network. Every real-world deployment needs at least a signaling server and a STUN server. About 20% of sessions also require a TURN relay server when direct peer-to-peer connections fail due to symmetric NAT.
What is the difference between a STUN server and a TURN server?
STUN tells each peer its public IP address so they can attempt a direct connection — it’s lightweight and stateless. TURN relays media through the server when a direct connection is impossible — it stays in the media path and consumes bandwidth proportional to the streams it carries.
What is ICE in WebRTC?
ICE (Interactive Connectivity Establishment) is the framework that coordinates STUN, TURN, and direct connections to find the best available network path between two peers. It gathers candidates (local, STUN-discovered, and TURN-allocated), runs connectivity checks on all pairs, and selects the best working route. Trickle ICE sends candidates as they’re found rather than waiting for all gathering to complete, cutting setup time.
What is a WebRTC SFU?
An SFU (Selective Forwarding Unit) is a media server that receives individual streams from each participant and forwards them to other participants — without decoding or re-encoding. It’s the standard architecture for group video calls because it uses far less server CPU than an MCU and supports simulcast for adaptive quality delivery.
What is the difference between an SFU and an MCU in WebRTC?
An SFU forwards individual streams without processing them. An MCU decodes all streams, mixes them into one composite video, and sends that composite to each participant. SFU uses much less server CPU, scales to more participants, and costs less to run. MCU is simpler for clients (they receive one stream) but is expensive and adds latency.
What is the best open-source WebRTC server?
It depends on your use case. For STUN/TURN: coturn. For a modern SFU with good SDKs: LiveKit. For a custom media server built in Node.js: Mediasoup. For a full conferencing stack with a built-in web UI: Jitsi. For a flexible plugin-based media gateway: Janus.
Do I need a TURN server?
If any of your users are behind symmetric NAT — common in enterprise networks and some mobile carriers — then yes: without TURN, those sessions will fail to connect entirely. STUN handles roughly 80% of connections. TURN covers the rest.
How many participants can WebRTC support?
With mesh/P2P architecture, 2–4 participants before bandwidth and CPU become limiting factors. With an SFU, 50+ participants in a group call is common. For one-to-many broadcasting to large audiences, HLS/DASH delivery is more appropriate than WebRTC.
Is WebRTC encrypted?
Yes, always. WebRTC requires DTLS for key exchange and SRTP for media encryption. Unencrypted media transmission is explicitly forbidden by the specification. The minimum required DTLS version is 1.2.
Closing
A WebRTC server isn’t a single thing — it’s a stack of specialized components, each handling a different layer of the connection. Signaling servers manage session setup. STUN servers handle NAT discovery. TURN servers provide relay fallback. Media servers route streams for multi-party sessions and broadcasting.
For most real-world deployments, you’ll need at least a signaling server, a STUN server, and a TURN server. Add an SFU when you need more than two participants, recording, or HLS output. Add managed CDN delivery when your audience scales past what a WebRTC server can handle directly.
If you’re building a live streaming application and want to skip the infrastructure complexity, get started with LiveAPI — RTMP and SRT ingest, adaptive bitrate encoding, HLS delivery, and global CDN in one API.

