A sports fan watching a live match from home finds out the score from a push notification — before they see the goal on their stream. An auction bidder places what they think is a winning bid, only to discover the item sold three seconds ago. A poker player sees their opponent’s reaction before the hand they’re reacting to even appears on screen.
These aren’t edge cases. They’re what happens when you ship a live streaming application without understanding latency.
Ultra low latency video streaming is the technology that closes this gap — reducing the delay between a live event and a viewer’s screen from the typical 15-30 seconds down to under a second. For an entire category of applications, this isn’t a nice-to-have. It’s the product.
This guide covers everything you need to know: what ultra low latency streaming is, how the major protocols compare, which techniques actually move the needle, and how to implement low-latency infrastructure using a managed API.
What Is Ultra Low Latency Video Streaming?
Ultra low latency video streaming is a live video delivery approach that minimizes end-to-end delay to under one second — typically 200–500ms — between the moment video is captured and when it plays on a viewer’s device.
To put this in context, here’s how different latency ranges are typically categorized:
| Latency Range | Category | Typical Protocols |
|---|---|---|
| < 1 second (typically 200–500ms) | Ultra low latency | WebRTC |
| 1–4 seconds | Low latency | SRT, LL-HLS, CMAF |
| 4–10 seconds | Reduced latency | Short-segment HLS/DASH |
| 15–30 seconds | Standard | Traditional HLS/DASH |
| 30–60+ seconds | Broadcast | Satellite, linear TV |
Standard HLS, the dominant delivery protocol for the past decade, was designed for reliability and scale, not speed. If you’re new to the protocol, the “what is HLS streaming” primer is a good starting point, and the IETF’s HLS specification (RFC 8216) defines the core format. HLS breaks video into segments that are typically 6–10 seconds long, and a player buffers several of them before playback begins. Add CDN propagation, origin-to-edge delays, and encoder processing time, and you’re looking at 15–30 seconds of glass-to-glass latency.
That’s fine for Netflix. It’s a disaster for a live betting platform.
Ultra low latency streaming solves this by using different transport mechanisms, smaller encoding chunks, and smarter delivery architectures that prioritize real-time delivery over caching efficiency.
Why Latency Matters: Use Cases That Break Without It
Latency tolerance varies by application. A cooking tutorial can run at 30 seconds of delay with no user impact. But for the following use cases, latency directly determines product viability:
Live sports betting — Odds change in real time as match events unfold. If your stream runs 15 seconds behind the broadcast, your users are betting on outcomes that have already happened. Regulated platforms often mandate sub-3-second latency.
Interactive auctions — Bidders need to see competing bids and react in real time. Platforms with ultra low latency report significantly higher bid participation and fewer support tickets from timing confusion.
Online gaming and esports — Viewers watching competitive gaming expect the stream to align with in-game chat, overlays, and other participants. Sub-second alignment is the baseline.
Video conferencing and hybrid events — Any back-and-forth interaction — Q&As, call-ins, live polls — requires sub-500ms latency to feel natural.
Trading and financial data — Price feeds and market events require latency measured in milliseconds, not seconds.
Surveillance and security — Remote monitoring applications need near-real-time awareness. A 15-second delay in a security feed can mean a response team reacts to stale information.
If your application falls into any of these categories, your latency architecture is a product decision, not just a technical one. For a broader look at platform architecture, see the live video streaming platform guide.
The Latency Spectrum: From Traditional HLS to WebRTC
Understanding how latency accumulates end-to-end is the first step toward reducing it. Every stage in the pipeline adds delay:
- Encoder processing — The encoder captures raw video, applies compression, and outputs encoded frames. GOP (Group of Pictures) size directly affects this delay.
- Ingest transmission — Encoded video travels from the encoder to your origin/ingest server via RTMP, SRT, or another ingest protocol.
- Transcoding and packaging — The server transcodes into delivery formats and packages into segments or chunks.
- CDN distribution — Packaged video propagates from origin to edge nodes worldwide.
- Player buffering — The client player buffers several segments before beginning playback to protect against network jitter and rebuffering.
Traditional HLS optimizes the last three stages (packaging, CDN distribution, and player buffering) for cache efficiency and scale, which means larger segments and more aggressive buffering. Low-latency protocols attack each of these stages differently.
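To see how the stages stack up, here is a rough budget calculator. It is a sketch only: the per-stage figures are illustrative assumptions rather than measurements, and it assumes the packager waits for one full segment before publishing.

```javascript
// Rough glass-to-glass budget in seconds. Every input is an assumed,
// illustrative figure; measure your own pipeline for real numbers.
function estimateLatencySeconds({ encode, ingest, cdn, segmentDuration, bufferedSegments }) {
  const packaging = segmentDuration; // packager publishes only complete segments
  const playerBuffer = segmentDuration * bufferedSegments;
  return encode + ingest + packaging + cdn + playerBuffer;
}

const traditionalHls = estimateLatencySeconds({
  encode: 1, ingest: 1.5, cdn: 1, segmentDuration: 6, bufferedSegments: 3,
});
const shortSegments = estimateLatencySeconds({
  encode: 0.5, ingest: 0.5, cdn: 0.5, segmentDuration: 1, bufferedSegments: 2,
});

console.log({ traditionalHls, shortSegments }); // { traditionalHls: 27.5, shortSegments: 4.5 }
```

Segment duration appears twice in the sum (packaging and player buffer), which is why shrinking segments has such an outsized effect.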
The Four Main Low-Latency Protocols
WebRTC — Sub-500ms
WebRTC (Web Real-Time Communication) is the protocol powering video conferencing — and it’s now being deployed for one-to-many broadcast streaming. It was designed from the ground up for real-time communication, using UDP-based transport (SRTP), STUN/TURN for NAT traversal, and peer-to-peer or SFU (Selective Forwarding Unit) topologies.
WebRTC latency: 100–500ms
Because WebRTC bypasses the segment-based delivery model entirely, it achieves dramatically lower latency than any HTTP-based protocol. The tradeoff is scalability — P2P architectures don’t scale to large audiences without an SFU/MCU infrastructure layer, and adaptive bitrate streaming is more complex to implement.
WebRTC shines for:
- Interactive use cases (auctions, gaming, video conferencing)
- Audiences under 10,000 concurrent viewers (with SFU)
- Real-time two-way communication scenarios
For a detailed comparison, see WebRTC vs HLS.
SRT — 1–2 Seconds
The SRT protocol (Secure Reliable Transport) is an open-source transport protocol developed by Haivision. It runs over UDP, uses ARQ (Automatic Repeat reQuest) for reliability, and adds AES encryption. SRT was designed for contribution — getting video from an encoder to an origin server reliably over unpredictable networks like the public internet.
SRT latency: 120ms–2 seconds (configurable via latency parameter)
SRT’s latency parameter is set to accommodate the RTT between sender and receiver, plus a buffer for packet loss recovery. For a 50ms RTT connection, you might configure 200ms of SRT latency. For a satellite uplink with 600ms RTT, you’d set higher.
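The sizing rule in the example above can be sketched as a small helper. The 4x RTT multiplier is an assumption chosen to match the 50ms to 200ms example, and the 120ms floor reflects SRT's default latency setting; lossier links usually need a larger multiplier.

```javascript
// Suggest an SRT latency value from a measured round-trip time.
// The multiplier is a rule-of-thumb assumption, not a fixed part of SRT.
const SRT_DEFAULT_LATENCY_MS = 120;

function suggestSrtLatencyMs(rttMs, multiplier = 4) {
  return Math.max(SRT_DEFAULT_LATENCY_MS, Math.round(rttMs * multiplier));
}

console.log(suggestSrtLatencyMs(50));  // 200
console.log(suggestSrtLatencyMs(600)); // 2400
console.log(suggestSrtLatencyMs(10));  // 120 (clamped to the default)
```

Units differ between tools (FFmpeg's SRT options, for example, are specified in microseconds rather than milliseconds), so check your encoder's documentation before pasting the number in.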
SRT is primarily an ingest protocol — it gets video from the field to your origin, where it’s then repackaged for CDN delivery via HLS or LL-HLS. It doesn’t solve last-mile delivery latency on its own.
For a full protocol breakdown, see SRT vs RTMP — including how latency, packet loss recovery, and encoder compatibility compare.
LL-HLS — 2–5 Seconds
Low-Latency HLS (LL-HLS) is Apple’s extension to the HLS standard, introduced to close the gap between traditional HLS and WebRTC. Rather than waiting for a full 6-second segment to be complete before publishing it, LL-HLS publishes partial segments (called “parts”) as they are encoded — typically 200ms each.
LL-HLS latency: 2–5 seconds
The key LL-HLS additions to the protocol:
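- EXT-X-PART: Identifies partial segments in the media playlist
- EXT-X-PRELOAD-HINT: Tells players which partial segment to request next, so the request can be sent early and answered as soon as the part is ready (this replaced the HTTP/2 push requirement in Apple’s earlier draft)
- EXT-X-SERVER-CONTROL: Advertises server-side delivery features and constraints, such as blocking reload support and hold-back distances
- _HLS_msn and _HLS_part: Query parameters for Blocking Playlist Reload (BPR), which lets clients long-poll the server for the next playlist update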
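As an illustration of Blocking Playlist Reload, the sketch below builds the URL a client would request to wait for a specific media sequence number and part. The playlist URL and the numbers are placeholders; in a real player (HLS.js, Safari) this happens internally.

```javascript
// Build a Blocking Playlist Reload URL. The server holds the response
// until the requested media sequence number (and part) is available.
function buildBlockingReloadUrl(playlistUrl, nextMsn, nextPart) {
  const url = new URL(playlistUrl);
  url.searchParams.set('_HLS_msn', String(nextMsn));
  if (nextPart !== undefined) {
    url.searchParams.set('_HLS_part', String(nextPart));
  }
  return url.toString();
}

// Placeholder URL and sequence numbers, for illustration only.
console.log(buildBlockingReloadUrl('https://example.com/live/index.m3u8', 273, 2));
// https://example.com/live/index.m3u8?_HLS_msn=273&_HLS_part=2
```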
LL-HLS inherits the HTTP/CDN delivery model of traditional HLS, which means it scales to millions of viewers using the same CDN infrastructure. The tradeoff compared to WebRTC is higher latency — but for broadcast-scale events where sub-500ms isn’t required, LL-HLS is often the right choice.
CMAF with Chunked Transfer Encoding — 2–4 Seconds
CMAF (Common Media Application Format) is not a delivery protocol — it’s a container format that works with both HLS and DASH. Its relevance to low latency comes from Chunked Transfer Encoding (CTE), an HTTP feature that allows servers to stream response data progressively.
CMAF + CTE latency: 2–4 seconds
With standard HLS, a CDN caches and serves complete segments. With CMAF + CTE, the CDN can begin forwarding chunks of a segment to the player while the rest of the segment is still being encoded. This approach:
- Works with existing HTTP/CDN infrastructure
- Supports both HLS and DASH players
- Reduces segment-level latency without requiring server-sent events or WebSocket connections
- Compatible with both LL-HLS and LL-DASH
The downside is that not all CDNs support chunked transfer caching correctly, and some edge cases arise with ABR switching when chunks from multiple renditions are in flight simultaneously.
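For readers who have not looked at chunked transfer encoding on the wire, this sketch shows the HTTP/1.1 framing: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk ends the response. The payload strings are stand-ins for real CMAF chunks, and in practice your HTTP server does this framing for you (Node's http module, for instance, applies it when you call res.write() without a Content-Length).

```javascript
// Frame one piece of data as an HTTP/1.1 chunk: <hex size>\r\n<data>\r\n
function encodeChunk(data) {
  const payload = Buffer.from(data);
  const header = Buffer.from(payload.length.toString(16) + '\r\n', 'ascii');
  return Buffer.concat([header, payload, Buffer.from('\r\n', 'ascii')]);
}

// A zero-length chunk terminates the response body.
const LAST_CHUNK = Buffer.from('0\r\n\r\n', 'ascii');

// Stand-in payloads; a real origin would send CMAF chunks as the encoder emits them.
const body = Buffer.concat([
  encodeChunk('moof+mdat #1'),
  encodeChunk('moof+mdat #2'),
  LAST_CHUNK,
]);

console.log(JSON.stringify(body.toString('latin1')));
```

Because each chunk is self-delimiting, the edge can forward the first chunk to players before the origin has produced the second.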
Techniques for Reducing End-to-End Latency
Choosing the right protocol is necessary but not sufficient. Here are the key engineering techniques that move the needle on end-to-end latency:
1. Reduce GOP Size
A GOP (Group of Pictures) is the run of frames from one I-frame (keyframe) to the next, so GOP size is the distance between keyframes in the encoded stream. A decoder can only start at a keyframe, because the P-frames and B-frames that follow depend on it for reference. Segments are normally cut on keyframe boundaries too, so the GOP also sets the shortest usable segment.
With a default GOP of 2 seconds, a viewer who joins mid-GOP either waits up to 2 seconds for the next keyframe or starts from the previous one, up to 2 seconds behind live. Reducing your GOP to 1 second or 500ms cuts this minimum startup latency significantly.
The tradeoff: smaller GOPs increase bitrate requirements (more keyframes = more data) and reduce compression efficiency by roughly 10–20%. Choosing the best bitrate for streaming video becomes especially important when you tighten GOP sizes.
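A quick way to reason about the tradeoff is to turn a GOP duration into concrete numbers. This sketch assumes a fixed frame rate and treats the worst case for a joining viewer as one full GOP of waiting for the next keyframe.

```javascript
// Translate a GOP duration into frames, keyframe frequency, and the
// worst-case wait for the next keyframe when a viewer joins mid-GOP.
function gopStats(fps, gopSeconds) {
  return {
    framesPerGop: Math.round(fps * gopSeconds),
    keyframesPerMinute: 60 / gopSeconds,
    worstCaseJoinWaitMs: gopSeconds * 1000,
  };
}

console.log(gopStats(30, 2));   // { framesPerGop: 60, keyframesPerMinute: 30, worstCaseJoinWaitMs: 2000 }
console.log(gopStats(30, 0.5)); // { framesPerGop: 15, keyframesPerMinute: 120, worstCaseJoinWaitMs: 500 }
```

Going from a 2-second to a 500ms GOP quadruples the number of keyframes per minute, which is where the extra bitrate comes from.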
2. Use Smaller Segment Sizes
Traditional HLS uses 6–10 second segments. Reducing segment duration to 1–2 seconds cuts the time between a video event and when the segment containing it becomes available for download. LL-HLS extends this further by publishing partial segments of 200ms or less.
Smaller segments also reduce time-to-first-frame (TTFF), since players can begin playback after buffering fewer segments.
3. Implement Chunked Transfer Encoding at the CDN
As covered above, chunked transfer encoding allows CDN edge nodes to serve video data as it arrives from the origin — rather than waiting for a complete segment to cache. For this to work, your CDN must support:
- Progressive caching (forward-buffering/streamed responses)
- Transfer-Encoding: chunked response headers
- LL-HLS or LL-DASH compliant origin
CDN partners like Akamai, Cloudflare, and Fastly all have LL-HLS compatible delivery. LiveAPI’s infrastructure uses these CDN partnerships to deliver low-latency CDN performance without per-customer CDN configuration.
4. Tune Player Buffer Settings
Player-side buffering is a significant source of latency. Most HLS.js and Shaka Player defaults are tuned for stability (3–8 seconds of buffer), not for latency. Low-latency HLS.js deployments typically configure:
- liveSyncDurationCount: 1–2 segments (instead of 3+)
- liveMaxLatencyDurationCount: 3–5 segments (the value is a segment count, not seconds)
- lowLatencyMode: true for LL-HLS support
Aggressive buffer settings increase the risk of rebuffering if network conditions degrade. The right balance depends on your audience’s network profile and your tolerance for rebuffer events vs. latency.
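HLS.js expresses its count-based live settings as multiples of the playlist's EXT-X-TARGETDURATION, so it helps to convert them to seconds before comparing configurations. The helper below is a sketch of that arithmetic, not HLS.js code; the function name is made up for illustration.

```javascript
// Convert HLS.js count-based live settings into seconds.
// Both counts are multiples of the playlist's EXT-X-TARGETDURATION.
function hlsLatencyTargets({ targetDuration, liveSyncDurationCount, liveMaxLatencyDurationCount }) {
  if (liveMaxLatencyDurationCount <= liveSyncDurationCount) {
    throw new RangeError('liveMaxLatencyDurationCount must be greater than liveSyncDurationCount');
  }
  return {
    targetLatencySeconds: liveSyncDurationCount * targetDuration,
    maxLatencySeconds: liveMaxLatencyDurationCount * targetDuration,
  };
}

// Default-style setup: 6-second segments, 3 segments behind live.
console.log(hlsLatencyTargets({ targetDuration: 6, liveSyncDurationCount: 3, liveMaxLatencyDurationCount: 10 }));
// Low-latency setup: 2-second segments, 1 segment behind live.
console.log(hlsLatencyTargets({ targetDuration: 2, liveSyncDurationCount: 1, liveMaxLatencyDurationCount: 5 }));
```

The same counts give very different results depending on segment duration: 18 seconds behind live in the first case, 2 seconds in the second.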
5. Use SRT or RTMP with Low-Latency Ingest
The first mile — from encoder to ingest server — also contributes to glass-to-glass latency. RTMP adds ~1–2 seconds of ingest delay. SRT is configurable but typically runs at 200ms–1 second depending on network RTT.
For applications requiring sub-2-second total latency, switching from RTMP to SRT for ingest can save 0.5–1 second off your total pipeline. Your RTMP server setup matters here — geographically distributed ingest points reduce the RTT between encoder and origin.
6. Optimize Transcoding Pipeline
Encoding takes time. For live streams, the transcoder must process each frame faster than real time. Delays in the transcoder — from CPU saturation, complex encoding profiles, or non-optimized codec settings — add directly to latency.
Key optimizations:
- Use hardware encoding (GPU-based H.264/HEVC) where possible
- Avoid B-frames (they require future-frame lookahead, adding latency)
- Set encoder preset to ultrafast or superfast for x264 (tradeoff: higher bitrate for same quality)
- Reduce the number of ABR renditions if hardware is constrained
The RTMP to HLS conversion pipeline is where most transcoding latency accumulates, and it’s worth benchmarking your transcoder’s processing time separately from delivery latency.
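If you drive FFmpeg with libx264 yourself, the encoder-side optimizations above map onto a handful of flags. The helper below only assembles an argument list; the function name is hypothetical, and -tune zerolatency is an addition beyond the list above (it turns off x264's lookahead-based features). Treat it as a starting point to benchmark, not a recommended profile.

```javascript
// Assemble FFmpeg/libx264 video arguments for a low-latency encode.
// Input, output, and audio arguments are intentionally left out.
function lowLatencyX264Args({ fps, gopSeconds, preset = 'superfast' }) {
  const gopFrames = String(Math.round(fps * gopSeconds));
  return [
    '-c:v', 'libx264',
    '-preset', preset,       // faster presets trade bitrate efficiency for speed
    '-tune', 'zerolatency',  // assumption: not part of the article's list
    '-bf', '0',              // no B-frames
    '-g', gopFrames,         // keyframe interval in frames
    '-keyint_min', gopFrames,
    '-sc_threshold', '0',    // no extra keyframes on scene cuts, keeps GOPs fixed
  ];
}

console.log(lowLatencyX264Args({ fps: 30, gopSeconds: 1 }).join(' '));
// -c:v libx264 -preset superfast -tune zerolatency -bf 0 -g 30 -keyint_min 30 -sc_threshold 0
```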
From Theory to Practice: Implementing Low-Latency Streaming
Reducing latency from 30 seconds to under 3 seconds requires changes across your entire stack — encoder, ingest, transcoding, CDN, and player. Building this from scratch involves months of tuning, CDN negotiations, and infrastructure ops work.
This is where a managed live streaming API changes the calculus. LiveAPI handles the transcoding pipeline, CDN configuration, and HLS packaging layer — you define your stream parameters and the infrastructure handles latency optimization.
Here’s how to create a low-latency live stream with the LiveAPI SDK:
const sdk = require('api')('@liveapi/v1.0#5pfjhgkzh9rzt4');

// Create a new live stream
async function createLowLatencyStream() {
  const response = await sdk.post('/livestreams', {
    name: 'My Low-Latency Stream',
    ingest_protocol: 'srt', // SRT for lower ingest latency than RTMP
    latency_mode: 'low', // Enable low-latency delivery mode
    hls_manifest_name: 'index',
    recording: true // Auto-save to VOD after stream ends
  });
  const stream = response.data;
  console.log('Stream created:', stream.id);
  console.log('SRT ingest URL:', stream.ingest_endpoint);
  console.log('Stream key:', stream.stream_key);
  console.log('HLS playback URL:', stream.playback_url);
  return stream;
}

// Surface API or network errors instead of leaving an unhandled rejection
createLowLatencyStream().catch((err) => console.error('Failed to create stream:', err));
Once the stream is active, retrieve its status and playback URL:
async function getStreamStatus(streamId) {
  const response = await sdk.get(`/livestreams/${streamId}`);
  const stream = response.data;
  console.log('Stream status:', stream.status); // active, idle, ended
  console.log('Viewer count:', stream.viewer_count);
  console.log('Playback URL:', stream.playback_url); // LL-HLS compatible URL
  return stream;
}
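In practice you usually want to wait until the stream reports active before handing the playback URL to viewers. Here is a minimal polling sketch. It takes the status lookup as a function so it makes no assumptions about the SDK; the interval and attempt limits are arbitrary defaults, and the demo uses a fake status source.

```javascript
// Poll a status function until it returns the wanted value.
// Resolves with the number of polls it took; rejects after maxAttempts.
async function waitForStatus(fetchStatus, wanted, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === wanted) return attempt;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Stream did not reach "${wanted}" after ${maxAttempts} attempts`);
}

// Demo with a fake status source standing in for an API call.
const fakeStatuses = ['idle', 'idle', 'active'];
waitForStatus(async () => fakeStatuses.shift(), 'active', { intervalMs: 10 })
  .then((polls) => console.log(`active after ${polls} polls`)); // active after 3 polls
```

With the functions above, the status source would be something like async () => (await getStreamStatus(streamId)).status.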
For player-side low-latency configuration with HLS.js:
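import Hls from 'hls.js';

const video = document.getElementById('video');
const playbackUrl = 'https://cdn.liveapi.com/streams/YOUR_STREAM_ID/index.m3u8';

if (Hls.isSupported()) {
  const hls = new Hls({
    lowLatencyMode: true, // Enable LL-HLS support
    liveSyncDurationCount: 1, // Buffer 1 segment (vs default 3)
    liveMaxLatencyDurationCount: 5, // Jump back toward live if more than 5 segments behind
    backBufferLength: 30, // Keep 30 seconds of already-played media
  });
  hls.loadSource(playbackUrl);
  hls.attachMedia(video);
  hls.on(Hls.Events.MANIFEST_PARSED, () => {
    // play() returns a promise that rejects if the browser blocks autoplay
    video.play().catch((err) => console.warn('Autoplay blocked:', err));
  });
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  // Safari plays HLS (including LL-HLS) natively, without HLS.js
  video.src = playbackUrl;
  video.play().catch((err) => console.warn('Autoplay blocked:', err));
}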
LiveAPI’s infrastructure handles the rest: SRT ingest across globally distributed ingest points, low-latency HLS packaging, and CDN delivery via Akamai, Cloudflare, and Fastly. You get the latency benefits without managing the infrastructure layer.
If you’re evaluating the full SDK, the live streaming SDK guide covers setup, authentication, and the full stream lifecycle.
Choosing the Right Protocol for Your Use Case
Not every application needs WebRTC-level latency, and chasing sub-second delivery when 3–5 seconds would suffice adds unnecessary infrastructure complexity. Use this decision framework:
| Use Case | Audience Size | Required Latency | Recommended Protocol |
|---|---|---|---|
| Video conferencing | Small groups | < 200ms | WebRTC |
| Live auctions | 100–50K | < 500ms | WebRTC via SFU |
| Esports / gaming | 1K–500K | < 1s | WebRTC (SFU) or LL-HLS |
| Live sports betting | 10K–1M+ | < 3s | LL-HLS or CMAF |
| Concerts / events | 1K–1M+ | < 5s | LL-HLS |
| Town halls / Q&As | 100–100K | < 3s | LL-HLS |
| General news / content | Unlimited | < 30s | Standard HLS/DASH |
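The latency column of the table can be boiled down to a small lookup. This is a deliberate simplification: it ignores audience size and cost, and the thresholds are taken from the table rather than from any standard.

```javascript
// Map a latency requirement (in milliseconds) to the protocol family the
// table recommends. Audience size and cost are not modeled here.
function recommendProtocol(maxLatencyMs) {
  if (maxLatencyMs <= 1000) return 'WebRTC (via SFU)';
  if (maxLatencyMs <= 5000) return 'LL-HLS or CMAF';
  return 'Standard HLS/DASH';
}

console.log(recommendProtocol(500));   // WebRTC (via SFU)
console.log(recommendProtocol(3000));  // LL-HLS or CMAF
console.log(recommendProtocol(30000)); // Standard HLS/DASH
```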
A few practical considerations:
WebRTC scales differently. Peer-to-peer WebRTC doesn’t scale beyond a handful of viewers. You need an SFU (like Janus, Mediasoup, or LiveKit) to fan out to large audiences. At scale, this is more expensive per-viewer than CDN delivery.
LL-HLS requires compatible players. Safari on Apple devices supports LL-HLS natively. On Android and desktop, you need HLS.js 1.0+ with lowLatencyMode: true. Older players fall back to standard HLS.
SRT is primarily an ingest protocol. Don’t confuse SRT’s low-latency contribution with last-mile delivery latency. SRT gets video to your origin; HLS or WebRTC delivers it to viewers.
Latency and quality trade off. Smaller GOPs and aggressive buffers increase bitrate requirements and rebuffer risk. Test at your target network conditions before shipping.
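Testing at target network conditions needs a number to compare. One common approach, sketched below, is to derive the playhead's wall-clock time from the playlist's EXT-X-PROGRAM-DATE-TIME and subtract it from the current time. This assumes the stream carries that tag and that encoder and viewer clocks are reasonably synchronized; the field names are placeholders for whatever your player exposes.

```javascript
// Estimate live latency from program date-time.
// fragmentProgramDateTimeMs: wall-clock time of the fragment's first frame
// fragmentStartSec: that fragment's start position on the media timeline
// currentTimeSec: the video element's current playback position
function liveLatencyMs({ nowMs, fragmentProgramDateTimeMs, fragmentStartSec, currentTimeSec }) {
  const playheadWallClockMs =
    fragmentProgramDateTimeMs + (currentTimeSec - fragmentStartSec) * 1000;
  return nowMs - playheadWallClockMs;
}

// Illustrative values: the playhead is 1.5s into a fragment stamped 12:00:00,
// and the viewer's clock reads 12:00:04.
console.log(liveLatencyMs({
  nowMs: Date.parse('2024-01-01T12:00:04.000Z'),
  fragmentProgramDateTimeMs: Date.parse('2024-01-01T12:00:00.000Z'),
  fragmentStartSec: 120,
  currentTimeSec: 121.5,
})); // 2500
```

Logging this value during a test session shows how far latency drifts when the network degrades, not just where it starts.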
Ultra Low Latency Streaming FAQ
What is considered ultra low latency streaming?
Ultra low latency refers to end-to-end stream delay under one second, typically in the 200–500ms range. This is achievable with WebRTC-based delivery. “Low latency” is a broader category that includes delays from 1–5 seconds, achievable with SRT, LL-HLS, or CMAF.
Is WebRTC better than HLS for low-latency streaming?
It depends on your use case. WebRTC delivers sub-500ms latency but requires specialized infrastructure (SFU) to scale beyond small groups. LL-HLS delivers 2–5 second latency at CDN scale with simpler player integration. For interactive applications, WebRTC wins. For broadcast-scale events where 3–5 seconds is acceptable, LL-HLS is often more practical.
What causes latency in live streaming?
Latency accumulates at every stage: encoder processing (GOP size), ingest transmission (RTMP vs SRT), transcoding pipeline delays, CDN origin-to-edge propagation, and player-side buffer settings. Each stage typically adds 0.5–3 seconds, and they stack up to the total glass-to-glass delay.
How do I reduce buffering without increasing latency?
Use adaptive bitrate streaming to let the player dynamically select quality based on available bandwidth, reducing stall events without requiring a larger buffer. Also use a CDN with PoPs close to your viewer base — CDN performance directly affects both buffering and latency.
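At its simplest, adaptive bitrate selection means picking the highest rendition that fits inside a safety margin of the measured bandwidth. The sketch below shows that rule with a hypothetical three-step ladder; production players such as HLS.js use more elaborate estimators that also factor in buffer level.

```javascript
// Pick the highest-bitrate rendition that fits within a fraction of the
// measured bandwidth; fall back to the lowest rendition if none fits.
function pickRendition(ladder, measuredKbps, safetyFactor = 0.8) {
  const budgetKbps = measuredKbps * safetyFactor;
  const sorted = [...ladder].sort((a, b) => a.kbps - b.kbps);
  let choice = sorted[0];
  for (const rendition of sorted) {
    if (rendition.kbps <= budgetKbps) choice = rendition;
  }
  return choice;
}

// Hypothetical ladder, for illustration only.
const ladder = [
  { name: '1080p', kbps: 5000 },
  { name: '360p', kbps: 800 },
  { name: '720p', kbps: 2500 },
];

console.log(pickRendition(ladder, 4000).name); // 720p
console.log(pickRendition(ladder, 500).name);  // 360p
```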
What is CMAF and how does it reduce latency?
CMAF (Common Media Application Format) is a container format compatible with both HLS and DASH. When combined with Chunked Transfer Encoding, the CDN can stream segment data to players as it arrives, rather than caching a complete segment first. This brings latency down to roughly 2–4 seconds, compared with 15–30 seconds for standard segment-based delivery.
Can I use SRT to replace RTMP for live streaming?
Yes, and it often improves both reliability and latency. SRT was designed for unreliable networks — it uses forward error correction and ARQ retransmission to handle packet loss without the connection drops that affect RTMP over poor connections. Most broadcast encoders (OBS, vMix, FFmpeg) and dedicated SRT encoders support SRT output. If you’re already using an RTMP live stream setup, migrating the ingest leg to SRT is usually a low-risk, high-impact change.
What’s the minimum latency achievable with HLS?
With LL-HLS and partial segments of 200ms, you can theoretically achieve 2–3 seconds of end-to-end latency in optimal conditions. In practice, 3–5 seconds is the realistic floor for LL-HLS with CDN delivery. Getting below 2 seconds generally means leaving HTTP segment delivery for WebRTC.
Does lower latency mean lower video quality?
Not necessarily, but there is a tradeoff. Smaller GOPs reduce compression efficiency, increasing bitrate requirements for the same visual quality. Aggressive player buffers increase rebuffer risk under poor network conditions. Managed infrastructure can offset these tradeoffs through hardware encoding, optimized ABR profiles, and CDN architecture — but some efficiency loss is inherent.
Start Streaming with Sub-Second Latency
Ultra low latency video streaming is a protocol and infrastructure problem, not just a settings toggle. You need the right ingest protocol (SRT), the right delivery mechanism (LL-HLS or WebRTC), correct encoder settings (small GOPs, no B-frames), CDN partners with chunked transfer support, and player configuration tuned for latency rather than stability.
Building this stack from scratch takes months and requires ongoing tuning. LiveAPI handles the infrastructure layer — globally distributed SRT ingest, low-latency HLS packaging, multi-CDN delivery via Akamai, Cloudflare, and Fastly — so your team can focus on the application layer.
Get started with LiveAPI and ship your first low-latency live stream today.


