Your app goes live. The host announces the winner. But half your viewers already saw it on Twitter.
That gap between what’s happening and what viewers see is latency — and it’s the difference between a live experience and a delayed replay. Low latency streaming is the practice of reducing that gap to the point where it stops affecting the experience.
This guide covers what low latency streaming is, what causes it, how it works, which protocols deliver it, and when your application actually needs it.
What Is Low Latency Streaming?
Low latency streaming is the delivery of live video with a glass-to-glass delay under 10 seconds — typically between 1 and 7 seconds — from the moment content is captured to when it appears on a viewer’s screen.
The term “glass-to-glass” describes the full pipeline: from the camera lens (first glass) to the viewer’s display (second glass). That entire journey — capture, encode, transmit, buffer, decode, render — is what video latency measures.
Low latency streaming differs from standard streaming, where typical HLS delivery carries 15–30 seconds of delay. That delay is fine for recorded content or casual live broadcasts, but problematic for anything requiring real-time interaction.
Definition: Low latency streaming is a live video delivery method that reduces glass-to-glass delay to under 10 seconds through faster encoding, smaller segment sizes, and reduced player buffering.
Latency Tiers: Low, Ultra-Low, and Real-Time Compared
Not all “low latency” is the same. The streaming industry uses several tiers to describe different delay thresholds:
| Latency Tier | Delay Range | Protocols | Best For |
|---|---|---|---|
| High latency | 15–45 seconds | Standard HLS, DASH | Pre-recorded, VOD, casual live |
| Low latency | 3–10 seconds | LL-HLS, LL-DASH | Live sports, auctions, events |
| Ultra-low latency | 0.5–3 seconds | CMAF chunked, SRT | Sports betting, interactive events |
| Real-time / Sub-second | < 500ms | WebRTC, SRT | Video calls, gaming, auctions |
The table above is a practical guide, but vendors use these terms loosely. “Low latency” at one company might mean 5 seconds; at another, 2. When evaluating streaming infrastructure, ask for specific glass-to-glass numbers rather than category labels.
Ultra-low latency streaming introduces additional technical tradeoffs — lower latency generally means less room to buffer during network drops, so you trade stability for speed.
What Causes Streaming Latency?
Every step in the live video pipeline adds delay. There’s no single source to fix — latency is cumulative.
Capture and Encoding
The encoder reads raw video from a camera, compresses it into a transmittable format, and packages it into segments. Modern hardware encoders are fast, but software encoding on general-purpose CPUs can add 500ms or more depending on settings. Your live streaming encoder choice directly affects this delay.
Ingest and Transmission
The encoded stream travels from your encoder to an ingest server over a network. Network distance, packet loss, and congestion all add delay here. Protocols like RTMP and SRT are designed for reliable ingest, but they add their own framing and handshake overhead.
Segmenting and Packaging
For HTTP-based delivery (HLS, DASH), video is cut into segments before distribution. Standard HLS uses 2–6 second segments, and players typically buffer three segments before starting playback (Apple's spec recommends starting at least three target durations from the live edge). That easily adds 6–18 seconds of delay before the viewer sees anything.
CDN Distribution
The packaged segments are distributed through a CDN for live streaming. Edge servers cache and serve content globally. The closer the CDN edge node is to the viewer, the less transmission delay. CDN caching also means the edge server might not have the very latest segment — adding another 1–2 seconds.
Player Buffering
The video player on the viewer’s device buffers several seconds of video to protect against network drops. Larger buffers mean smoother playback but higher latency. Smaller buffers reduce latency but make the stream more vulnerable to buffering interruptions.
How Low Latency Streaming Works
Low latency streaming reduces delay at multiple steps in the pipeline, not just one.
Smaller Segments
Standard HLS uses 2–6 second segments. Low-Latency HLS (LL-HLS) uses partial segments — as small as 200 milliseconds — and delivers them before the full segment is complete. This alone cuts several seconds from the pipeline.
Chunked Transfer Encoding
Rather than waiting for a complete segment file, low-latency protocols deliver media in chunks over persistent HTTP connections. LL-DASH uses HTTP chunked transfer encoding so the player starts downloading and processing the current segment before it's fully written. (Apple's original LL-HLS draft used HTTP/2 server push for the same purpose, but the final spec dropped it in favor of blocking playlist requests and preload hints.)
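The effect of chunked delivery can be sketched with the fetch API, which exposes response bodies as streams. This is an illustrative consumer, not any player's actual code — a real player would append the bytes to a SourceBuffer rather than count them:

```javascript
// Illustrative sketch: process a media segment's bytes as they arrive over
// a streamed HTTP response instead of waiting for the complete file.
async function consumeStreaming(response, onChunk) {
  const reader = response.body.getReader();
  let totalBytes = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;            // segment finished
    totalBytes += value.byteLength;
    onChunk(value);             // decode/append before the segment completes
  }
  return totalBytes;
}
```

Because each chunk is handed to the callback as soon as it arrives, decode can start while the encoder is still writing the tail of the segment.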
Preload Hints
LL-HLS sends a “preload hint” in the playlist so the player knows to start downloading the next partial segment before the server even signals it’s available. This eliminates one full round-trip per segment request.
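Concretely, partial segments and preload hints show up as tags in the media playlist. A hypothetical LL-HLS playlist excerpt (tag syntax follows Apple's spec; the URIs and durations are made up for illustration):

```
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-PART-INF:PART-TARGET=0.333
#EXT-X-MEDIA-SEQUENCE:266
#EXTINF:4.00000,
fileSequence266.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart267.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart267.1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart267.2.mp4"
```

The EXT-X-PART entries are the ~333ms partial segments, and the EXT-X-PRELOAD-HINT tells the player which part to request next before the server announces it.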
Reduced Player Buffer
Low-latency players run with smaller buffer targets — typically 1–3 segments instead of the standard 3–10. The player also adjusts playback speed slightly (0.85x–1.15x) to correct drift and maintain the low-latency target without stalling.
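The drift-correction math can be sketched in a few lines — an illustrative version, not any specific player's implementation. The rate scales with how far the player has drifted from its latency target, clamped to the 0.85x–1.15x window described above:

```javascript
// Compute a catch-up playback rate from current latency vs. target.
// Positive drift (too far behind live) speeds playback up; negative
// drift slows it down; both are clamped to an inaudible range.
function catchupRate(currentLatency, targetLatency) {
  const drift = currentLatency - targetLatency; // seconds behind target
  const rate = 1 + drift * 0.05;                // gentle proportional nudge
  return Math.min(1.15, Math.max(0.85, rate));
}
```

A player would apply the result to `video.playbackRate` each time it samples its distance from the live edge; the 0.05 gain here is an arbitrary illustrative choice.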
Protocol Selection
The specific protocol determines how much latency is achievable. WebRTC for live streaming bypasses HTTP entirely, using direct peer-to-peer or server-to-client connections for sub-second delivery. HTTP-based protocols like LL-HLS and LL-DASH scale to millions of viewers but top out around 3–7 seconds.
Low Latency Streaming Protocols
Different protocols reach different latency floors. Choosing the right one depends on your target latency, audience size, and infrastructure.
Low-Latency HLS (LL-HLS)
Apple introduced LL-HLS in 2019 as an extension to HLS. It uses partial segments, blocking playlist requests, and preload hints to bring HLS latency from 15–30 seconds down to 3–7 seconds. LL-HLS scales over standard CDNs and supports adaptive bitrate streaming, making it the most practical choice for most live broadcast applications.
Best for: Scaled live broadcasts where CDN delivery and broad device compatibility matter more than sub-second delivery.
Low-Latency DASH (LL-DASH)
The MPEG-DASH equivalent of LL-HLS. It uses HTTP chunked transfer encoding to deliver CMAF chunks as they're being encoded, achieving a similar 3–7 seconds of latency. It's less universally supported than LL-HLS across CDN and player ecosystems. See HLS vs DASH for a full protocol comparison.
Best for: Teams already invested in DASH workflows looking to reduce latency.
WebRTC
WebRTC is an open standard for real-time communication built into every major browser. It uses UDP-based transport, bypassing HTTP’s segment-based model entirely to achieve sub-second (200–500ms) latency.
WebRTC scales poorly beyond a few hundred concurrent viewers without a specialized media server (SFU or MCU). It also lacks native CDN support — you need dedicated WebRTC infrastructure to scale to millions of viewers. Compare WebRTC vs HLS and WebRTC vs RTMP for detailed protocol tradeoffs.
Best for: Video conferencing, interactive broadcasts, sub-second use cases with smaller audiences.
SRT (Secure Reliable Transport)
SRT is primarily an ingest protocol, not a delivery protocol. It runs over UDP with error correction to achieve low-latency, reliable transport between encoder and ingest server. Typical SRT ingest latency is 200–500ms. SRT doesn’t replace HLS or WebRTC at the delivery layer — it gets the stream from the encoder to the server faster. See SRT vs RTMP for when to use each for ingest.
Best for: Encoder-to-server ingest where network reliability is a concern.
RTMP
RTMP server connections remain the most widely supported ingest protocol despite being decades old. RTMP achieves 1–3 seconds of ingest latency but uses TCP, which introduces retransmission delays on unstable networks.
Best for: Maximum encoder compatibility; best when network conditions are stable.
Protocol Comparison
| Protocol | Latency Range | Scale | CDN Compatible | Best Use |
|---|---|---|---|---|
| LL-HLS | 3–7 seconds | Unlimited | Yes | Live sports, news, events |
| LL-DASH | 3–7 seconds | Unlimited | Partial | DASH-first workflows |
| WebRTC | 200–500ms | ~1,000 viewers* | No | Video calls, interactive |
| SRT | 200–500ms (ingest) | Ingest only | N/A | Encoder-to-server ingest |
| RTMP | 1–3 seconds (ingest) | Ingest only | N/A | Universal encoder support |
*Without specialized SFU infrastructure
Advantages of Low Latency Streaming
Viewers Can Participate in Real Time
When the stream matches what’s happening live, chat, polls, Q&As, and audience reactions are meaningful. A 30-second delay turns live interaction into a confusing mess — viewers respond to something that already happened half a minute ago. Sub-5-second latency keeps the social experience coherent.
Spoiler Risk Goes Down
With standard HLS latency (15–30 seconds), social media moves faster than the stream. Viewers watching a sports event on a platform with high latency will see spoilers in their feed before the action plays out on screen. Lower latency closes that gap significantly.
In-Play Sports Betting Becomes Viable
Live in-play betting requires that the viewer and the data provider see the same moment within a narrow window. At 30 seconds of latency, betting markets close before the streamed action plays out. At 3–5 seconds, in-play betting windows stay open long enough to be commercially useful.
Live Commerce and Auctions Work Properly
Live auctions and shoppable stream experiences — where bidders compete in real time — require that all participants see the same state within a few seconds. High latency creates unfair advantages for viewers on faster streams and makes product countdown timers meaningless.
Interactive Events Feel Natural
Webinars, virtual conferences, live coaching, and remote production workflows all involve some back-and-forth between host and audience. Even 10–15 seconds of delay makes real-time Q&A feel disjointed. At 3–5 seconds, it’s noticeable but manageable. Sub-second makes it feel natural.
Operational Monitoring Gets Tighter
When you’re broadcasting a live event, your operations team watches the same stream the audience sees. High latency means a problem — audio dropout, encoding glitch — that appears on screen happened 30 seconds ago in reality, limiting your ability to respond. Low latency lets your team react to what viewers actually see right now.
Challenges of Low Latency Streaming
CDN Caching Conflicts
Standard CDNs are designed to cache content aggressively — the opposite of what low latency requires. Low-latency delivery needs edge servers to serve the newest segments immediately without caching them for long. This increases origin-to-edge load and may require CDN configuration changes or specialized low-latency CDN products.
Rebuffering Risk Increases
The buffer is your safety net. With a 10-second buffer, your player can absorb a 10-second network hiccup without interruption. With a 1-second buffer, any network drop beyond 1 second causes a visible stall. Low latency and rebuffering resistance are in direct tension — you have to decide how aggressively to trade one for the other.
Infrastructure Cost Goes Up
Delivering very small segments frequently puts more load on origin servers and CDN edges. Partial segment delivery also means more HTTP requests per viewer per minute. At scale, this can meaningfully increase delivery costs compared to standard HLS.
Player Compatibility Varies
LL-HLS requires recent versions of player libraries and may not work on older browsers or Smart TV apps. LL-DASH support varies even more across platforms. Teams regularly underestimate how much testing across all target platforms is needed before committing to a low-latency architecture.
The tradeoffs above don’t make low latency streaming impractical — they mean it requires choosing the right architecture for your specific requirements. For most live broadcasting use cases, LL-HLS at 3–7 seconds hits a good balance between latency, scale, and reliability.
Use Cases for Low Latency Streaming
Low latency streaming applies to many types of applications. The right latency target depends on how interactive the experience needs to be.
Live Sports Broadcasting
Professional sports streaming needs latency under 10 seconds to stay close to traditional broadcast TV (5–7 seconds). Sub-5-second latency is the target for platforms competing with cable, particularly when in-play betting integrations are involved. Sports streaming services typically use LL-HLS to achieve 3–5 seconds at scale.
Online Sports Betting and Gambling
This is the most latency-sensitive use case outside of video calls. Betting platforms that stream the event alongside the market need the video and data feeds synced within 1–2 seconds. Horse racing is more demanding — under 500ms — because race lengths are short and betting windows close fast.
Interactive Live Events and Webinars
Virtual conferences, live training sessions, and webinars benefit from 3–7 second latency. It’s enough for the host to respond to audience questions within a natural conversational window. Real-time voting, polling, and Q&A features all work better when the host and audience share roughly the same view of the stream.
Live Shopping and Live Auctions
E-commerce live streams where hosts sell products in real time require latency low enough that product availability and pricing are accurate when viewers see a call to action. Live auctions need even tighter sync so bidders see the current price before placing bids.
Remote Production (REMI)
In remote production workflows, a director in one city controls a live broadcast from cameras in another city. Any latency between the camera feed and the director’s monitor makes real-time production cuts and cues difficult. Low latency streaming keeps the control room and the venue in sync.
Esports and Gaming Broadcasts
Competitive gaming events have audiences that know the game well and react in real time to player decisions. High latency makes live commentary feel off, spoils match outcomes, and breaks synchronization between in-game event data overlays and the video stream.
When building any of these applications with a live streaming API, choosing a platform that natively supports LL-HLS or low-latency protocols at the infrastructure level saves significant engineering work compared to building it yourself.
How to Reduce Streaming Latency in Your App
If you’re building a streaming application and need lower latency, here’s where to focus.
1. Choose the Right Protocol
Start with your target latency and work backward:
- Sub-second: Use WebRTC
- 1–3 seconds: WebRTC with SFU infrastructure, or SRT ingest + proprietary last-mile delivery
- 3–7 seconds: Use LL-HLS or LL-DASH
- 7–15 seconds: Standard HLS with short segments
LL-HLS at 3–7 seconds covers most live broadcasting use cases without requiring specialized non-CDN infrastructure. See CMAF for how CMAF-based delivery relates to low-latency streaming at the container level.
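The decision list above can be captured as a small helper — the boundaries are the approximate guidance from this guide, not hard rules:

```javascript
// Map a target glass-to-glass latency (seconds) to the protocol tier
// suggested above. Thresholds are approximate, illustrative cutoffs.
function chooseProtocol(targetLatencySeconds) {
  if (targetLatencySeconds < 1) return "WebRTC";
  if (targetLatencySeconds < 3) return "WebRTC + SFU, or SRT ingest with proprietary last-mile delivery";
  if (targetLatencySeconds <= 7) return "LL-HLS or LL-DASH";
  return "Standard HLS with short segments";
}
```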
2. Configure Your Encoder for Low Latency
Your encoder settings directly control how much latency the pipeline starts with:
- Set segment duration to 0.5–1 second for LL-HLS
- Use a low-latency encoding profile (-tune zerolatency in FFmpeg)
- Reduce keyframe interval to 1–2 seconds so new viewers can join the stream faster
- Use hardware encoding when possible to reduce encode time
```bash
# Example low-latency HLS encode. Assumes 30fps input, so -g 30 gives
# a 1-second keyframe interval to match the 1-second segments.
ffmpeg -i input \
  -c:v libx264 -tune zerolatency -preset veryfast \
  -g 30 -keyint_min 30 \
  -hls_time 1 -hls_flags delete_segments \
  output.m3u8
```
3. Use a CDN with Low-Latency Support
Standard CDNs cache aggressively, which conflicts with low-latency delivery. Look for CDN partnerships that support LL-HLS natively — Akamai, Cloudflare, and Fastly all have low-latency delivery configurations that serve partial segments without cache conflicts.
If you’re building on a live streaming SDK or API platform, verify whether their CDN infrastructure is configured for low latency or defaults to standard caching behavior.
4. Reduce Player Buffer Targets
Configure your player to use a smaller buffer when targeting low latency:
```javascript
// HLS.js low-latency configuration
const hls = new Hls({
  lowLatencyMode: true,
  liveSyncDuration: 2,       // Target 2s behind live edge
  liveMaxLatencyDuration: 5, // Max 5s before catchup
  backBufferLength: 10,
});
```
HLS.js has a built-in lowLatencyMode that enables partial segment loading, bitrate selection tuned for low latency, and automatic catchup speed adjustment when a viewer drifts behind the live edge.
5. Place Ingest Servers Close to Your Source
If your encoder is in New York and your ingest server is in Singapore, you’ve already added 150ms+ to your pipeline before encoding is done. Use ingest endpoints geographically close to your broadcast source, then rely on your CDN to handle global distribution to viewers.
LiveAPI runs global ingest infrastructure with CDN distribution through Akamai, Cloudflare, and Fastly — so the origin-to-edge path is pre-configured for low-latency delivery rather than requiring manual CDN tuning on your end.
6. Monitor Live Edge Latency
Build latency monitoring into your streaming dashboard. Track the gap between the server’s live edge timestamp and viewer playback position. If average latency starts drifting above your target, you want to know before viewers do.
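A minimal monitor sketch, assuming you can sample the player's reported latency periodically (HLS.js, for example, exposes a latency getter in low-latency mode — verify against your player's API). It keeps a rolling window of samples and flags when the average drifts past target:

```javascript
// Rolling-window latency monitor: record samples, alert on drift.
class LatencyMonitor {
  constructor(targetSeconds, windowSize = 30) {
    this.target = targetSeconds;
    this.windowSize = windowSize;
    this.samples = [];
  }
  record(latencySeconds) {
    this.samples.push(latencySeconds);
    if (this.samples.length > this.windowSize) this.samples.shift();
  }
  average() {
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }
  overTarget() {
    return this.samples.length > 0 && this.average() > this.target;
  }
}
```

Wire `record()` to a timer that samples the player, and surface `overTarget()` in your dashboard or alerting pipeline.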
Low Latency Streaming FAQ
What is low latency streaming?
Low latency streaming is the delivery of live video with a glass-to-glass delay under 10 seconds — typically 1–7 seconds — from when content is captured to when viewers see it. The goal is to keep the streaming experience close enough to real time for viewers to interact with live events meaningfully.
What is considered low latency for live video?
The industry defines low latency as a glass-to-glass delay under 10 seconds. Ultra-low latency is under 3 seconds. Sub-second (real-time) is under 500ms. The right target depends on your use case — live sports may need 3–5 seconds, while video conferencing needs under 500ms.
What causes high latency in live streaming?
Latency builds up across every step in the pipeline: encoding time, segment size (HLS uses 2–6s segments by default), CDN propagation, and player buffering. Standard HLS at 15–30 seconds of latency is mostly caused by large segment sizes and conservative player buffer settings rather than network issues alone.
What is the difference between low latency and ultra-low latency streaming?
Low latency streaming typically refers to delays of 3–10 seconds, achieved with LL-HLS or LL-DASH. Ultra-low latency refers to delays under 3 seconds, usually requiring WebRTC, specialized CDN configuration, or proprietary last-mile delivery. Ultra-low latency trades some scale and reliability for speed.
Is WebRTC the best protocol for low latency streaming?
WebRTC achieves the lowest latency (200–500ms) of any widely supported streaming approach, but it doesn’t scale to large audiences without specialized infrastructure. For broadcasts to thousands or millions of viewers, LL-HLS delivers 3–7 seconds at scale with standard CDN support. WebRTC is the right choice when you need sub-second latency and your audience is small to medium.
How does Low-Latency HLS work?
LL-HLS reduces latency by delivering partial segments — as small as 200ms — before the full 1–2 second segment is complete. It uses preload hints to start downloading the next partial segment before the server signals it’s ready, and blocking playlist requests that allow the player to hang on a connection until the next segment arrives. Together, these cut standard HLS latency from 15–30 seconds down to 3–7 seconds.
Can you do low latency streaming with HLS?
Yes. Low-Latency HLS (LL-HLS) is an extension to the HLS specification that Apple introduced in 2019. It’s supported by major CDNs (Akamai, Cloudflare, Fastly) and player libraries (HLS.js, Video.js). It achieves 3–7 seconds of glass-to-glass latency while maintaining HLS compatibility and adaptive bitrate streaming.
What is glass-to-glass latency?
Glass-to-glass latency is the total delay from the camera lens (first glass) capturing content to the viewer’s display screen (second glass) rendering it. It’s the most complete measure of end-to-end streaming delay because it includes encoding, transmission, CDN distribution, and player buffering — not just network transit time.
What latency do sports betting platforms need?
In-play sports betting requires latency under 2 seconds to keep video and betting markets synchronized. Horse racing is more demanding — under 500ms — because race lengths are short and betting windows close fast. At 30 seconds of standard HLS latency, most in-play betting is impractical because market state changes faster than the stream.
How do I reduce latency in my live stream?
The most effective steps: switch to LL-HLS or WebRTC, reduce encoder segment duration to 0.5–1 second, configure your player buffer to target 2–3 seconds behind the live edge, use a CDN with native LL-HLS support, and place ingest servers close to your broadcast source. Each step addresses a different part of the pipeline.
Closing
Low latency streaming covers a wide range — from the 5-second difference between your stream and broadcast TV, to the sub-second precision needed for live auctions or competitive gaming. The right latency target for your application depends on how real-time your viewers need the experience to be.
If you’re building a streaming application and want low-latency delivery without the infrastructure complexity, get started with LiveAPI — it handles LL-HLS delivery, CDN distribution through Akamai, Cloudflare, and Fastly, and RTMP/SRT ingest in one API, so you can focus on your product rather than tuning a streaming pipeline.


