
What Is Video Latency? Causes, Types, and How to Reduce It


Video latency is the time between when a video is captured and when a viewer sees it on screen. For a football match, it’s the seconds between a goal being scored and the celebration reaching your audience. For a live auction, it’s the gap between a price changing and a bidder acting on outdated information.

For developers building live streaming applications, video latency determines whether your product feels real-time or falls flat. A viewer chatting with a streamer who is answering questions from 20 seconds ago, or a betting platform where the odds on screen no longer reflect the actual game state — these aren’t edge cases. They’re product failures.

This guide covers what video latency is, how it’s measured across your streaming pipeline, what causes it at each stage, and how to bring it down based on your specific use case.


What Is Video Latency?

Video latency is the total time delay between when a camera captures a video frame and when a viewer’s screen displays that same frame. This end-to-end delay is commonly called glass-to-glass latency — referring to the journey from the camera lens (one pane of “glass”) to the viewer’s display (the other).

Glass-to-glass latency spans every stage in the streaming pipeline:

  1. Camera capture — The camera sensor records a frame
  2. Encoding — Raw video is compressed into a streamable format (H.264, H.265, AV1)
  3. Ingest — The encoded stream is sent from the encoder to an origin server
  4. Processing — The server segments and packages the stream
  5. CDN delivery — Packaged content travels to edge servers worldwide
  6. Network transit — Data crosses the internet to the viewer’s network
  7. Decoding — The player decompresses the video
  8. Rendering — The frame is displayed on screen

Each stage adds time. Glass-to-glass latency is the sum of all of them. Understanding which stages contribute most gives you a clear target for where to focus your efforts.
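One way to reason about these stages is as a latency budget. The sketch below sums assumed per-stage numbers into a glass-to-glass total — the figures are illustrative placeholders for a standard HLS pipeline, not measurements, so substitute values from your own stack:

```javascript
// Illustrative per-stage latency budget in milliseconds.
// All numbers are assumptions for a standard HLS pipeline,
// not measurements — replace them with values from your stack.
const stagesMs = {
  capture: 17,         // ~1 frame at 60 fps
  encoding: 150,       // frame-based encoder
  ingest: 100,         // encoder -> origin server
  processing: 500,     // segmentation and packaging
  cdnDelivery: 200,    // origin -> edge propagation
  networkTransit: 80,  // edge -> viewer
  decoding: 30,
  rendering: 17,       // ~1 frame at 60 fps
  playerBuffer: 12000, // 2 x 6-second segments
};

// Glass-to-glass latency is the sum of every stage.
const glassToGlassMs = Object.values(stagesMs).reduce((a, b) => a + b, 0);
```

Even with rough numbers, the pattern holds: the player buffer dwarfs every other stage, which is why protocol and segment choices (covered below) matter more than micro-optimizing any single step.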


How Video Latency Is Measured

The most direct way to measure video latency is the clock comparison method: display a running clock or timestamp on the source device, then photograph both the source screen and the playback screen at the same moment. The difference between the two displayed times is your glass-to-glass latency.

Wall-clock time is the reference point — the actual real-world time when content was captured. Comparing wall-clock time at capture versus wall-clock time at playback gives you a reliable latency measurement without specialized hardware.

For production monitoring, you can log ingest timestamps via API and compare them to the player’s reported playback position. This gives per-session latency data you can track over time and alert on when it drifts.

A practical shortcut during development: if you’re delivering via adaptive bitrate streaming, most player SDKs expose a latency or liveDelay property you can read programmatically and log alongside other session metrics.
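If your HLS stream carries EXT-X-PROGRAM-DATE-TIME tags, the same idea can be computed rather than photographed. A minimal sketch, with hypothetical function and parameter names — adapt to whatever your player SDK actually exposes:

```javascript
// Sketch: derive live latency from an HLS stream that carries
// EXT-X-PROGRAM-DATE-TIME tags. Names here are hypothetical.
//
// Wall-clock time of the frame on screen = the segment's
// program-date-time plus the player's offset into that segment.
// Latency = wall clock now - wall clock of the displayed frame.
function liveLatencyMs(segmentProgramDateTimeMs, offsetIntoSegmentSec, nowMs) {
  const frameWallClockMs = segmentProgramDateTimeMs + offsetIntoSegmentSec * 1000;
  return nowMs - frameWallClockMs;
}

// Example: segment stamped 12:00:00.000, player is 1.5 s into it,
// wall clock reads 12:00:04.500 -> 3000 ms of glass-to-glass latency.
const latencyMs = liveLatencyMs(
  Date.parse("2024-01-01T12:00:00.000Z"),
  1.5,
  Date.parse("2024-01-01T12:00:04.500Z")
);
```

Logged per session, this gives you the same drift-alerting signal as the API-timestamp approach, without extra server-side bookkeeping.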


Types of Video Latency

Video latency exists on a spectrum. The right target depends on your application — not every use case requires sub-second delivery.

| Latency Type | Delay Range | Protocol Examples | Typical Use Cases |
| --- | --- | --- | --- |
| Real-time | < 300ms | WebRTC, WHIP/WHEP | Video calls, remote control |
| Ultra-low latency | 300ms–1s | WebRTC SFU, SRT | Interactive streaming, gaming |
| Low latency | 1–6s | LL-HLS, LL-DASH, SRT | Sports betting, live shopping |
| Standard latency | 6–30s | HLS, MPEG-DASH | Broadcast TV, live events |
| High latency | 30s+ | HTTP progressive | VOD, archived content |

Real-Time Latency (< 300ms)

Above 300–400ms, two-way conversations become uncomfortable — speakers overlap because they can’t gauge real-time feedback from the other side. WebRTC is the standard protocol for sub-300ms delivery, transmitting media over UDP with minimal buffering. The tradeoff is scalability: WebRTC is designed for small-group sessions and requires a Selective Forwarding Unit (SFU) to reach large audiences without a massive server footprint.

Ultra-Low Latency (300ms–1s)

Achievable with WebRTC SFU architectures or SRT ingest paired with aggressive player buffer tuning. Suitable for live events where viewers need to act on real-time information — live betting markets, real-time social interactions, or interactive gaming streams where a one-second lag breaks the experience.

Low Latency (1–6s)

The sweet spot for most live streaming applications. LL-HLS (Low-Latency HLS) and LL-DASH achieve 2–5 seconds in real-world deployments. This range supports large audience scales, works with standard CDN infrastructure, and covers a wide range of devices without requiring special player builds.

Standard Latency (6–30s)

Traditional HLS streaming operates here by default, with 6–10 second segments and 2–3 segment buffers. Sufficient for broadcast news and live events where viewer interaction isn’t the focus — the stream is live, but the exact timing gap doesn’t matter to most viewers.

High Latency (30s+)

HTTP progressive download or large-segment HLS. Appropriate for VOD and pre-recorded content where real-time delivery is irrelevant.


What Causes Video Latency?

Video latency builds up at every stage of your pipeline. Here are the five main sources.

1. Encoding and Compression

Before a stream travels anywhere, the encoder compresses raw video frames into a format that can be transmitted over the internet. This takes time.

Two common encoding approaches:

  • Frame-based encoding: The encoder waits for a complete video frame before compressing it. Better compression efficiency, but adds 100–200ms of encoding delay.
  • Slice-based encoding: The encoder compresses portions of a frame as they arrive, cutting encoding latency to 10–30ms at the cost of slightly lower compression efficiency.

Your choice of video encoder and codec settings directly controls this stage. Hardware encoders (GPU-accelerated) typically process frames faster than software encoders and support low-latency encoding modes. Disabling B-frames (bidirectional prediction frames) and reducing keyframe intervals are the two most impactful codec settings for cutting encoding latency.

2. Streaming Protocol and Segment Size

The streaming protocol is one of the biggest variables in end-to-end latency. Traditional HLS segments are 6–10 seconds long. A player buffers 2–3 segments before starting playback, setting a baseline latency of 12–30 seconds before network transit is even counted.

A useful rule of thumb: latency ≈ 3× segment duration. Shorter segments reduce latency in proportion.
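The rule of thumb is trivial to encode, which makes it handy for sanity-checking a configuration before you deploy it (the 3-segment default below is an assumption matching the typical player behavior described above):

```javascript
// Rule-of-thumb latency floor: players typically buffer ~3 segments
// before playback starts, so latency ≈ segmentDuration × segmentsBuffered.
function baselineLatencySec(segmentDurationSec, segmentsBuffered = 3) {
  return segmentDurationSec * segmentsBuffered;
}

const standardHls = baselineLatencySec(6);   // 18 s floor with 6 s segments
const shortSegments = baselineLatencySec(2); //  6 s floor with 2 s segments
```

This floor excludes encoding and network transit, so treat it as a lower bound, not a prediction.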

SRT protocol operates at the transport layer with 150ms–3s latency depending on network conditions. WebRTC live streaming bypasses the HTTP segment model entirely, sending media over UDP for sub-500ms delivery.

LL-HLS and LL-DASH break segments into partial chunks (200–500ms) and use HTTP/2 push to deliver content before a full segment is complete — achieving 2–5 seconds while staying compatible with standard CDN infrastructure.

3. Network Transmission and Distance

Once encoded and ingested, the stream crosses the internet to reach viewers. Physical distance adds measurable delay — data travels at roughly two-thirds the speed of light through fiber, meaning a stream traveling from New York to Tokyo adds at least 80–100ms of one-way transit time before any processing overhead.
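The physical floor is easy to estimate from the two-thirds-of-c figure — light in fiber covers roughly 200 km per millisecond. A small sketch:

```javascript
// Physical floor for one-way transit: light in fiber travels at
// roughly 2/3 c, i.e. about 200 km per millisecond. Real routes are
// longer than the great-circle distance and add router hops, so
// measured values run higher than this floor.
function propagationFloorMs(distanceKm) {
  const kmPerMs = 200; // ~2/3 the speed of light, in glass
  return distanceKm / kmPerMs;
}

// New York -> Tokyo is ~10,850 km great-circle, a floor of ~54 ms
// one way. Actual fiber paths push this toward the 80–100 ms range.
const nyToTokyoFloorMs = propagationFloorMs(10850);
```

No amount of tuning removes this component; the only lever is shortening the path, which is what CDNs and regional origins do.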

Network congestion, packet loss, and retransmission add further unpredictable delays. A single retransmitted packet on a TCP-based protocol like RTMP can stall the stream for 100–500ms while the missing packet is re-requested and delivered.

4. CDN and Edge Delivery

CDNs reduce latency by serving content from edge servers geographically closer to viewers. But CDN architecture matters for live streaming. Traditional CDNs optimize for throughput and cache hit rates, which can introduce buffering when content arrives in segments rather than complete files.

For low-latency streaming, you need a CDN for video streaming that supports chunked transfer encoding and HTTP/2 push to deliver partial segments in real time. Without this, the CDN itself adds seconds of delay even when your ingest pipeline is fast. Akamai, Cloudflare, and Fastly all support LL-HLS delivery — if you’re building on a streaming API like LiveAPI, this CDN routing is handled for you across all three providers based on viewer geography.

Without any CDN, your origin server handles every viewer request directly, adding latency as traffic grows and the server becomes the bottleneck.

5. Player Buffering

Even after data reaches the viewer’s device, the player holds content in a buffer before rendering it. This protects against jitter: if the network pauses for 2 seconds, a 3-second buffer means the viewer sees uninterrupted playback rather than a freeze.

The tradeoff is direct: larger buffers mean higher latency. Buffering during playback is a quality-of-experience problem, and the instinctive fix for frequent pauses, increasing the buffer size, makes your latency worse.

Low-latency players use 1–2 segment buffers and implement adaptive buffering — increasing buffer depth only when network conditions degrade, then reducing it again as conditions improve.


Video Latency by Streaming Protocol

Different streaming protocols make different trade-offs between latency, scalability, and device compatibility. Here’s how the major options compare:

| Protocol | Typical Latency | Scalability | Device Support | Best For |
| --- | --- | --- | --- | --- |
| WebRTC | 100–500ms | Low–Medium (SFU required) | Browsers, native apps | Video calls, interactive streams |
| SRT | 150ms–3s | Medium | Servers, encoders | Contribution, low-latency ingest |
| RTMP | 2–5s | Medium | Encoders, servers | Ingest to streaming servers |
| LL-HLS | 2–5s | High | All HLS-compatible devices | Large-scale live streaming |
| LL-DASH | 2–5s | High | Most browsers, devices | Large-scale live streaming |
| HLS (standard) | 10–30s | Very high | All devices | Broadcast, VOD, live events |

For most developer use cases — live streaming to web and mobile audiences at scale — the recommended path is SRT or RTMP for ingest into your streaming server, with LL-HLS or standard HLS for viewer delivery. This balances low latency with broad device compatibility and standard CDN support.

If you need sub-second interactivity, WebRTC is the right choice — but requires a WebRTC server or SFU to scale beyond small group sessions. If you’re ingesting from an RTMP encoder, your streaming server handles the protocol conversion to HLS for viewer delivery.


When Video Latency Actually Matters

Latency requirements vary significantly by use case. Not every application needs to hit the low end of the spectrum.

Live Sports and Sports Betting

High latency creates spoilers — social media posts about a goal reach viewers before the stream catches up. For betting platforms, a 10-second gap between different viewers creates arbitrage opportunities that undermine fairness. Target: 3–6 seconds.

Video Conferencing and Remote Collaboration

Above 300–400ms, conversations feel unnatural — people talk over each other because they can’t read real-time cues. Video conferencing platforms target under 200ms for interactions that feel natural. Target: < 300ms.

Live Shopping and Interactive Commerce

Live shopping platforms need real-time synchronization between presenter actions and viewer responses. Viewers acting on product information that’s 15 seconds old leads to inventory errors and frustrated buyers. Target: 3–6 seconds.

Broadcast and Linear TV Events

Traditional broadcast has 5–7 seconds of inherent production latency. Online streaming that matches or beats this feels live to most viewers. Target: 6–15 seconds.

Surveillance and Remote Operations

Security cameras, traffic monitoring, and industrial remote control all require low latency for accurate situational awareness. A 2-second lag on a security feed can mean missing an event entirely. Target: < 500ms to 2 seconds.

E-Learning and Webinars

Presenter-to-audience delivery is acceptable at 3–10 seconds since the interaction is mostly one-directional. Live Q&A benefits from lower latency, but it’s rarely the priority for educational content. Target: 3–10 seconds.


Building low-latency streaming requires decisions across every layer of your stack. Now that you understand where latency originates, the practical question is what you can actually do to reduce it.


How to Reduce Video Latency

Here are the most effective techniques for reducing video latency across a live streaming pipeline.

1. Choose the Right Protocol First

Protocol selection is your highest-impact decision. If your application needs sub-second delivery, standard HLS won’t get you there regardless of other tuning. Switch to WebRTC or SRT. If you need 2–5 seconds at scale, move from standard HLS to LL-HLS. Review the HLS vs DASH trade-offs to pick the right low-latency delivery format for your stack before configuring anything else.

2. Reduce Segment Duration

For HLS-based delivery, shorter segments directly reduce latency. Moving from 6-second segments to 2-second segments cuts latency roughly in proportion while increasing HTTP request volume. LL-HLS takes this further with partial segments of 200–500ms, cutting latency under 3 seconds without abandoning standard HTTP-based infrastructure.

3. Configure Your Encoder for Low Latency

Encoder settings have a major impact on how much delay you introduce before the stream even hits the network. Key settings to adjust:

  • Disable B-frames: B-frames use future frames as references, requiring the encoder to buffer ahead. Disabling them removes this source of delay at a modest quality cost.
  • Reduce keyframe interval: Use 1–2 seconds instead of the default 10. This increases random-access points in the stream, which players need to tune in and start playback faster.
  • Switch to slice-based encoding: Reduces encoding latency from 100–200ms to 10–30ms if your encoder supports it.
  • Use hardware acceleration: A dedicated SRT encoder with GPU encoding can bring encoding delay under 100ms for most resolutions.
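The settings above map directly onto encoder flags. As one concrete sketch, here they are expressed as ffmpeg/x264 arguments built in JavaScript — the flags shown (`-tune zerolatency`, `-bf`, `-g`) are real x264/ffmpeg options, but the surrounding values are illustrative defaults, not a tested production preset:

```javascript
// Sketch: low-latency x264 settings as ffmpeg arguments.
// Values are illustrative; tune for your content and hardware.
function lowLatencyEncoderArgs({ fps = 30, keyframeIntervalSec = 2 } = {}) {
  return [
    "-c:v", "libx264",
    "-tune", "zerolatency",                   // disables lookahead buffering
    "-bf", "0",                               // no B-frames
    "-g", String(fps * keyframeIntervalSec),  // keyframe every 1–2 seconds
    "-preset", "veryfast",                    // trade compression for speed
  ];
}

const args = lowLatencyEncoderArgs(); // defaults: 30 fps, 2 s keyframes
```

Hardware encoders expose equivalent knobs under different names; the B-frame and keyframe-interval settings are the ones to look for first.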

4. Use a CDN That Supports LL-HLS

Not all CDNs handle LL-HLS correctly. You need a CDN that supports chunked transfer encoding and HTTP/2 push to deliver partial segments before they’re complete. Confirm that your origin server outputs LL-HLS manifests correctly and that your CDN is configured to pass through chunked responses rather than waiting for a full segment before caching.

5. Deploy Infrastructure Close to Your Audience

Reduce the physical distance between your streaming infrastructure and your viewers. For contribution, choose an ingest server or cloud region geographically near your broadcaster. For distribution, use a CDN with global edge locations or deploy regional origin servers near your highest-traffic geographies. Every 1,000 km of additional network path adds roughly 5–10ms of one-way transit time — small individually, but significant when compounding with other delays.

6. Tune Player Buffer Settings

Work with your player configuration to reduce the target buffer length. Standard players buffer 3–5 seconds; for low-latency streams, configure a 1–2 second target. Enable catch-up mode — a feature most modern players support that gradually increases playback speed (1.05–1.1×) when latency drifts above target, closing the gap without perceptible audio pitch changes.
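Catch-up mode reduces to a small piece of control logic. A minimal sketch, with illustrative thresholds in line with the 1.05–1.1× range above (real players expose this as a configuration option rather than requiring you to write it):

```javascript
// Sketch of catch-up mode: nudge playback speed up when measured
// latency drifts above target, and return to 1.0x once it recovers.
// Thresholds and rates are illustrative, not from any specific player.
function catchUpRate(currentLatencySec, targetLatencySec) {
  const driftSec = currentLatencySec - targetLatencySec;
  if (driftSec <= 0.25) return 1.0; // within tolerance: play normally
  if (driftSec <= 1.5) return 1.05; // mild drift: imperceptible speed-up
  return 1.1;                       // large drift: close the gap faster
}
```

The hysteresis matters: snapping between rates on every measurement causes audible artifacts, so production players smooth the transitions.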

7. Use Wired Connections for Critical Broadcast Sources

For the broadcaster side, a wired ethernet connection removes 10–200ms of Wi-Fi jitter from the encoding-to-ingest leg. This is especially relevant for high-stakes live events where a brief Wi-Fi dropout could cause visible stuttering. For viewer-side connectivity, CDN edge servers that terminate connections close to viewers reduce the last-mile variability that causes jitter and rebuffering.


Low-Latency Streaming with LiveAPI

For teams building low-latency live streaming into their applications, the infrastructure decisions above represent significant configuration work if handled from scratch: ingest servers, CDN configuration, protocol conversion, encoder compatibility, and failover logic.

LiveAPI’s live streaming API handles this pipeline — from RTMP and SRT ingest to multi-CDN delivery via Akamai, Cloudflare, and Fastly — with the streaming infrastructure managed for you.

Relevant capabilities for latency-sensitive applications:

  • SRT and RTMP ingest: SRT’s built-in error correction handles packet loss without the head-of-line blocking that affects RTMP over congested networks. Both protocols are fully supported.
  • Multi-CDN delivery: Traffic routes through Akamai, Cloudflare, or Fastly based on viewer geography, reducing the physical distance component of latency.
  • HLS output: HLS URLs are generated automatically from your ingest stream, compatible with all major playback devices across web and mobile.
  • Global server redundancy: Regional infrastructure reduces the server-to-viewer distance that a single-origin setup adds.
  • Live-to-VOD: Streams are recorded automatically, so tuning your live product for low latency doesn’t affect your VOD archive quality.

This means your team focuses on the application layer — your player UI, viewer experience, and business logic — rather than managing ingest servers and CDN routing rules.


Video Latency FAQ

What is a good latency for live streaming?

It depends on your use case. For broadcast-style events where viewers aren’t interacting in real time, 6–15 seconds is workable. For sports, live shopping, or audience engagement features, target 3–6 seconds with LL-HLS. For two-way interactive experiences, you need sub-500ms latency via WebRTC.

What is glass-to-glass latency?

Glass-to-glass latency is the total delay from when a camera captures a frame to when that frame appears on a viewer’s screen. It covers every stage: capture, encoding, ingest, CDN delivery, network transit, decoding, and rendering. It’s the most complete single measure of end-to-end streaming delay, unlike metrics that only measure part of the pipeline.

What’s the difference between latency and buffering?

Latency is the end-to-end delay from capture to display. Buffering is the temporary pause in playback that happens when the player runs out of data before the stream delivers the next segment. Buffering is a symptom of network instability or a player configured with too small a buffer for current conditions — it doesn’t directly indicate high latency, though the two are related.

Does WebRTC always have lower latency than HLS?

WebRTC delivers 100–500ms vs. LL-HLS’s 2–5 seconds, so yes in raw numbers — but the comparison involves real trade-offs. WebRTC vs HLS comes down to scalability and infrastructure: WebRTC requires a server-side SFU to scale beyond small groups, adding meaningful complexity. LL-HLS scales to large audiences using standard CDN infrastructure that most teams already know.

How does SRT reduce latency compared to RTMP?

The key difference is the underlying transport. RTMP uses TCP, which retransmits every lost packet and can stall the stream for hundreds of milliseconds while waiting for the retransmit to arrive. SRT uses UDP with selective retransmission — it only retransmits packets that have time to arrive before their playback deadline, dropping those that would arrive too late. This makes SRT more predictable under packet loss and better suited for contribution over congested or lossy links.
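The deadline-aware retransmission decision can be modeled in a few lines. This is a simplified sketch of the behavior described above, not the SRT wire protocol:

```javascript
// Simplified model of SRT-style selective retransmission: only
// re-request a lost packet if it can still arrive before its
// playback deadline. A retransmit needs roughly one round trip.
function shouldRetransmit(nowMs, playbackDeadlineMs, roundTripMs) {
  return nowMs + roundTripMs < playbackDeadlineMs;
}

// 100 ms RTT, packet due in 500 ms: worth re-requesting.
// Same packet with only 50 ms left: drop it and move on.
const early = shouldRetransmit(0, 500, 100);  // true
const late = shouldRetransmit(450, 500, 100); // false
```

TCP, by contrast, has no concept of a deadline: it retransmits unconditionally and blocks everything behind the lost packet, which is exactly the stall behavior described for RTMP.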

Does the video codec affect latency?

Both H.264 and H.265 can be configured for low-latency encoding. H.264 vs H.265 is primarily a quality-per-bitrate trade-off, not a latency trade-off. H.265 achieves better compression, but hardware encoder support for H.265 low-latency mode is less consistent than H.264. For live streaming where encoding latency matters most, H.264 is typically the safer choice.

How do you measure video latency without specialized equipment?

Display a running clock or timer as an overlay on your source — either through your encoding software or a clock app on the camera device. View the stream on a second device. Take a photo of both screens at the same moment. The difference in displayed times is your glass-to-glass latency. For automated monitoring, most player SDKs expose a liveDelay or latency property you can log programmatically alongside other session metrics.

Why does latency increase over time during a long stream?

Latency drift happens when the player’s buffer grows gradually — the player receives data slightly faster than it consumes it, or brief network jitter events cause it to buffer ahead to recover. Good low-latency players implement catch-up mode, speeding playback to 1.05–1.1× to close the gap when latency drifts above target. Without this, latency grows steadily during long broadcasts until the viewer refreshes the page.


Wrapping Up

Video latency accumulates at every stage: encoding, protocol selection, network transit, CDN delivery, and player buffering. There’s no single fix, but there’s a clear priority order — start with protocol selection, then encoder settings, then CDN configuration, then player tuning.

For most live streaming applications, LL-HLS gives you 2–5 seconds of video latency at full CDN scale. WebRTC gives you sub-500ms when your use case genuinely requires it. SRT covers the ingest side for both. Understanding where latency comes from in your specific pipeline is the first step toward hitting your target — and knowing which stage to fix first.

Get started with LiveAPI to build low-latency live streaming into your application without building the underlying pipeline from scratch.
