VTT vs SRT: Which Subtitle Format Should You Use?

May 19, 2026 17 min read


Roughly 92% of viewers in the US watch videos with the sound off at least some of the time, which means your subtitle file is doing a lot more work than you might think. If you’re shipping a video feature, sooner or later you have to answer one question: do you serve subtitles as a .srt file or a .vtt file?

Both formats hold the same basic thing — timestamped lines of text — but they behave differently in browsers, in mobile players, and on streaming platforms. The wrong choice means broken captions on iOS Safari, missing styling on your HTML5 player, or rejected uploads on a social network. This guide breaks down VTT vs SRT in plain terms: what each format is, how they differ at the file level, when to use one over the other, and how to convert between them.

By the end, you’ll know which subtitle format fits your video player, your streaming protocol, and your platform mix — and how to wire it up without burning a week on edge cases.

What Are VTT and SRT Files?

Before comparing the two, it helps to define each one on its own terms.

SRT (SubRip Subtitle) is a plain-text subtitle file format created around 2000 for the SubRip ripping tool. It stores subtitles as numbered cues with start and end timestamps and one or more lines of text. SRT is the most widely supported subtitle format in the world: YouTube, Vimeo, Facebook, LinkedIn, VLC, every major NLE, and almost every set-top box accept it.

VTT (Web Video Text Tracks, often written WebVTT) is a W3C standard subtitle format designed for HTML5 video. It uses a similar cue-based structure but adds a required WEBVTT header, supports CSS styling, positioning, cue settings, chapters, and metadata, and integrates directly with the HTML5 <track> element. The format is defined in the W3C WebVTT specification and is the native subtitle format for the modern web.

In short: SRT is the universal lowest common denominator. VTT is the format built for the browser, with more features and stricter syntax.

VTT vs SRT: Side-by-Side Comparison

Here’s the quick comparison developers usually need before going deeper into either format.

Feature	SRT (.srt)	VTT (.vtt)
Full name	SubRip Subtitle	Web Video Text Tracks (WebVTT)
Year introduced	~2000	2010 (W3C draft)
MIME type	application/x-subrip	text/vtt
Header required	No	Yes (`WEBVTT`)
Timestamp separator	Comma (`00:01:23,456`)	Period (`00:01:23.456`)
Cue numbering	Required	Optional
Styling support	None (plain text only)	CSS, bold, italic, underline, color, positioning
Positioning	No	Yes (line, position, align, vertical)
Metadata	No	Yes (NOTE blocks, chapter markers)
HTML5 native	No (requires conversion)	Yes
YouTube upload	Yes	Yes
Vimeo upload	Yes	Yes
Apple HLS / iOS native	No	Yes (segmented WebVTT in HLS)
MPEG-DASH native	No	Yes (TTML or WebVTT)
File size	Slightly smaller	Slightly larger (header + optional cue IDs)
Best for	Universal distribution, NLEs, social, set-top boxes	HTML5 players, HLS, DASH, adaptive streaming

Both formats use UTF-8 encoding (or should — more on that later), and both can be opened in any text editor. The big functional split is HTML5 streaming: if your video pipeline ends in a browser or an Apple device, VTT is the format the platform actually expects.

Key Differences Between VTT and SRT

The comparison table above gives the headline. This section unpacks the differences that bite developers in production.

File Structure and Header

SRT files have no header. The file starts directly with the first cue:

1
00:00:01,000 --> 00:00:04,000
Hello and thanks for joining us.

2
00:00:04,500 --> 00:00:07,000
We're going live in three seconds.

A VTT file always starts with the WEBVTT line. Cue identifiers are optional, and an empty line separates each cue:

WEBVTT

00:00:01.000 --> 00:00:04.000
Hello and thanks for joining us.

00:00:04.500 --> 00:00:07.000
We're going live in three seconds.

If you serve a VTT file without the WEBVTT header, the browser will silently reject it and your <track> element will fail to load captions. This is the single most common VTT bug developers hit on day one.
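A cheap guard against the missing-header bug is to check the first bytes of the file before you publish it. The sketch below is a minimal illustration, not a full validator; the function name has_webvtt_header is a hypothetical helper, and it only inspects the signature line.

```python
def has_webvtt_header(data: bytes) -> bool:
    """Return True if the payload starts with a WEBVTT signature line."""
    # The WebVTT spec tolerates an optional UTF-8 BOM before the signature,
    # so skip it here even if your pipeline prefers BOM-free files.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    if not data.startswith(b"WEBVTT"):
        return False
    # After "WEBVTT" the file must end, or continue with a space, tab,
    # or line terminator. "WEBVTTX" is not a valid signature.
    following = data[6:7]
    return following in (b"", b" ", b"\t", b"\n", b"\r")
```

Running this on an SRT payload returns False, which is usually the fastest way to catch a file that was renamed to .vtt without being converted.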

Timestamp Format

SRT uses a comma as the decimal separator for milliseconds:

00:01:23,456 --> 00:01:27,890

VTT uses a period:

00:01:23.456 --> 00:01:27.890

VTT also lets you drop the leading hour for cues under 60 minutes:

01:23.456 --> 01:27.890

SRT requires the full HH:MM:SS,mmm format for every cue. Mixing the two formats in the same file produces parse errors in most players.
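To make the separator and hour-field rules concrete, here is a small parsing sketch that converts either timestamp style to milliseconds and rejects the wrong separator for the declared format. The name timestamp_to_ms and the "srt"/"vtt" format flags are assumptions made for this example.

```python
import re

# Optional hours (two or more digits), then MM:SS, a separator, and milliseconds.
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)([.,])(\d{3})$")


def timestamp_to_ms(stamp: str, fmt: str) -> int:
    """Convert an SRT or VTT timestamp string to milliseconds."""
    match = _TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError(f"not a timestamp: {stamp!r}")
    hours, minutes, seconds, separator, millis = match.groups()
    if fmt == "srt":
        # SRT needs a comma and the full HH:MM:SS,mmm form.
        if separator != "," or hours is None:
            raise ValueError(f"not a valid SRT timestamp: {stamp!r}")
    elif fmt == "vtt":
        # VTT needs a period; the hour field may be omitted.
        if separator != ".":
            raise ValueError(f"not a valid VTT timestamp: {stamp!r}")
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)
```

Both timestamp_to_ms("00:01:23,456", "srt") and timestamp_to_ms("01:23.456", "vtt") resolve to the same instant, while swapping the format flags raises ValueError.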

Styling and Positioning

SRT is plain text. A few media players support inline HTML tags like <b> and <i> because they were tolerated by old SubRip players, but no formal specification defines them, and behavior varies wildly across platforms.

VTT supports a real styling layer:

Inline tags: <b>, <i>, <u>, <c>, <v>, <lang>, <ruby>

Cue settings: line, position, size, align, vertical

CSS styling via the ::cue pseudo-element in your stylesheet

A styled VTT cue can look like this:

WEBVTT

NOTE Speaker A intro

intro
00:00:01.000 --> 00:00:04.000 line:90% position:50% align:center
<v Alice>Hello and thanks for joining us.</v>

If your product needs speaker labels, color-coded speakers, or captions positioned away from a lower-thirds graphic, VTT is the format. SRT will not get you there.

Metadata and Chapters

VTT supports NOTE blocks for comments and a separate chapters track kind for video chapter navigation. The HTML5 <track> element has a dedicated kind="chapters" attribute that reads cue text as chapter titles. SRT has no concept of chapters: it’s cues all the way down.

If your video player needs scrubbable chapter navigation driven by a sidecar file, VTT chapter tracks are the path of least resistance.
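A chapter track is just a VTT file whose cue text is the chapter title, so generating one is mostly string formatting. The following sketch assumes you already have chapter start times in milliseconds; build_chapters_vtt and format_vtt_timestamp are hypothetical helper names, and each chapter is assumed to end where the next one begins.

```python
def format_vtt_timestamp(ms: int) -> str:
    """Render milliseconds as a full HH:MM:SS.mmm VTT timestamp."""
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_chapters_vtt(chapters: list[tuple[int, str]], duration_ms: int) -> str:
    """Build a chapters file from (start_ms, title) pairs sorted by start time."""
    lines = ["WEBVTT", ""]
    for index, (start_ms, title) in enumerate(chapters):
        # A chapter runs until the next chapter starts, or until the video ends.
        if index + 1 < len(chapters):
            end_ms = chapters[index + 1][0]
        else:
            end_ms = duration_ms
        lines.append(f"{format_vtt_timestamp(start_ms)} --> {format_vtt_timestamp(end_ms)}")
        lines.append(title)
        lines.append("")
    return "\n".join(lines)
```

The resulting file is what you would reference from a track element with kind="chapters".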

Encoding

Both formats are plain text, and both should be saved as UTF-8 without a BOM (byte-order mark). The VTT spec requires UTF-8. SRT is more forgiving in practice, but a Mandarin or Arabic SRT saved as Windows-1252 will render as garbage in any HTML5 player. If you’re generating subtitle files programmatically, default to UTF-8 for both formats.
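If you ingest subtitle files from vendors, a normalization pass at upload time keeps encoding problems out of the player. This is a rough sketch under one loud assumption: files that are not valid UTF-8 are treated as Windows-1252, which is a guess you should replace with the encoding your vendors actually use. The name normalize_subtitle_bytes is hypothetical.

```python
import codecs


def normalize_subtitle_bytes(raw: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return UTF-8 bytes with no BOM and LF line endings."""
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the byte-order mark, then decode the rest as UTF-8.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: legacy files use the fallback encoding.
            # A wrong guess here produces mojibake, not an error.
            text = raw.decode(fallback_encoding)
    # Normalize Windows and old Mac line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```

The fallback can only recover text that really was saved in that encoding; a file in some other legacy encoding still needs to be identified and decoded explicitly.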

Browser and Player Support

This is where the two formats actually diverge in production.

HTML5 <track> element: Native VTT only. SRT must be converted to VTT before it works in the <video> tag.

Apple HLS: Captions are delivered as segmented WebVTT files (referenced by the HLS manifest). SRT is not part of the HLS spec.

MPEG-DASH: WebVTT or TTML. SRT is not part of the DASH spec.

YouTube, Vimeo, Facebook, LinkedIn: Both formats accepted. Internally they normalize to their own format.

VLC, MPC-HC, Plex, mpv: Both formats supported, plus a dozen others.

iOS native player, AVPlayer: WebVTT via HLS. Sideloaded SRT is not supported.

So if your videos play in a browser, on Apple devices, or through an adaptive bitrate streaming manifest, the answer is almost always VTT.

How VTT (WebVTT) Works

A VTT file is a UTF-8 text file with the text/vtt MIME type. Its job is to describe timed text cues that a video player overlays on the video timeline.

The structure has four parts:

Header — the literal string WEBVTT, optionally followed by a description on the same line.

Optional metadata — NOTE blocks, STYLE blocks for CSS, and REGION blocks for cue layout.

Cues — one or more timed text cues separated by blank lines.

Cue payload — the actual text, optionally with inline tags and timestamps for karaoke-style timing.

Here is a more complete example:

WEBVTT - Live stream captions

STYLE
::cue(.speaker) {
  color: #f0c419;
  font-weight: bold;
}

NOTE This file is auto-generated from the LiveAPI captions API.

REGION
id:speakers
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%

intro
00:00:01.000 --> 00:00:04.000 region:speakers align:start
<c.speaker>Alice:</c> Hello and thanks for joining us.

01
00:00:04.500 --> 00:00:07.000
We're going live in three seconds.

To hook this file into a video tag, point a <track> element at it:

<video controls>
  <source src="stream.m3u8" type="application/vnd.apple.mpegurl" />
  <track kind="subtitles" src="captions-en.vtt" srclang="en" label="English" default />
</video>

The browser parses the VTT file, renders the cues at the timestamps it specifies, and applies any styling from STYLE blocks or your own CSS. The full <track> element behavior is documented in the HTML5 track specification.

How SRT (SubRip) Works

SRT has no formal specification — it was a de facto standard that grew out of the SubRip Windows application around 2000. Every SRT cue has four parts:

Sequence number — a 1-based integer.

Timestamp line — HH:MM:SS,mmm --> HH:MM:SS,mmm.

Subtitle text — one or more lines of plain text.

Blank line — separates each cue.

A minimal SRT file looks like this:

1
00:00:01,000 --> 00:00:04,000
Hello and thanks for joining us.

2
00:00:04,500 --> 00:00:07,000
We're going live in three seconds.

3
00:00:07,500 --> 00:00:11,000
Audio is muted by default — click the volume icon to unmute.
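Because SRT has no formal grammar, parsers tend to be small and forgiving. As an illustration of the four-part cue structure, here is a minimal parser sketch; Cue and parse_srt are hypothetical names, and the code assumes cues are separated by truly empty lines, which is not guaranteed in files from every tool.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues, skipping blocks that do not look like cues."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", normalized):
        lines = block.split("\n")
        # Part 1: sequence number. Part 2: timestamp line.
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        timing = _TIMING.match(lines[1].strip())
        if timing is None:
            continue
        # Part 3: everything after the timestamp line is subtitle text.
        text = "\n".join(lines[2:])
        cues.append(Cue(int(lines[0]), timing.group(1), timing.group(2), text))
    return cues
```

Malformed blocks are skipped silently here; a production parser would more likely report them so a broken vendor file does not quietly lose cues.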

To attach an SRT file to a video, most desktop players look for a sidecar file with the same name as the video and the .srt extension. To use SRT in an HTML5 player, you have to convert it to VTT first, because the <track> element does not parse SRT.

For video on demand platforms that re-encode and re-package content, SRT is often the input format because editors and transcription vendors deliver it that way. The platform then converts to VTT or burns the captions into the video before serving them to viewers.

When to Use VTT

Pick VTT when any of these are true:

Your player is HTML5. The <video> tag’s <track> element parses VTT natively and nothing else.

You’re streaming over HLS. Apple HLS expects segmented WebVTT, referenced from the master playlist. Read more about HLS streaming and how captions fit in.

You’re streaming over MPEG-DASH. DASH supports WebVTT and TTML, not SRT.

You need styling, speaker tags, or positioning. VTT is the only format with a real CSS hook.

You need chapter navigation. kind="chapters" on a VTT file gives you scrubbable chapters with no extra code.

Your audience is on iOS or tvOS. iOS only recognizes WebVTT subtitles delivered through HLS.

You care about accessibility audits. VTT cleanly separates kind="subtitles", kind="captions", kind="descriptions", and kind="metadata", which screen readers and accessibility tooling rely on.

In practical terms, anyone building a video streaming app, an OTT platform, a video conferencing API, or a learning platform that runs in the browser is shipping VTT.

When to Use SRT

Pick SRT when any of these are true:

You’re uploading to social platforms. YouTube, Vimeo, Facebook, and LinkedIn all accept SRT and treat it as the safe default.

Your distribution is offline or desktop. Set-top boxes, smart TVs, Plex, Kodi, VLC — all accept SRT without fuss.

You’re handing off to a video editing workflow. Premiere Pro, DaVinci Resolve, Final Cut, and Avid Media Composer import SRT directly.

You’re working with transcription vendors. Almost every human transcription service delivers SRT by default.

You don’t need styling. Plain text captions with no positioning needs are simpler to produce, store, and audit as SRT.

You want maximum compatibility with one file. If you can only ship one subtitle file and you don’t control the playback environment, SRT plays in more places than any other format.

A common pattern: store master subtitles as SRT in your CMS, and convert to VTT on the fly when serving them through a video player API or HTML5 video element.

How to Convert Between VTT and SRT

The two formats are close enough that conversion is mostly mechanical, but a few details trip people up.

Converting SRT to VTT

To turn an SRT file into a valid VTT file:

Add the line WEBVTT followed by a blank line at the top of the file.

Replace the comma in every timestamp with a period (00:01:23,456 → 00:01:23.456).

The sequence numbers can stay or be removed — VTT cue identifiers are optional.

Save as UTF-8 without a BOM.

A minimal bash one-liner with sed:

{ echo "WEBVTT"; echo; sed 's/,\([0-9]\{3\}\) --> /\.\1 --> /; s/ --> \([0-9]*:[0-9]*:[0-9]*\),\([0-9]\{3\}\)/ --> \1.\2/' input.srt; } > output.vtt

A more reliable approach is FFmpeg:

ffmpeg -i input.srt output.vtt

FFmpeg handles encoding conversion, line-ending normalization, and edge cases like cues without trailing blank lines.
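If you would rather keep the conversion inside your own code, the manual steps listed above translate into a few lines. This is a minimal sketch, not a replacement for FFmpeg; srt_to_vtt is a hypothetical helper, and it keeps the SRT sequence numbers as VTT cue identifiers.

```python
import re

# Match only timing lines so commas inside the subtitle text are left alone.
_SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text."""
    # Drop a leading BOM character and normalize line endings to LF.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    # Swap the comma for a period in every timing line.
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # Prepend the required header followed by a blank line.
    return "WEBVTT\n\n" + body.strip("\n") + "\n"
```

Write the result with encoding="utf-8" (not "utf-8-sig") so the output file has no BOM.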

Converting VTT to SRT

To turn a VTT file into a valid SRT file:

Strip the WEBVTT header and any NOTE, STYLE, or REGION blocks.

Replace periods in timestamps with commas.

Remove all cue settings (everything after the timestamp on the same line, like line:90% align:center).

Strip all inline tags (<v>, <c>, etc.); SRT will not understand them.

Add 1-based sequence numbers if you stripped them or if there were none.

Save as UTF-8.

Again, FFmpeg does it in one command:

ffmpeg -i input.vtt output.srt

Keep in mind that conversion from VTT to SRT is lossy: styling, positioning, chapters, regions, and speaker tags are all dropped. If you have a rich VTT file and you only need SRT for social uploads, keep the VTT as the source of truth and treat the SRT as a downstream artifact.
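The reverse direction can be sketched the same way, and the code makes the lossy parts visible: everything that is not a timed cue is dropped. vtt_to_srt is a hypothetical helper; it assumes blocks are separated by empty lines and does not unescape entities such as &amp; in cue text.

```python
import re

# VTT timing line; the hour field is optional and cue settings may follow.
_VTT_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)
_TAG = re.compile(r"<[^>]*>")


def _srt_stamp(clock: str, millis: str) -> str:
    # SRT requires the hour field, which VTT allows you to omit.
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt_text: str) -> str:
    """Convert WebVTT text to SRT, dropping styling, settings, and metadata."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues: list[str] = []
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        # The timing line is first, or second when a cue identifier is present.
        timing_pos = None
        for pos, line in enumerate(lines[:2]):
            if _VTT_TIMING.match(line):
                timing_pos = pos
                break
        if timing_pos is None:
            continue  # header, NOTE, STYLE, or REGION block
        timing = _VTT_TIMING.match(lines[timing_pos])
        start = _srt_stamp(timing.group(1), timing.group(2))
        end = _srt_stamp(timing.group(3), timing.group(4))
        # Strip inline tags such as <v Alice> or <c.speaker> from the payload.
        payload = [_TAG.sub("", line) for line in lines[timing_pos + 1:]]
        # Renumber cues from 1, as SRT expects.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n" + "\n".join(payload))
    return "\n\n".join(cues) + "\n"
```

Cue settings vanish because the timing regex only captures the two timestamps, and the original cue identifiers are replaced by fresh sequence numbers.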

How Subtitles Fit into a Streaming Pipeline

Captions and subtitles don’t live in isolation. They ride alongside the video file and the manifest, and how they’re delivered depends on the streaming protocol.

For progressive download (a single MP4 served from a CDN), captions are typically a sidecar VTT file referenced from the HTML5 <track> element. The browser fetches the VTT separately and aligns it to the video timeline.

For HLS adaptive streaming, captions are usually segmented WebVTT files referenced in the master playlist as a #EXT-X-MEDIA:TYPE=SUBTITLES entry. The player downloads segments of the caption file alongside video segments. Apple devices only recognize this delivery path. If you’re new to HLS, the HLS vs DASH comparison covers how captions are handled in each protocol.

For MPEG-DASH, captions are either inline TTML in the manifest or sidecar WebVTT segments. SRT is not part of the DASH spec at all.

For WebRTC live streaming, real-time captions are usually sent over a data channel (not as a file) and rendered by the receiving client. Read more about WebRTC live streaming for the architecture details.

When you’re building live video streaming features, the cleanest setup is: accept SRT from your transcription vendor, convert to segmented WebVTT during transcoding, and let the manifest reference the WebVTT segments. Your players just point at the manifest and the right captions show up.
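For the HLS case, it helps to see what the subtitle reference in the master playlist looks like. The sketch below builds a tiny playlist with one video variant and its subtitle renditions. The function name, the "subs" group ID, and all URIs are placeholders invented for this example; a real packager emits more attributes (codecs, resolution) and one subtitle media playlist per language that lists the .vtt segments.

```python
def build_master_playlist(
    video_uri: str,
    bandwidth: int,
    subtitle_tracks: list[tuple[str, str, str]],
) -> str:
    """Build a minimal HLS master playlist.

    subtitle_tracks holds (language, name, uri) tuples, where uri points at
    the media playlist that lists the segmented WebVTT files.
    """
    lines = ["#EXTM3U"]
    for position, (language, name, uri) in enumerate(subtitle_tracks):
        # Mark the first track as the default rendition.
        default = "YES" if position == 0 else "NO"
        lines.append(
            '#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",'
            f'NAME="{name}",LANGUAGE="{language}",'
            f'DEFAULT={default},AUTOSELECT=YES,URI="{uri}"'
        )
    # The variant stream opts into the subtitle group by its GROUP-ID.
    subtitles_attr = ',SUBTITLES="subs"' if subtitle_tracks else ""
    lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}{subtitles_attr}")
    lines.append(video_uri)
    return "\n".join(lines) + "\n"
```

The player reads the SUBTITLES attribute on the variant, finds the matching group, and fetches the subtitle playlist for whichever language the viewer selects.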

Adding Captions to Live Streams with LiveAPI

For developers who don’t want to assemble the transcoder, the manifest generator, the caption packager, and the player from parts, LiveAPI’s live streaming API handles the moving pieces.

You ingest video over RTMP or SRT (here the Secure Reliable Transport protocol, not the subtitle format), LiveAPI transcodes to adaptive bitrate HLS, your captions are packaged as WebVTT alongside the video segments, and the embeddable player renders them with no extra configuration. If your team is shipping a live video streaming platform, captions stop being a side project: they’re part of the same API call that delivers the stream.

The same flow works for VOD: upload an MP4 with a sidecar SRT, and LiveAPI’s video API handles conversion to VTT, packaging into HLS, and CDN delivery across Akamai, Cloudflare, and Fastly. If you’re trying to launch a streaming feature without building video transcoding and caption packaging from scratch, that’s where the time savings come from.

Best Practices for Subtitle Files

A few field-tested rules that apply to both formats:

Always save as UTF-8 without BOM. A BOM at the top of a VTT file breaks parsing in Safari. Same for SRT in some HLS packagers.

Check files before shipping. Run them through a parser — vttvalidator, pycaption, or webvtt-py — instead of trusting that they look right.

Keep cues short. Aim for 32-42 characters per line and no more than two lines per cue. Long cues stay on screen too long and force viewers to read instead of watch.

Pad your timestamps. Use full HH:MM:SS.mmm even when the cue is in the first minute. It avoids ambiguity with some older parsers.

Avoid trailing whitespace and mixed line endings. The WebVTT spec accepts LF, CRLF, and CR, but normalizing both formats to LF keeps cue separators clean and avoids surprises in stricter parsers.

Version your subtitle files. Treat them as code — diffable, reviewable, and stored next to the video they describe.

Don’t burn captions into the video. Burned-in (open) captions can’t be toggled off, translated, or restyled. Keep them as sidecar files unless you have a specific reason not to.
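The cue-length rule above is easy to enforce automatically. Here is a small lint sketch that flags cues with too many lines or overlong lines; find_long_cues is a hypothetical helper, and the defaults simply mirror the 42-character, two-line guideline from the list.

```python
def find_long_cues(
    cues: list[str],
    max_chars: int = 42,
    max_lines: int = 2,
) -> list[tuple[int, str]]:
    """Return (cue_number, reason) pairs for cues that break the guideline.

    Each item in cues is the text payload of one cue, with lines
    separated by a newline character.
    """
    problems = []
    for number, payload in enumerate(cues, start=1):
        lines = payload.split("\n")
        if len(lines) > max_lines:
            problems.append((number, f"{len(lines)} lines"))
        for line in lines:
            if len(line) > max_chars:
                problems.append((number, f"line has {len(line)} characters"))
    return problems
```

A check like this fits naturally into the same CI step that validates the files, so overlong cues are caught in review rather than on screen.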

If you’re new to the broader topic of timed text, the closed captioning vs subtitles breakdown covers the accessibility side of the same problem.

Is VTT or SRT Right for Your Project?

Run through this quick checklist to settle the format question:

Will the video play in a browser or on iOS? → VTT.

Are you delivering over HLS or DASH? → VTT.

Do you need styling, positioning, speaker labels, or chapters? → VTT.

Are you uploading to social platforms or handing off to NLEs? → SRT works everywhere.

Do you only need basic text captions and broad compatibility? → SRT.

Are you building a streaming platform from scratch? → Store master as SRT, deliver as VTT.

Most production pipelines end up using both: SRT for ingestion and editorial, VTT for delivery to end users.

VTT vs SRT FAQ

Is VTT the same as SRT?

No. Both are timed-text subtitle formats, but VTT (WebVTT) is a W3C standard with a required WEBVTT header, period-based timestamps, optional cue IDs, styling, positioning, and chapter support. SRT is a simpler de-facto standard with comma-based timestamps and plain text only. SRT files won’t parse as VTT and vice versa without conversion.

Which is better, VTT or SRT?

Neither is universally better — they solve different problems. VTT is the right pick for HTML5 video, HLS, DASH, iOS, and any caption that needs styling or positioning. SRT is the right pick for social uploads, NLE editing workflows, set-top boxes, and maximum compatibility across legacy players.

Can I use SRT in an HTML5 video tag?

No, not directly. The HTML5 <track> element only parses WebVTT. If you need to use SRT content in a browser, convert it to VTT first. The conversion is trivial and can be done at build time, on the server, or in the browser with a small JavaScript shim.

How do I convert SRT to VTT?

Add the WEBVTT header at the top with a blank line after it, replace every comma in the timestamps with a period, and save as UTF-8. FFmpeg does it in one command: ffmpeg -i input.srt output.vtt. There are also free web converters and command-line tools like pycaption and webvtt-py.

Do YouTube and Vimeo support both formats?

Yes. YouTube accepts both SRT and VTT (along with SBV, TTML, SAMI, and several others). Vimeo accepts both. Both platforms normalize uploads to their internal caption format, so the choice between SRT and VTT on upload rarely matters for the final viewer experience.

Why does VTT use a period and SRT use a comma in timestamps?

It’s a regional convention from the formats’ origins. SRT was created in Europe where the comma is the decimal separator. VTT inherited the W3C convention of using the period as the decimal separator for interoperability with CSS and JavaScript, which use the period universally. There’s no functional reason — it’s just history.

Can SRT files include styling like bold or italic?

SRT has no formal specification, so styling is not officially defined. Some players (notably VLC and a few NLEs) tolerate inline <b>, <i>, and <u> tags, but the behavior is inconsistent across platforms. If you need reliable styling, use VTT.

Are VTT files larger than SRT files?

Slightly. A VTT file has the WEBVTT header and may include NOTE, STYLE, and REGION blocks. For typical 30-minute captions, the size difference is usually under 1% — not a meaningful factor in delivery or storage.

Can I have one subtitle file work for both web and social platforms?

In practice, ship two: SRT for social uploads and NLE handoffs, VTT for HTML5 playback. Generate them from the same source (transcript, EDL, or AI captioning output) so they stay in sync. A converter step in your build pipeline keeps them aligned.

Does WebVTT work with HLS streaming?

Yes. WebVTT is the standard sidecar subtitle format for HLS. Captions are segmented into short WebVTT files referenced from the HLS master playlist as a #EXT-X-MEDIA:TYPE=SUBTITLES track. Apple’s HLS implementation also handles embedded CEA-608/708 closed captions and IMSC1 subtitles, but it has no support for SRT, which is why VTT is the default for any iOS or tvOS playback.

Wrapping Up

VTT vs SRT comes down to where your video plays. If the answer involves a browser, an iPhone, or an HLS or DASH manifest, you’re shipping WebVTT. If the answer involves YouTube, Vimeo, a video editor, or a desktop media player, SRT is the path of least resistance. Most real pipelines use both — SRT in editorial and ingestion, VTT in delivery — and convert between them at build time.

The format itself is rarely the hard part. The hard part is packaging captions into a manifest, segmenting them for adaptive bitrate playback, and serving them through a CDN that doesn’t break on UTF-8 BOMs. That’s the layer where shipping a captioning feature gets expensive.

Ready to ship captioned video without rebuilding the pipeline? Get started with LiveAPI and add WebVTT captions to live streams and VOD with the same API call that delivers your video.
