Roughly 92% of viewers in the US watch videos with the sound off at least some of the time, which means your subtitle file is doing a lot more work than you might think. If you’re shipping a video feature, sooner or later you have to answer one question: do you serve subtitles as a .srt file or a .vtt file?
Both formats hold the same basic thing — timestamped lines of text — but they behave differently in browsers, in mobile players, and on streaming platforms. The wrong choice means broken captions on iOS Safari, missing styling on your HTML5 player, or rejected uploads on a social network. This guide breaks down VTT vs SRT in plain terms: what each format is, how they differ at the file level, when to use one over the other, and how to convert between them.
By the end, you’ll know which subtitle format fits your video player, your streaming protocol, and your platform mix — and how to wire it up without burning a week on edge cases.
What Are VTT and SRT Files?
Before comparing the two, it helps to define each one on its own terms.
SRT (SubRip Subtitle) is a plain-text subtitle file format created around 2000 for the SubRip ripping tool. It stores subtitles as numbered cues with start and end timestamps and one or more lines of text. SRT is the most widely supported subtitle format in the world: YouTube, Vimeo, Facebook, LinkedIn, VLC, every major NLE, and almost every set-top box accept it.
VTT (Web Video Text Tracks, often written WebVTT) is a W3C standard subtitle format designed for HTML5 video. It uses a similar cue-based structure but adds a required WEBVTT header, supports CSS styling, positioning, cue settings, chapters, and metadata, and integrates directly with the HTML5 `<track>` element. The format is defined by the W3C WebVTT specification and is the native subtitle format for the modern web.
In short: SRT is the universal lowest common denominator. VTT is the format built for the browser, with more features and stricter syntax.
VTT vs SRT: Side-by-Side Comparison
Here’s the quick comparison developers usually need before going deeper into either format.
| Feature | SRT (.srt) | VTT (.vtt) |
|---|---|---|
| Full name | SubRip Subtitle | Web Video Text Tracks (WebVTT) |
| Year introduced | ~2000 | 2010 (W3C draft) |
| MIME type | application/x-subrip | text/vtt |
| Header required | No | Yes (WEBVTT) |
| Timestamp separator | Comma (00:01:23,456) | Period (00:01:23.456) |
| Cue numbering | Required | Optional |
| Styling support | None (plain text only) | CSS, bold, italic, underline, color, positioning |
| Positioning | No | Yes (line, position, align, vertical) |
| Metadata | No | Yes (NOTE blocks, chapter markers) |
| HTML5 native | No (requires conversion) | Yes |
| YouTube upload | Yes | Yes |
| Vimeo upload | Yes | Yes |
| Apple HLS / iOS native | No | Yes (segmented WebVTT in HLS) |
| MPEG-DASH native | No | Yes (TTML or WebVTT) |
| File size | Slightly smaller | Slightly larger (header + optional cue IDs) |
| Best for | Universal distribution, NLEs, social, set-top boxes | HTML5 players, HLS, DASH, adaptive streaming |
Both formats use UTF-8 encoding (or should — more on that later), and both can be opened in any text editor. The big functional split is HTML5 streaming: if your video pipeline ends in a browser or an Apple device, VTT is the format the platform actually expects.
Key Differences Between VTT and SRT
The comparison table above gives the headline. This section unpacks the differences that bite developers in production.
File Structure and Header
SRT files have no header. The file starts directly with the first cue:
1
00:00:01,000 --> 00:00:04,000
Hello and thanks for joining us.

2
00:00:04,500 --> 00:00:07,000
We're going live in three seconds.
A VTT file always starts with the WEBVTT line. Cue identifiers are optional, and an empty line separates each cue:
WEBVTT

00:00:01.000 --> 00:00:04.000
Hello and thanks for joining us.

00:00:04.500 --> 00:00:07.000
We're going live in three seconds.
If you serve a VTT file without the WEBVTT header, the browser will silently reject it and your `<track>` element will fail to load captions. This is the single most common VTT bug developers hit on day one.
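A quick pre-flight check can catch a missing header before the file reaches a browser. The sketch below is a minimal illustration, not a full validator; the function name `has_webvtt_header` is hypothetical, and it only inspects the first line.

```python
def has_webvtt_header(text: str) -> bool:
    """Return True if the first line looks like a WebVTT signature line."""
    # A leading byte-order mark may precede the signature; drop it first.
    text = text.lstrip("\ufeff")
    first_line = text.splitlines()[0] if text else ""
    # The signature is "WEBVTT" alone, or "WEBVTT" followed by a space or tab
    # and an optional free-text description.
    return first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t"))
```

Running this in a build step or upload handler is cheaper than debugging a caption track that silently never loads.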
Timestamp Format
SRT uses a comma as the decimal separator for milliseconds:
00:01:23,456 --> 00:01:27,890
VTT uses a period:
00:01:23.456 --> 00:01:27.890
VTT also lets you drop the leading hour for cues under 60 minutes:
01:23.456 --> 01:27.890
SRT requires the full HH:MM:SS,mmm format for every cue. Mixing the two formats in the same file produces parse errors in most players.
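To make the timestamp rules concrete, here is a small Python sketch that converts either style to milliseconds and rejects the wrong separator for the declared format. The name `timestamp_to_ms` and the `fmt` values `"srt"` and `"vtt"` are illustrative assumptions, not part of any library.

```python
import re

# Optional hours, then MM:SS, a separator (comma or period), and milliseconds.
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?(\d{2}):(\d{2})([.,])(\d{3})$")


def timestamp_to_ms(stamp: str, fmt: str) -> int:
    """Convert an SRT or VTT timestamp to milliseconds, enforcing format rules."""
    match = _TIMESTAMP.match(stamp.strip())
    if not match:
        raise ValueError(f"not a timestamp: {stamp!r}")
    hours, minutes, seconds, separator, millis = match.groups()
    if fmt == "srt":
        # SRT needs the comma separator and the full HH:MM:SS,mmm form.
        if separator != "," or hours is None:
            raise ValueError(f"invalid SRT timestamp: {stamp!r}")
    elif fmt == "vtt":
        # VTT needs the period separator; the hour field may be omitted.
        if separator != ".":
            raise ValueError(f"invalid VTT timestamp: {stamp!r}")
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)
```

For example, `timestamp_to_ms("00:01:23,456", "srt")` and `timestamp_to_ms("01:23.456", "vtt")` both describe the same instant, while passing the VTT form with `fmt="srt"` raises `ValueError`.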
Styling and Positioning
SRT is plain text. A few media players support inline HTML tags like `<b>` and `<i>` because they were tolerated by old SubRip players, but SRT has no formal spec that defines them, and behavior varies wildly across platforms.
VTT supports a real styling layer:
- Inline tags: `<b>`, `<i>`, `<u>`, `<c>`, `<v>`, `<lang>`, `<ruby>`
- Cue settings: line, position, size, align, vertical
- CSS styling via the ::cue pseudo-element in your stylesheet
A styled VTT cue can look like this:
WEBVTT

NOTE Speaker A intro

intro
00:00:01.000 --> 00:00:04.000 line:90% position:50% align:center
<v Alice>Hello and thanks for joining us.</v>
If your product needs speaker labels, color-coded speakers, or captions positioned away from a lower-thirds graphic, VTT is the format. SRT will not get you there.
Metadata and Chapters
VTT supports NOTE blocks for comments and a separate chapters track kind for video chapter navigation. The HTML5 `<track>` element accepts kind="chapters", which reads cue text as chapter titles. SRT has no concept of chapters; it’s cues all the way down.
If your video player needs scrubbable chapter navigation driven by a sidecar file, VTT chapter tracks are the path of least resistance.
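A chapters track is an ordinary VTT file whose cue text holds the chapter titles. As a rough sketch, assuming you already have chapter start times in milliseconds, it could be generated like this (the helper names `format_vtt_time` and `build_chapters_vtt` are hypothetical):

```python
def format_vtt_time(ms: int) -> str:
    """Format milliseconds as a full HH:MM:SS.mmm WebVTT timestamp."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_chapters_vtt(chapters: list, duration_ms: int) -> str:
    """Build a chapters file from (start_ms, title) pairs sorted by start time.

    Each chapter ends where the next one begins; the last one ends at
    duration_ms, the total length of the video.
    """
    lines = ["WEBVTT", ""]
    for index, (start_ms, title) in enumerate(chapters):
        is_last = index + 1 == len(chapters)
        end_ms = duration_ms if is_last else chapters[index + 1][0]
        lines += [f"{format_vtt_time(start_ms)} --> {format_vtt_time(end_ms)}", title, ""]
    return "\n".join(lines)
```

The resulting file would then be referenced from a `<track>` element with kind="chapters", alongside the regular subtitle track.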
Encoding
Both formats are plain text, and both should be saved as UTF-8 without a BOM (byte-order mark). The VTT spec requires UTF-8. SRT is more forgiving in practice, but a Chinese or Arabic SRT saved in a legacy encoding such as GBK or Windows-1256 will render as garbage in any HTML5 player. If you’re generating subtitle files programmatically, default to UTF-8 for both formats.
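If you accept subtitle files from outside sources, a normalization step at ingest keeps encoding problems out of the player. The following is a minimal sketch using only the Python standard library; the function name `to_utf8_no_bom` and the default fallback list are assumptions, and guessing legacy encodings is inherently heuristic, so pass the fallbacks that match your actual sources.

```python
def to_utf8_no_bom(raw: bytes, fallbacks: tuple = ("cp1252",)) -> bytes:
    """Decode subtitle bytes and re-encode them as UTF-8 without a BOM."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    for encoding in ("utf-8-sig", *fallbacks):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("could not decode subtitle bytes with the given encodings")
    # Normalize CRLF and bare CR so cue separators are plain blank lines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```

Trying UTF-8 first is deliberate: valid UTF-8 rarely decodes by accident, whereas single-byte code pages will accept almost any byte sequence and produce mojibake.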
Browser and Player Support
This is where the two formats actually diverge in production.
- HTML5 `<track>` element: Native VTT only. SRT must be converted to VTT before it works in the `<video>` tag.
- Apple HLS: Captions are delivered as segmented WebVTT files (referenced by the HLS manifest). SRT is not part of the HLS spec.
- MPEG-DASH: WebVTT or TTML. SRT is not part of the DASH spec.
- YouTube, Vimeo, Facebook, LinkedIn: Both formats accepted. Internally they normalize to their own format.
- VLC, MPC-HC, Plex, mpv: Both formats supported, plus a dozen others.
- iOS native player, AVPlayer: WebVTT via HLS. Sideloaded SRT is not supported.
So if your videos play in a browser, on Apple devices, or through an adaptive bitrate streaming manifest, the answer is almost always VTT.
How VTT (WebVTT) Works
A VTT file is a UTF-8 text file with the text/vtt MIME type. Its job is to describe timed text cues that a video player overlays on the video timeline.
The structure has four parts:
- Header: the literal string WEBVTT, optionally followed by a description on the same line.
- Optional metadata: NOTE blocks, STYLE blocks for CSS, and REGION blocks for cue layout.
- Cues: one or more timed text cues separated by blank lines.
- Cue payload: the actual text, optionally with inline tags and timestamps for karaoke-style timing.
Here is a more complete example:
WEBVTT - Live stream captions

STYLE
::cue(.speaker) {
  color: #f0c419;
  font-weight: bold;
}

NOTE This file is auto-generated from the LiveAPI captions API.

REGION
id:speakers
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%

intro
00:00:01.000 --> 00:00:04.000 region:speakers align:start
<c.speaker>Alice:</c> Hello and thanks for joining us.

01
00:00:04.500 --> 00:00:07.000
We're going live in three seconds.
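To see how the parts of such a file fit together, the sketch below splits a VTT document on blank lines and labels each block. It is a simplified illustration rather than a spec-compliant parser, and the function name `classify_vtt_blocks` is hypothetical.

```python
import re


def classify_vtt_blocks(text: str) -> list:
    """Split a VTT document on blank lines and label each block by kind."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b.strip()]
    labelled = []
    for index, block in enumerate(blocks):
        first_line = block.split("\n", 1)[0]
        if index == 0 and first_line.startswith("WEBVTT"):
            kind = "header"
        elif first_line == "NOTE" or first_line.startswith(("NOTE ", "NOTE\t")):
            kind = "note"
        elif first_line.strip() == "STYLE":
            kind = "style"
        elif first_line.strip() == "REGION":
            kind = "region"
        elif "-->" in block:
            # A cue: an optional identifier line, then the timing line.
            kind = "cue"
        else:
            kind = "unknown"
        labelled.append((kind, block))
    return labelled
```

Run against a file with a header, a STYLE block, a NOTE, a REGION, and cues, this returns one `(kind, block)` pair per block in document order, which is handy for spotting a block that a player would ignore.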
To hook this file into a video tag, point a `<track>` element at it:
<video controls>
<source src="stream.m3u8" type="application/vnd.apple.mpegurl" />
<track
kind="subtitles"
src="captions-en.vtt"
srclang="en"
label="English"
default
/>
</video>
The browser parses the VTT file, renders the cues at the timestamps it specifies, and applies any styling from STYLE blocks or your own CSS. The full `<track>` element behavior is documented in the HTML5 track specification.
How SRT (SubRip) Works
SRT has no formal specification — it was a de facto standard that grew out of the SubRip Windows application around 2000. Every SRT cue has four parts:
- Sequence number: a 1-based integer.
- Timestamp line: HH:MM:SS,mmm --> HH:MM:SS,mmm.
- Subtitle text: one or more lines of plain text.
- Blank line: separates each cue.
A minimal SRT file looks like this:
1
00:00:01,000 --> 00:00:04,000
Hello and thanks for joining us.

2
00:00:04,500 --> 00:00:07,000
We're going live in three seconds.

3
00:00:07,500 --> 00:00:11,000
Audio is muted by default — click the volume icon to unmute.
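The four-part cue structure maps directly onto a small parser. Here is a minimal sketch in Python, assuming well-formed input with blank lines between cues; the `Cue` type and `parse_srt` function are illustrative names, and real-world SRT files often need more forgiving handling.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(text: str) -> list:
    """Parse SRT text into Cue tuples, skipping blocks that are malformed."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        if len(lines) < 2:
            continue
        timing = _TIMING.match(lines[1])
        # Line 1 must be the sequence number, line 2 the timing line.
        if not lines[0].strip().isdigit() or not timing:
            continue
        cues.append(Cue(int(lines[0]), timing.group(1), timing.group(2), "\n".join(lines[2:])))
    return cues
```

Skipping malformed blocks instead of raising is a design choice for this sketch; a stricter pipeline might prefer to fail loudly so bad vendor files are caught at ingest.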
To attach an SRT file to a video, most desktop players look for a sidecar file with the same name as the video and the .srt extension. To use SRT in an HTML5 player, you have to convert it to VTT first, because the `<track>` element does not parse SRT.
For video on demand platforms that re-encode and re-package content, SRT is often the input format because editors and transcription vendors deliver it that way. The platform then converts to VTT or burns the captions into the video before serving them to viewers.
When to Use VTT
Pick VTT when any of these are true:
- Your player is HTML5. The `<video>` tag’s `<track>` element parses VTT natively and nothing else.
- You’re streaming over HLS. Apple HLS expects segmented WebVTT, referenced from the master playlist. Read more about HLS streaming and how captions fit in.
- You’re streaming over MPEG-DASH. DASH supports WebVTT and TTML, not SRT.
- You need styling, speaker tags, or positioning. VTT is the only format with a real CSS hook.
- You need chapter navigation. kind="chapters" on a VTT file gives you scrubbable chapters with no extra code.
- Your audience is on iOS or tvOS. iOS only recognizes WebVTT subtitles delivered through HLS.
- You care about accessibility audits. VTT cleanly separates kind="subtitles", kind="captions", kind="descriptions", and kind="metadata", which screen readers and accessibility tooling rely on.
In practical terms, anyone building a video streaming app, an OTT platform, a video conferencing API, or a learning platform that runs in the browser is shipping VTT.
When to Use SRT
Pick SRT when any of these are true:
- You’re uploading to social platforms. YouTube, Vimeo, Facebook, and LinkedIn all accept SRT and treat it as the safe default.
- Your distribution is offline or desktop. Set-top boxes, smart TVs, Plex, Kodi, VLC — all accept SRT without fuss.
- You’re handing off to a video editing workflow. Premiere Pro, DaVinci Resolve, Final Cut, and Avid Media Composer import SRT directly.
- You’re working with transcription vendors. Almost every human transcription service delivers SRT by default.
- You don’t need styling. Plain text captions with no positioning needs are simpler to produce, store, and audit as SRT.
- You want maximum compatibility with one file. If you can only ship one subtitle file and you don’t control the playback environment, SRT plays in more places than any other format.
A common pattern: store master subtitles as SRT in your CMS, and convert to VTT on the fly when serving them through a video player API or HTML5 video element.
How to Convert Between VTT and SRT
The two formats are close enough that conversion is mostly mechanical, but a few details trip people up.
Converting SRT to VTT
To turn an SRT file into a valid VTT file:
- Add the line WEBVTT followed by a blank line at the top of the file.
- Replace the comma in every timestamp with a period (00:01:23,456 → 00:01:23.456).
- The sequence numbers can stay or be removed, since VTT cue identifiers are optional.
- Save as UTF-8 without a BOM.
A minimal bash one-liner with sed:
{ echo "WEBVTT"; echo; sed 's/,\([0-9]\{3\}\) --> /\.\1 --> /; s/ --> \([0-9]*:[0-9]*:[0-9]*\),\([0-9]\{3\}\)/ --> \1.\2/' input.srt; } > output.vtt
A more reliable approach is FFmpeg:
ffmpeg -i input.srt output.vtt
FFmpeg handles encoding conversion, line-ending normalization, and edge cases like cues without trailing blank lines.
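If you would rather do the conversion inside application code, the same steps can be sketched in a few lines of Python. This is an illustrative example under the assumption that the input is already decoded text; the function name `srt_to_vtt` is hypothetical.

```python
import re

# Match only timing lines, so commas in the subtitle text are left alone.
_SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text, keeping sequence numbers as cue IDs."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # The header must be followed by a blank line before the first cue.
    return "WEBVTT\n\n" + body.strip("\n") + "\n"
```

Anchoring the pattern to timing lines avoids the classic bug of a global comma-to-period replace, which would also rewrite punctuation in the dialogue.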
Converting VTT to SRT
To turn a VTT file into a valid SRT file:
- Strip the WEBVTT header and any NOTE, STYLE, or REGION blocks.
- Replace periods in timestamps with commas, and add the 00: hour field wherever the VTT omitted it.
- Remove all cue settings (everything after the timestamp on the same line, like line:90% align:center).
- Strip all inline tags (`<v>`, `<c>`, etc.), because SRT will not understand them.
- Add 1-based sequence numbers if you stripped them or if there were none.
- Save as UTF-8.
Again, FFmpeg does it in one command:
ffmpeg -i input.vtt output.srt
Keep in mind that conversion from VTT to SRT is lossy: styling, positioning, chapters, regions, and speaker tags are all dropped. If you have a rich VTT file and you only need SRT for social uploads, keep the VTT as the source of truth and treat the SRT as a downstream artifact.
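For cases where FFmpeg is not available, the steps listed above can be approximated in Python. Treat this as a rough sketch: the function name `vtt_to_srt` is hypothetical, the tag stripping is a simple regex that would also remove literal text written between angle brackets, and character references such as `&amp;` are left untouched.

```python
import re

# A timing line: the hour field is optional in VTT; cue settings may follow.
_VTT_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)
_TAG = re.compile(r"</?[^>]+>")


def _to_srt_time(stamp: str) -> str:
    """Restore a missing hour field and swap the period for a comma."""
    if stamp.count(":") == 1:
        stamp = "00:" + stamp
    return stamp.replace(".", ",")


def vtt_to_srt(vtt_text: str) -> str:
    """Convert WebVTT text to SRT, dropping everything SRT cannot express."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        # The timing line is the first line, or the second after a cue ID.
        at = next((i for i, l in enumerate(lines[:2]) if _VTT_TIMING.match(l)), None)
        if at is None:
            continue  # header, NOTE, STYLE, or REGION block
        timing = _VTT_TIMING.match(lines[at])
        start, end = _to_srt_time(timing.group(1)), _to_srt_time(timing.group(2))
        payload = "\n".join(_TAG.sub("", line) for line in lines[at + 1:])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{payload}")
    return "\n\n".join(cues) + "\n" if cues else ""
```

Cue settings disappear because only the two captured timestamps are written back, and cues are renumbered from 1 regardless of any identifiers in the source.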
How Subtitles Fit into a Streaming Pipeline
Captions and subtitles don’t live in isolation. They ride alongside the video file and the manifest, and how they’re delivered depends on the streaming protocol.
For progressive download (a single MP4 served from a CDN), captions are typically a sidecar VTT file referenced from the HTML5 `<track>` element. The browser fetches the VTT separately and aligns it to the video timeline.
For HLS adaptive streaming, captions are usually segmented WebVTT files referenced in the master playlist as a #EXT-X-MEDIA:TYPE=SUBTITLES entry. The player downloads segments of the caption file alongside video segments. Apple devices only recognize this delivery path. If you’re new to HLS, the HLS vs DASH comparison covers how captions are handled in each protocol.
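As an illustration of what that playlist entry looks like, the sketch below formats an `#EXT-X-MEDIA` line for a subtitle rendition. The function name, the group ID `subs`, and the playlist path in the usage note are assumptions chosen for the example; most packagers generate this line for you.

```python
def subtitle_media_tag(
    name: str,
    language: str,
    uri: str,
    group_id: str = "subs",
    default: bool = False,
) -> str:
    """Format an EXT-X-MEDIA line that declares a WebVTT subtitle rendition."""
    attributes = [
        "TYPE=SUBTITLES",
        f'GROUP-ID="{group_id}"',
        f'NAME="{name}"',
        f'LANGUAGE="{language}"',
        f"DEFAULT={'YES' if default else 'NO'}",
        "AUTOSELECT=YES",
        # The URI points at a media playlist that lists the .vtt segments.
        f'URI="{uri}"',
    ]
    return "#EXT-X-MEDIA:" + ",".join(attributes)
```

Calling `subtitle_media_tag("English", "en", "subs/en.m3u8", default=True)` yields a single line for the master playlist. Each variant stream then opts in by carrying a matching `SUBTITLES="subs"` attribute on its `#EXT-X-STREAM-INF` line.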
For MPEG-DASH, captions are either inline TTML in the manifest or sidecar WebVTT segments. SRT is not part of the DASH spec at all.
For WebRTC live streaming, real-time captions are usually sent over a data channel (not as a file) and rendered by the receiving client. Read more about WebRTC live streaming for the architecture details.
When you’re building live video streaming features, the cleanest setup is: accept SRT from your transcription vendor, convert to segmented WebVTT during transcoding, and let the manifest reference the WebVTT segments. Your players just point at the manifest and the right captions show up.
Adding Captions to Live Streams with LiveAPI
For developers who don’t want to assemble the transcoder, the manifest generator, the caption packager, and the player from parts, LiveAPI’s live streaming API handles the moving pieces.
You ingest video over RTMP or SRT (here meaning Secure Reliable Transport, the ingest protocol, not the subtitle format), LiveAPI transcodes to adaptive bitrate HLS, your captions are packaged as WebVTT alongside the video segments, and the embeddable player renders them with no extra configuration. If your team is shipping a live video streaming platform, captions stop being a side project and become part of the same API call that delivers the stream.
The same flow works for VOD: upload an MP4 with a sidecar SRT, and LiveAPI’s video API handles conversion to VTT, packaging into HLS, and CDN delivery across Akamai, Cloudflare, and Fastly. If you’re trying to launch a streaming feature without building video transcoding and caption packaging from scratch, that’s where the time savings come from.
Best Practices for Subtitle Files
A few field-tested rules that apply to both formats:
- Always save as UTF-8 without BOM. A BOM at the top of a VTT file breaks `<track>` parsing in Safari. Same for SRT in some HLS packagers.
- Check files before shipping. Run them through a parser (vttvalidator, pycaption, or webvtt-py) instead of trusting that they look right.
- Keep cues short. Aim for 32-42 characters per line and no more than two lines per cue. Long cues stay on screen too long and force viewers to read instead of watch.
- Pad your timestamps. Use full HH:MM:SS.mmm even when the cue is in the first minute. It avoids ambiguity with some older parsers.
- Avoid trailing whitespace and stray Windows line endings. Both formats expect LF newlines and clean cue separators.
- Version your subtitle files. Treat them as code — diffable, reviewable, and stored next to the video they describe.
- Don’t burn captions into the video. Burned-in (open) captions can’t be toggled off, translated, or restyled. Keep them as sidecar files unless you have a specific reason not to.
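Some of these rules are easy to enforce automatically. As a small sketch, a lint function for cue text might look like the following; the name `lint_cue_text` and its default limits (taken from the 42-character, two-line guideline above) are assumptions you would tune to your own style guide.

```python
def lint_cue_text(cue_text: str, max_chars: int = 42, max_lines: int = 2) -> list:
    """Return a list of human-readable problems found in one cue's text."""
    problems = []
    lines = cue_text.split("\n")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines (max {max_lines})")
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip()
        if stripped != line:
            problems.append(f"line {number}: trailing whitespace")
        if len(stripped) > max_chars:
            problems.append(f"line {number}: {len(stripped)} chars (max {max_chars})")
    return problems
```

An empty list means the cue passes. Wiring a check like this into CI fits the advice to treat subtitle files as code.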
If you’re new to the broader topic of timed text, the closed captioning vs subtitles breakdown covers the accessibility side of the same problem.
Is VTT or SRT Right for Your Project?
Run through this quick checklist to settle the format question:
- Will the video play in a browser or on iOS? → VTT.
- Are you delivering over HLS or DASH? → VTT.
- Do you need styling, positioning, speaker labels, or chapters? → VTT.
- Are you uploading to social platforms or handing off to NLEs? → SRT works everywhere.
- Do you only need basic text captions and broad compatibility? → SRT.
- Are you building a streaming platform from scratch? → Store master as SRT, deliver as VTT.
Most production pipelines end up using both: SRT for ingestion and editorial, VTT for delivery to end users.
VTT vs SRT FAQ
Is VTT the same as SRT?
No. Both are timed-text subtitle formats, but VTT (WebVTT) is a W3C standard with a required WEBVTT header, period-based timestamps, optional cue IDs, styling, positioning, and chapter support. SRT is a simpler de-facto standard with comma-based timestamps and plain text only. SRT files won’t parse as VTT and vice versa without conversion.
Which is better, VTT or SRT?
Neither is universally better — they solve different problems. VTT is the right pick for HTML5 video, HLS, DASH, iOS, and any caption that needs styling or positioning. SRT is the right pick for social uploads, NLE editing workflows, set-top boxes, and maximum compatibility across legacy players.
Can I use SRT in an HTML5 video tag?
No, not directly. The HTML5 `<track>` element only parses WebVTT. If you need to use SRT content in a browser, convert it to VTT first. The conversion is trivial and can be done at build time, on the server, or in the browser with a small JavaScript shim.
How do I convert SRT to VTT?
Add the WEBVTT header at the top with a blank line after it, replace every comma in the timestamps with a period, and save as UTF-8. FFmpeg does it in one command: ffmpeg -i input.srt output.vtt. There are also free web converters and command-line tools like pycaption and webvtt-py.
Do YouTube and Vimeo support both formats?
Yes. YouTube accepts both SRT and VTT (along with SBV, TTML, SAMI, and several others). Vimeo accepts both. Both platforms normalize uploads to their internal caption format, so the choice between SRT and VTT on upload rarely matters for the final viewer experience.
Why does VTT use a period and SRT use a comma in timestamps?
It’s a regional convention from the formats’ origins. SRT was created in Europe where the comma is the decimal separator. VTT inherited the W3C convention of using the period as the decimal separator for interoperability with CSS and JavaScript, which use the period universally. There’s no functional reason — it’s just history.
Can SRT files include styling like bold or italic?
SRT has no formal spec, so no styling is officially defined. Some players (notably VLC and a few NLEs) tolerate inline `<b>`, `<i>`, and `<u>` tags, but the behavior is inconsistent across platforms. If you need reliable styling, use VTT.
Are VTT files larger than SRT files?
Slightly. A VTT file has the WEBVTT header and may include NOTE, STYLE, and REGION blocks. For typical 30-minute captions, the size difference is usually under 1% — not a meaningful factor in delivery or storage.
Can I have one subtitle file work for both web and social platforms?
In practice, ship two: SRT for social uploads and NLE handoffs, VTT for HTML5 playback. Generate them from the same source (transcript, EDL, or AI captioning output) so they stay in sync. A converter step in your build pipeline keeps them aligned.
Does WebVTT work with HLS streaming?
Yes — WebVTT is the native subtitle format for HLS. Captions are segmented into short WebVTT files referenced from the HLS master playlist as a #EXT-X-MEDIA:TYPE=SUBTITLES track. Apple’s HLS implementation only recognizes WebVTT for captions, which is why VTT is the default for any iOS or tvOS playback.
Wrapping Up
VTT vs SRT comes down to where your video plays. If the answer involves a browser, an iPhone, or an HLS or DASH manifest, you’re shipping WebVTT. If the answer involves YouTube, Vimeo, a video editor, or a desktop media player, SRT is the path of least resistance. Most real pipelines use both — SRT in editorial and ingestion, VTT in delivery — and convert between them at build time.
The format itself is rarely the hard part. The hard part is packaging captions into a manifest, segmenting them for adaptive bitrate playback, and serving them through a CDN that doesn’t break on UTF-8 BOMs. That’s the layer where shipping a captioning feature gets expensive.
Ready to ship captioned video without rebuilding the pipeline? Get started with LiveAPI and add WebVTT captions to live streams and VOD with the same API call that delivers your video.
