If you’re building a video product, text tracks aren’t optional. Most viewers now scroll feeds with the sound off, half of all U.S. adults use captions at least some of the time, and the U.S. Department of Justice’s April 2024 rule officially makes WCAG 2.1 Level AA the baseline for public-sector video accessibility — with the first compliance deadline already past for entities serving more than 50,000 people. That single rule changed how every engineering team should think about closed captioning vs subtitles.
The two formats look the same on screen. They are not the same in the code, the file, the law, or the user experience. Picking the wrong one means failing accessibility audits, alienating Deaf and hard-of-hearing viewers, or shipping a player that breaks on Safari.
This guide breaks down closed captioning vs subtitles for engineers and product teams — what each format actually contains, how they’re encoded, which file formats win on which platforms, what the law requires, and how to wire them into a working video app. By the end, you’ll know exactly which to ship and how.
What Is Closed Captioning?
Closed captioning is a text track that transcribes spoken dialogue and non-speech audio — sound effects, music cues, speaker identification, laughter, applause — so viewers who can’t hear the audio can still follow what’s happening on screen. “Closed” means the captions can be toggled on or off by the viewer; they’re carried as a separate track, not burned into the video.
Closed captions were designed in the early 1970s for Deaf and hard-of-hearing audiences in the United States. They assume the viewer cannot hear anything. That assumption drives every formatting decision: captions describe the audio environment, not just the words. A line of closed captioning looks like this:
[door slams]
JANE: I told you not to come here.
[ominous music swells]
Two technical standards dominate broadcast closed captioning: CEA-608 (the legacy analog standard, line 21 of the NTSC signal, 32-character lines, limited styling) and CEA-708 (the digital successor used on ATSC broadcasts, supports color, fonts, positioning, and Unicode). On the web, those broadcast formats are converted to text-track formats like WebVTT, SRT, or TTML for delivery inside an HLS streaming manifest or an HTML5 <track> element.
The federal rules around captions on U.S. broadcast television are strict: the FCC requires captions on most prerecorded and live programming and enforces accuracy, synchronization, completeness, and placement standards. Online video has its own rules under the 21st Century Communications and Video Accessibility Act (CVAA), and accessibility standards like WCAG 2.1 Success Criterion 1.2.2 require captions on all prerecorded synchronized media.
What Are Subtitles?
Subtitles are a text track that transcribes only the dialogue, typically translated from the original audio language into a target language. They were created in the 1930s for foreign film distribution, and they assume the viewer can hear the audio fine — they just can’t understand the language being spoken.
A subtitle file for a French film displayed in English looks like this:
Where have you been?
I was at the market.
We need to talk about your father.
No sound effects. No speaker labels. No [door slams]. Just translated dialogue, synchronized to the audio.
Subtitles serve a fundamentally different purpose than closed captions. They expand your audience across languages, support international content distribution, and help language learners follow along. They do not satisfy accessibility law — a video with English subtitles on English audio (sometimes called “same-language subtitles”) still fails WCAG 1.2.2 if it lacks non-speech audio information.
Modern streaming platforms blur this distinction in their UIs. Netflix labels everything as “subtitles” even when the file is technically SDH or closed captions. Vimeo and YouTube use both terms. Under the hood, though, the engineering decisions are different — and the legal consequences are too.
Closed Captioning vs Subtitles: Side-by-Side Comparison
Here’s how the two formats compare across the dimensions that matter when you’re shipping a video product:
| Attribute | Closed Captioning | Subtitles |
|---|---|---|
| Primary audience | Deaf and hard-of-hearing viewers | Hearing viewers who don’t speak the source language |
| Assumes viewer can hear | No | Yes |
| Content | Dialogue + sound effects + speaker IDs + music cues | Dialogue only |
| Same language as audio? | Usually yes | Usually no (translation) |
| Toggleable | Yes (closed); can also be burned in (open) | Yes (soft); can also be burned in (hard/forced) |
| Legal requirement | Required by FCC, ADA, WCAG, CVAA, EAA | Not required for accessibility |
| Broadcast standard | CEA-608, CEA-708 | None — distribution-format dependent |
| Common file formats | SCC, MCC, CAP, WebVTT, TTML, IMSC1 | SRT, WebVTT, SUB, ASS |
| Display style | Often white text on black box | Often white or yellow text with shadow |
| Origin | 1970s, U.S. accessibility | 1930s, European film translation |
| Character limit per line | ~32 characters (608); flexible (708) | Up to 42 characters |
The cleanest way to remember the split: captions are about accessibility, subtitles are about translation. Captions describe sound; subtitles translate language. Anytime an engineering decision depends on which one you’re shipping, that single sentence answers it.
Open Captions vs Closed Captions vs Subtitles
The “closed” in closed captioning describes how the text is delivered, not what it contains. There are three delivery modes you’ll run into when building a video pipeline:
Closed Captions (CC)
Closed captions ride as a separate text track alongside the video. The player reads the track and renders the text at playback time. The viewer can toggle them on or off, change the font size, swap color schemes, or reposition the text. This is the format you want for almost every modern app — it preserves user choice and keeps file sizes lean.
Open Captions
Open captions are burned directly into the video pixels during encoding. They cannot be turned off. The text is part of the picture, like a hardcoded watermark. Open captions show up on every device without any player support, which makes them ideal for platforms that strip metadata or text tracks (Instagram Reels, TikTok auto-play, embedded shorts on Twitter/X). The trade-off: viewers can’t disable them, you can’t translate them later, and every language version requires a separate encode.
Subtitles
Subtitles also come in soft (separate track, toggleable) and hard (burned in, sometimes called “forced narrative” when used only for foreign-language segments). The same delivery split applies — the only difference from captions is what’s in the text track itself.
Here’s a quick decision matrix:
| Use Case | Recommended Format |
|---|---|
| Compliance with ADA / WCAG / EAA | Closed captions (soft) |
| Social autoplay (Reels, TikTok, X) | Open captions (burned-in) |
| International streaming (single source, multi-language) | Soft subtitles (multiple language tracks) |
| Theatrical or fixed-screen distribution | Open captions |
| Live event with simultaneous translation | Soft subtitles, multiple tracks |
| Live event for D/deaf audience | Closed captions, live-generated |
The closed-vs-open decision and the captions-vs-subtitles decision are orthogonal. You can have closed subtitles, open subtitles, closed captions, or open captions — four combinations, four use cases.
SDH Subtitles: Where Captions and Subtitles Overlap
SDH stands for Subtitles for the Deaf and Hard of Hearing. It’s the format that confuses everyone because it sits exactly between closed captioning and traditional subtitles.
SDH includes everything a closed caption track does — dialogue, sound effects, speaker identification, music cues — but it’s encoded as a subtitle file (typically WebVTT, SRT, or a bitmap-based PGS track on Blu-ray) rather than as a CEA-608/708 broadcast caption. SDH exists for a practical reason: streaming platforms and physical media often don’t carry broadcast caption tracks, so the industry adapted the subtitle delivery format to carry accessibility information.
If you see a Netflix video labeled “English [CC]” — that’s almost always SDH, not a true CEA-708 closed caption track. The information is the same; the file format is different.
For engineering teams, the takeaway is simple: when you need accessibility compliance on a modern web or mobile streaming player, you’re shipping SDH-style WebVTT, not 608/708. The terminology in the marketing UI says “closed captions” — the bytes on disk are subtitle files with accessibility content. Both things can be true at once.
How Closed Captions and Subtitles Work Technically
Behind every text track on screen is a pipeline that turns audio into timed text and routes it to the player. Here’s how the flow works on a modern HTTP video stack:
- Authoring: A human captioner, an ASR engine, or a hybrid workflow generates timed text. Time codes mark when each cue appears and disappears.
- File format selection: The authoring tool exports to a delivery format (WebVTT for web, TTML/IMSC1 for OTT, SRT for general use).
- Packaging: The text file is bundled into the streaming manifest. In HLS, that's an `#EXT-X-MEDIA:TYPE=SUBTITLES` entry pointing to a WebVTT playlist. In DASH, it's a `<Representation>` inside an `<AdaptationSet mimeType="text/vtt">`.
- Delivery: The player fetches the manifest, sees the available subtitle tracks, and downloads cues just-in-time as playback progresses.
- Rendering: The player parses cues and overlays them on the video element, respecting positioning, styling, and user preferences.
This is where format choice gets technical. WebVTT supports CSS styling, positioning (line, position, align), regions, and ruby annotations — everything you need to render proper closed captions inside an HLS stream. SRT cannot. If you’re building a player that needs to render captions correctly on web, mobile, and smart TVs, WebVTT is the path of least resistance.
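To make that concrete, here's a sketch of a WebVTT file carrying its own styling and positioning. The cue text, colors, and the "Jane" voice are illustrative, and player support for inline STYLE blocks varies — many players only honor external `::cue` CSS:

```
WEBVTT

STYLE
::cue {
  background-color: rgba(0, 0, 0, 0.85);
  color: #fff;
}

STYLE
::cue(v[voice="Jane"]) {
  color: #ffd54a;
}

00:00:01.000 --> 00:00:03.500 line:90% align:center
<v Jane>Where have you been?</v>
```

None of this is expressible in SRT, which is why SRT-only pipelines end up burning styling into the pixels instead.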
For live events, the pipeline gets tighter. Live captions come from real-time stenographers, respeakers (a captioner speaks the audio cleanly into an ASR engine), or fully automated ASR systems. The captioned text is injected into the live HLS or CMAF packaging in near real time, typically with a 3-10 second delay relative to the video. Choosing the right live streaming encoder and packaging path determines how that latency budget plays out.
Caption and Subtitle File Formats: SRT, WebVTT, TTML, IMSC1
The file format you pick depends on where the content is going to play. Here’s a developer-focused breakdown:
SRT (SubRip Subtitle)
The oldest and simplest format. Plain text, cue numbers, time codes, dialogue. No styling support, no positioning, no metadata. SRT is universally readable — every player on every platform handles it — but it’s also the least capable. YouTube, Facebook, and most editing tools accept it as input.
```
1
00:00:01,000 --> 00:00:03,500
Where have you been?

2
00:00:04,000 --> 00:00:06,200
I was at the market.
```
WebVTT (Web Video Text Tracks)
The W3C-standardized format for HTML5 video. Based on SRT but adds styling via CSS, cue positioning, voice tags for speaker IDs, regions, and chapters. WebVTT is what you serve inside an HLS manifest for web and mobile playback. Most modern players — Video.js, Shaka Player, hls.js, AVPlayer — render WebVTT natively.
```
WEBVTT

00:00:01.000 --> 00:00:03.500 line:90%
<v Jane>Where have you been?</v>

00:00:04.000 --> 00:00:06.200
<v Mark>I was at the market.</v>
[door slams in distance]
```
TTML (Timed Text Markup Language) and IMSC1
XML-based formats used heavily in OTT and broadcast distribution. TTML supports complex styling, multiple languages in one file, and frame-accurate timing. IMSC1 is a constrained TTML profile that Netflix, Amazon, and Apple require for final delivery. If you’re shipping to a major OTT distributor, you’re producing IMSC1.
SCC, MCC, CAP
Broadcast-native caption formats — used in TV master files and post-production workflows. You’ll convert these to WebVTT or IMSC1 before delivering to a streaming pipeline.
Format selection cheat sheet
| Destination | Use this format |
|---|---|
| HLS playback on web/mobile | WebVTT |
| DASH playback | WebVTT or TTML |
| OTT delivery to Netflix/Amazon | IMSC1 |
| YouTube/Facebook upload | SRT or WebVTT |
| Broadcast TV master | SCC or MCC |
| Quick draft / human review | SRT |
For a typical app built on top of a video hosting API, WebVTT is the right default. It plays everywhere, supports the styling you need for proper accessibility, and packages cleanly into HLS manifests.
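Because SRT is the common authoring and ingest format while WebVTT is the delivery format, converting between them is a routine pipeline step. A minimal sketch (the `srtToVtt` name is ours; it assumes well-formed SRT — real-world files may also need BOM stripping, CRLF normalization beyond what's shown, and styling-tag cleanup):

```javascript
// Minimal SRT -> WebVTT conversion sketch.
// SRT uses a comma as the millisecond separator; WebVTT uses a dot,
// and every WebVTT file must begin with a WEBVTT header line.
function srtToVtt(srt) {
  const body = srt
    .replace(/\r\n/g, "\n")
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, "$1.$2");
  return "WEBVTT\n\n" + body.trim() + "\n";
}
```

Going the other direction (WebVTT to SRT) is lossy: cue settings like `line:90%` and voice tags have no SRT equivalent and must be dropped.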
Legal and Accessibility Requirements
The legal landscape for video captions has gotten dramatically stricter in the past two years. If you’re shipping video to U.S. or EU users, here’s what applies:
ADA Title II and Title III (United States)
The Department of Justice’s April 2024 final rule explicitly adopts WCAG 2.1 Level AA as the technical standard for ADA compliance. Title II covers state and local governments; Title III covers places of public accommodation, which courts have repeatedly extended to private business websites. The first Title II compliance deadline (entities serving populations over 50,000) was April 24, 2026.
WCAG 2.1 / 2.2 Success Criteria
- 1.2.2 Captions (Prerecorded) — Level A. Requires captions for all prerecorded audio content in synchronized media. Subtitles alone don’t satisfy this.
- 1.2.4 Captions (Live) — Level AA. Requires captions for all live audio content in synchronized media.
- 1.2.5 Audio Description (Prerecorded) — Level AA. Audio description for prerecorded video.
The W3C documents the precise success criterion for captions including the difference between captions (required) and same-language subtitles (insufficient).
CVAA (United States)
The 21st Century Communications and Video Accessibility Act requires online programming that previously aired on U.S. broadcast TV with captions to also carry captions when delivered online. Pure online-only content isn’t covered, but most ad-supported and subscription streamers comply across their catalogs.
European Accessibility Act (EU)
The EAA came into force on June 28, 2025 across EU member states. It requires audiovisual media services, e-commerce sites with video, e-readers, and video-conferencing platforms to provide accessible content — meaning captions for prerecorded video and live captioning for live streams. Penalties vary by member state but include fines up to €1 million in some jurisdictions.
FCC Closed Captioning Rules
For broadcast TV in the United States, the FCC mandates captions on virtually all programming and enforces four quality standards: accuracy, synchronicity, completeness, and proper placement. The rules apply to programming delivered online if it previously aired on television.
The summary: if your product distributes video to a meaningful audience, closed captions are mandatory, not optional. Subtitles do not satisfy any of the above. Build your video pipeline around the assumption that every piece of content will eventually need a proper closed caption track.
Closed Captioning vs Subtitles: When to Use Each
Once you understand the legal floor, the design decision becomes practical. Here’s a framework for which format to ship in which scenario:
Ship closed captions when…
- Your audience includes Deaf or hard-of-hearing users (assume it does — about 15% of U.S. adults report some degree of hearing loss)
- You need ADA, WCAG, CVAA, or EAA compliance
- You’re publishing educational, training, or workplace content
- You’re shipping prerecorded video on a website, mobile app, or OTT platform
- The audio includes meaningful non-speech information (alarms, music cues, off-screen action)
- Your content gets indexed for search — caption text is crawled and helps SEO
Ship subtitles when…
- Your content is being distributed to non-native speakers of the source language
- You’re producing localized versions for international markets
- Your audience speaks a different language than your audio
- You’re targeting language learners (same-language and target-language tracks are common)
Ship both when…
- You’re distributing to international audiences who include Deaf and hard-of-hearing viewers (the norm for major streaming services)
- Your source content includes scenes in multiple languages and you need both translation and accessibility
Ship open captions (burned-in) when…
- Distributing on platforms that auto-play muted or strip text tracks (Instagram Reels, TikTok, X autoplay)
- You need captions to survive re-encoding, screenshotting, or downstream distribution
- You’re producing a single fixed cut without per-viewer customization
In most production systems, the right architecture is: soft closed captions as the default, with optional burned-in open captions for social distribution clips. Generate the WebVTT once and reuse it across delivery channels.
How to Add Captions and Subtitles to Your Video App
The hard part of caption support isn’t the captions themselves — it’s wiring them through your pipeline so they survive ingest, packaging, and playback. Here’s the practical path for a typical web/mobile video product.
Step 1: Generate the timed text
Three approaches:
- Human captioning: A captioner watches the video and produces a timed transcript. Accurate, expensive, slow. Use for hero content and compliance-critical material.
- ASR (automatic speech recognition): A speech-to-text engine generates the transcript automatically. Fast, cheap, less accurate. Quality has gotten much better since 2024 — modern ASR hits 90-95% word accuracy on clean studio audio.
- Hybrid: ASR-generated draft, human-reviewed and corrected. This is the workflow most production teams use.
For broadcast or regulated content, FCC and WCAG quality standards effectively require human review.
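Accuracy figures like "90-95%" are usually reported as word error rate (WER): the word-level edit distance between a reference transcript and the ASR output, divided by the reference length. A minimal sketch (production evaluations also normalize punctuation, numerals, and casing more carefully than this):

```javascript
// Word error rate sketch: WER = (substitutions + insertions + deletions) / reference words.
// Computed as word-level Levenshtein distance via dynamic programming.
function wer(reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  // d[i][j] = edit distance between first i reference words and first j hypothesis words.
  const d = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,     // deletion
        d[i][j - 1] + 1,     // insertion
        d[i - 1][j - 1] + cost // substitution (or match)
      );
    }
  }
  return d[ref.length][hyp.length] / ref.length;
}
```

A 95% word accuracy claim corresponds to a WER of 0.05 — still one error every twenty words, which is why regulated content gets a human pass.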
Step 2: Export to WebVTT
Whatever tool you use, export the captions as WebVTT. Make sure the file includes:
- A `WEBVTT` header
- Cue identifiers (optional but useful for debugging)
- Properly formatted time codes (`HH:MM:SS.mmm`)
- Speaker tags (`<v Speaker Name>`) for accessibility
- Non-speech audio cues in brackets (`[music swells]`)
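Before packaging, it's worth running a quick sanity check on exported files. A sketch (the `checkVtt` name is ours, and this is a lint pass, not a full parser) that catches the most common export mistakes — a missing header and leftover SRT-style comma timestamps:

```javascript
// Pre-packaging sanity check for an exported WebVTT file.
// Returns a list of human-readable problems; empty list means the basics look right.
function checkVtt(text) {
  const errors = [];
  if (!text.startsWith("WEBVTT")) {
    errors.push("missing WEBVTT header");
  }
  if (/\d{2}:\d{2}:\d{2},\d{3}/.test(text)) {
    errors.push("SRT-style comma timestamps found");
  }
  if (!/\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}/.test(text)) {
    errors.push("no HH:MM:SS.mmm cue timings found");
  }
  return errors;
}
```

Running a check like this in CI on every caption asset is cheap insurance — players fail silently on malformed VTT, so a bad export often isn't noticed until a viewer complains.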
Step 3: Package into your HLS manifest
If you’re using HLS (which you almost certainly are for mobile and OTT), the captions go in as a subtitle media playlist referenced from the master playlist:
```
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,CODECS="avc1.4d401f,mp4a.40.2",SUBTITLES="subs"
video.m3u8
```
The en.m3u8 is a subtitle playlist pointing to one or more WebVTT segments. Read more about the manifest structure in our guide to the .m3u8 file.
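The subtitle playlist mirrors the structure of a video media playlist. A sketch of what en.m3u8 might contain — segment names and durations here are illustrative:

```
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.000,
en-00001.vtt
#EXTINF:6.000,
en-00002.vtt
#EXT-X-ENDLIST
```

Each .vtt segment is a standalone WebVTT file covering that window of the timeline, which is what lets the player fetch cues just-in-time instead of downloading the whole track up front.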
Step 4: Configure your player
Most modern HTML5 players auto-detect subtitle tracks in the HLS manifest. The browser exposes them via the TextTrack API:
```javascript
const video = document.querySelector('video');
const tracks = video.textTracks;

for (const track of tracks) {
  if (track.kind === 'captions' && track.language === 'en') {
    track.mode = 'showing';
  }
}
```
For React, use the native <track> element or a wrapper from a React video player library:
```jsx
<video controls>
  <source src="/master.m3u8" type="application/x-mpegURL" />
  <track
    kind="captions"
    src="/captions/en.vtt"
    srcLang="en"
    label="English"
    default
  />
</video>
```
The kind="captions" attribute is the explicit signal that the track contains accessibility-grade closed captions, not generic subtitles. Use kind="subtitles" for translation tracks. The distinction matters to assistive tech.
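The same caption-vs-subtitle distinction is visible at runtime through the TextTrack API, which is how you'd build a track-selection menu. A sketch (the `listTextTracks` name and the shape of the returned objects are our own convention):

```javascript
// Build menu entries from a video element's text tracks.
// track.kind distinguishes accessibility captions ('captions')
// from translation subtitles ('subtitles').
function listTextTracks(video) {
  return Array.from(video.textTracks).map((track) => ({
    label: track.label || track.language,
    language: track.language,
    kind: track.kind,
  }));
}
```

Grouping the menu by kind (an "Accessibility" section for captions, a "Language" section for subtitles) makes the difference legible to viewers, the way Apple TV and Netflix UIs do.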
Step 5: Test on real devices
Caption rendering varies dramatically across players, OSes, and devices. Test:
- Safari on iOS and macOS (uses native AVPlayer rendering)
- Chrome, Firefox, Edge on desktop
- Android Chrome and the WebView
- Smart TV players (Roku, Tizen, webOS)
- Accessibility tools (VoiceOver, TalkBack, screen readers)
Pay attention to: line breaks (does a long line wrap correctly?), positioning (does the text collide with on-screen UI?), styling (do user preferences for font size and color get respected?), and timing drift on live streams.
Essential Tools and Infrastructure for Captions and Subtitles
Shipping captions at scale requires a video stack that handles ingest, transcoding, packaging, delivery, and playback without dropping the text tracks at any step. Here’s what you need:
Captioning service or ASR engine
Options range from human-only services (3Play, Rev) to ASR providers (AssemblyAI, Deepgram, Whisper) to hybrid platforms. Pick based on accuracy requirements and budget.
Encoding and packaging
Your encoder needs to preserve text tracks through transcoding and pass them into HLS or DASH manifests. Most modern cloud video APIs handle this automatically, but if you’re rolling your own pipeline with FFmpeg, the -map and -c:s flags control how subtitle streams flow through transcoding. For background on the encoding side, see our video encoding guide.
Streaming and delivery
Captions ride alongside the video as separate manifest entries. Your origin and CDN need to serve both. Most production setups use the same CDN for video segments and subtitle segments; just make sure the CDN doesn’t strip text/* MIME types or apply weird caching to small VTT files.
Player
The player is where caption support actually shows up to the user. Native HTML5 <track> works for simple cases. For production-grade rendering — including styled WebVTT, multiple language switching, user preferences for caption appearance — use a managed video player API or open-source library (Video.js, Shaka, hls.js).
Workflow management
For larger libraries, you’ll need a system to track which assets have captions in which languages, who reviewed them, and when they were last updated. Caption assets are first-class content — manage them like translations or thumbnails, not as afterthoughts.
LiveAPI provides the encoding, packaging, HLS delivery, and embeddable player that this kind of pipeline needs. Upload a video, attach a WebVTT track, and the platform packages it into HLS with the correct #EXT-X-MEDIA entries for subtitle tracks. The embeddable player surfaces those tracks to viewers with proper accessibility metadata. For live events, captions can be injected into the live HLS output in near real time. Pair that with a captioning service of your choice and you have a compliant accessibility pipeline without writing your own packager.
Is Closed Captioning or Subtitling Right for Your Project?
Quick self-assessment. If you answer “yes” to any of these, you need closed captions:
- Do you sell to U.S. government, education, or large enterprise buyers? (They’ll demand WCAG/ADA proof.)
- Do you serve users in the EU? (EAA applies.)
- Does your video include meaningful non-dialogue sounds?
- Do you want your content indexed by search engines for the spoken text?
- Are you ever going to be sued? (Captions remove a common accessibility complaint vector.)
If you answer “yes” to any of these, you also need subtitles:
- Do you serve users who speak a different language than your source audio?
- Are you launching in new geographic markets?
- Do international users make up a meaningful share of viewing time?
Most production video products end up needing both. Start by building the pipeline to support closed captions in WebVTT — adding subtitle tracks later is the easy part once the plumbing exists.
Closed Captioning vs Subtitles FAQ
Are closed captions and subtitles the same thing?
No. Closed captions include dialogue, sound effects, speaker identification, and music cues, and they’re designed for viewers who can’t hear the audio. Subtitles include only translated dialogue and are designed for viewers who can hear the audio but don’t understand the language. The accessibility law treats them differently — captions satisfy WCAG and ADA, subtitles do not.
Do subtitles satisfy ADA or WCAG requirements?
No. WCAG 2.1 Success Criterion 1.2.2 requires captions, not subtitles, and the U.S. Department of Justice’s 2024 ADA rule explicitly references WCAG 2.1 Level AA. A video with English subtitles on English audio still fails accessibility audits because it omits non-speech audio information that Deaf and hard-of-hearing viewers need.
What’s the difference between SDH and closed captions?
SDH (Subtitles for the Deaf and Hard of Hearing) carries the same accessibility content as closed captions — dialogue, sound effects, speaker IDs, music cues — but it’s delivered as a subtitle file (WebVTT, SRT, PGS) rather than as a CEA-608/708 broadcast caption track. On modern streaming platforms, what users see labeled “closed captions” is almost always SDH under the hood.
What’s the best file format for captions on a web video player?
WebVTT. It’s the W3C standard for HTML5 video, supports the styling and positioning needed for accessibility, packages cleanly into HLS and DASH manifests, and is supported by every modern player. SRT is simpler and more universally readable, but it doesn’t support styling. Use SRT for ingest and authoring, WebVTT for delivery.
Can I burn captions into the video instead of using a separate track?
Yes — that’s called open captions. Open captions are baked into the video pixels during encoding and can’t be turned off. They’re useful for social autoplay (Instagram Reels, TikTok), where text tracks get stripped, but they prevent per-viewer customization and require separate encodes for each language. Most production systems ship soft closed captions as the default and only generate open captions for specific social distribution cuts.
Does YouTube use closed captions or subtitles?
Both. YouTube’s auto-generated captions and manually uploaded SRT/WebVTT tracks can serve either function — the platform labels them “subtitles/CC” in the UI without distinguishing. For ADA compliance on YouTube-embedded video, upload a human-reviewed caption track that includes non-speech audio information. The auto-generated track alone usually does not meet WCAG accuracy standards.
How accurate do captions need to be to be legally compliant?
The FCC’s broadcast caption rules require captions to be accurate, synchronous, complete, and properly placed. WCAG and ADA don’t set a numeric accuracy floor, but court rulings have repeatedly found ASR-only captions inadequate when error rates affect comprehension. A practical floor is 99% word accuracy with proper speaker IDs and sound effects.
How much do closed captions cost?
Human captioning runs roughly $1-$15 per minute of video depending on turnaround time, accuracy guarantees, and language. ASR-only services are $0.05-$0.50 per minute. Hybrid services (ASR + human review) fall between, typically $1-$5 per minute. For high-volume libraries, hybrid workflows hit the best cost-to-quality ratio.
Do live streams need captions too?
Yes — WCAG 2.1 Level AA requires captions on live synchronized media (Success Criterion 1.2.4), and the FCC requires them on most live broadcast programming. Live captions come from real-time stenographers, respeakers, or automated ASR, with a 3-10 second latency budget built into the streaming pipeline. Modern live captioning is wired directly into HLS or low-latency streaming workflows.
Will captions affect my video SEO?
Yes — Google indexes caption text on YouTube and via VideoObject schema markup. Accurate captions improve discoverability, dwell time, and accessibility ratings, all of which feed into ranking signals. For video-heavy content sites, caption text is one of the cheapest ways to expand searchable surface area.
Ship Captions and Subtitles Without Building the Plumbing
Closed captioning vs subtitles isn’t a stylistic choice — it’s a technical, legal, and audience decision baked into how you build the video pipeline. Captions cover accessibility and non-speech audio. Subtitles cover language translation. Both can be soft or burned in, and most production products ship some of each.
The hard part isn’t choosing the format. It’s encoding, packaging, and delivering the text tracks alongside the video reliably across every player and device. That’s where a managed video API saves months of work. A managed live streaming API handles upload, instant encoding, HLS packaging, multi-CDN delivery (Akamai, Cloudflare, Fastly), and an embeddable player that respects user accessibility preferences out of the box. Attach a WebVTT track, ship to web, mobile, and TV, and meet WCAG 2.1 AA without writing a custom packager.
Get started with LiveAPI to add captioned video streaming to your app in days, not months.