Provide text equivalents for audio-only and audio content in video

All video and audio content with speech should have captions that are: synchronised, appearing at the same time as the corresponding audio; equivalent to the spoken words and other audio information; and accessible, or readily available, to those who need them.

Consider the environments in which users consume content that has an audio component, whether that is video or audio-only content. It is not always possible or appropriate to listen to sound, for example: in a shared space; when others are watching TV; when it is not comfortable to wear headphones; or in a room with a sleeping baby. Captions and transcripts provide a better experience for everyone: users can follow audio content, and watch and understand video with spoken content, without needing to hear it.

Techniques

Captions provide the best experience for video with spoken content

For video with spoken content, captions provide the best user experience.

The most common type of caption is the “closed” caption, which the user can turn on and off. Open captions are part of the original broadcast of the media and are always visible.

You must consider the placement of open captions so that they do not obscure any text or other important visual content.

YouTube provides automatic captioning. However, this is not always accurate, so it is recommended that you produce your own captions.

Ensure that all information relevant to the interpretation of the audio is included, for example, important non-verbal sounds or changes in intonation or emotion.
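
Closed captions are commonly delivered as a separate timed-text file, for example in the WebVTT format. The following is a minimal sketch of how such a file could be generated, including a bracketed cue for a non-verbal sound. The `Cue` class, the helper names and the sample cues are assumptions made for illustration; they are not part of any captioning tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str     # spoken words, or a bracketed non-verbal sound


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[Cue]) -> str:
    """Return the text of a WebVTT caption file for the given cues."""
    blocks = ["WEBVTT"]
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError(f"cue does not end after it starts: {cue.text!r}")
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # A blank line separates the header and each cue block.
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample cues: one non-verbal sound, one line of speech.
cues = [
    Cue(0.0, 5.0, "[Crowd chattering, light jazz piano]"),
    Cue(5.0, 13.0, "It's incredible that everybody made it today."),
]
print(build_webvtt(cues))
```

Keeping non-verbal sounds as their own cues, rather than appending them to speech, makes it easier to check that nothing relevant to interpreting the audio has been left out.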

Read A guide to using subtitles, captions and transcripts for accessibility, and see the WAI guidance on Captions/Subtitles.

Provide a text transcript for audio content

A transcript is a presentation of dialogue and any non-spoken audio content as text. If the audio has synchronised video content, provide a descriptive transcript that also includes a text description of the visual content – details of a character’s body language, expressions and movements, scene changes and on-screen text. Descriptive transcripts are required to make video content accessible to users with both visual and hearing impairments.

For content that is audio only, a transcript will usually be enough – captions are not necessary for audio-only media like podcasts.

Make the transcript available through a link or as on-page content, close to the embedded media. Provide timings alongside every line so that users can skip forwards and backwards while following the transcript (see the descriptive transcript example under Examples of good practice).
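
Timed transcript lines can be produced and sanity-checked with a short script. The sketch below assumes each passage is stored as a hypothetical `(start, end, text)` tuple with times in whole seconds; the function names are illustrative only. It renders a timing line above each passage and reports ranges that do not end after they start or that run out of order.

```python
def clock(seconds: int) -> str:
    """Format whole seconds as h:mm:ss, e.g. 75 -> '0:01:15'."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def check_timings(entries: list[tuple[int, int, str]]) -> list[str]:
    """Return a list of problems found in (start, end, text) entries."""
    problems = []
    previous_start = None
    for start, end, _text in entries:
        label = f"{clock(start)} – {clock(end)}"
        if end <= start:
            problems.append(f"{label}: does not end after it starts")
        if previous_start is not None and start < previous_start:
            problems.append(f"{label}: starts earlier than the previous entry")
        previous_start = start
    return problems


def render_transcript(entries: list[tuple[int, int, str]]) -> str:
    """Render entries as plain text with a timing line above each passage."""
    lines = []
    for start, end, text in entries:
        lines.append(f"{clock(start)} – {clock(end)}")
        lines.append(text)
        lines.append("")  # blank line between passages
    return "\n".join(lines)


# Hypothetical sample entries.
entries = [
    (0, 5, "A light jazz piano melody fades in."),
    (5, 13, "We hear the organiser speak."),
]
for problem in check_timings(entries):
    print("Check:", problem)
print(render_transcript(entries))
```

Running the check before publishing catches mistyped ranges that would otherwise send users to the wrong point in the media.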

Ensure all information relevant to the interpretation of the audio is included.

Transcripts make multimedia content searchable by search engines and users.

Captions are important for live content

Although the provision of live captions is not mandated by the EU Web Accessibility Directive, we strongly recommend providing them, to allow users with hearing impairments to understand live audio content.

For live content, it will not ordinarily be possible to provide a transcript that is precisely synchronised with the visual content. For live content with an audio component, provide captions instead.

Code text equivalents so that they are close to the original audio content

If others are responsible for creating and entering text equivalents for their audio content, speak to them to understand their requirements and make provision for their needs. Ensure any linked or on-page content is easy to find and is in close proximity to its audio content.
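
One way to keep text equivalents close to their media in the page source is to generate the player, its caption track and the transcript link together. The following is a minimal sketch, assuming an HTML `video` element with a `track` element for captions; the function name, file names and label text are hypothetical.

```python
from html import escape


def media_embed(video_src: str, captions_src: str,
                transcript_href: str, title: str) -> str:
    """Return HTML for a captioned video followed directly by a transcript link."""
    return (
        f'<video controls src="{escape(video_src)}">\n'
        f'  <track kind="captions" src="{escape(captions_src)}" '
        f'srclang="en" label="English captions">\n'
        f"</video>\n"
        # The transcript link comes immediately after the player in source order.
        f'<p><a href="{escape(transcript_href)}">'
        f"Transcript: {escape(title)}</a></p>\n"
    )


# Hypothetical file names and title.
print(media_embed("webinar.mp4", "webinar.en.vtt", "#transcript",
                  "Booking a live streaming event"))
```

Emitting the link directly after the player means users who navigate sequentially, for example with a screen reader, reach the transcript without searching the page.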

Provide an optimal experience for users with hearing impairments

For users with hearing impairments, include both synchronised captions and a transcript.

Incorporate Irish Sign Language into videos with audio content

Irish Sign Language (ISL) is the native language of the Deaf community in Ireland and the primary means of communication. The Irish Sign Language Act 2017 recognises ISL as an official language of the Irish State and places a “duty on all public bodies to provide Irish Sign Language users with free interpretation when availing of or seeking to access statutory entitlements and services”.

When filming ISL videos it is important to consider lighting, background (blue- or green-screen) and clothing. The video should be captured at 30 frames per second, auto-focus should be disabled and the shot should be composed so that the person fills as much of the shot as possible.

See the Sign Language Interpreting guidelines for detailed information.

Examples of good practice

Screenshot of a podcast webpage. The podcast player is at the top of the screen. There is a menu on the left of the page with links to 'Article', 'Video' and 'Transcript'. — Figure 1

The 'transcript' link in the left-hand menu scrolls you to the bottom of the page, where there is a full-text transcript of the podcast.

Screenshot of a podcast webpage. The podcast player is at the top of the screen. Below it is a heading of "Transcript". A script starts beneath the heading. — Figure 2

The full-text transcript of the podcast is included just after the podcast player.

Descriptive transcripts with timings allow users to follow the accompanying video content

0:00:00 – 0:00:05

This video has many close-up shots of an event booking system called EventDiary. It begins by fading in the sound of a large group of people chattering. A light jazz piano melody fades in.

0:00:05 – 0:00:13

We hear Christine White speak: "It’s incredible that everybody made it today. Having real-time feedback about the number of attendees has really helped me plan everything from catering to gift bags."

On-screen text reads: "Christine White – Organiser of the Digital Conference 2021"

A close-up shot of a participant looking at a streaming conference fades onto the screen. We transition to a sliding shot of a group of participants at lunch, during a conference.

0:00:10 – 0:00:15

The piano melody and chattering crowd fade to silence. A close-up shot of Alan’s face is now in view.

Squire: "Hi, my name is Alan Squire from EventDiary."

On-screen text reads: "Alan Squire – Product Manager at EventDiary"

0:00:14 – 0:00:20

Squire: “I’m going to show you how to book a live streaming event using EventDiary.”

0:00:21 – 0:00:26

“Here is an example of a straightforward event that we might want to start promoting on our website.”

A close-up shot of a poster for a streaming conference fades onto the screen.

0:00:27 – 0:00:29

“It’s a webinar on accessibility.”

0:00:30 – 0:00:32

“The first things we need to note are”

0:00:32 – 0:00:38

“the date that it’s happening and the video platform we’re going to use.”

The shot pans across the poster, highlighting the date.

…

Video

References

WCAG 2.1

1.2.1 Audio-only and Video-only (Pre-recorded) (A)
1.2.2 Captions (Pre-recorded) (A)
1.2.4 Captions (Live) (AA)
1.2.6 Sign Language (Pre-recorded) (AAA)

EN 301 549 v 2.1.2

9.1.2.1 Audio-only and Video-only (Pre-recorded)
9.1.2.2 Captions (Pre-recorded)
9.1.2.4 Captions (Live)