HTML 5 Video, Text Tracks, and Audio Descriptions Made Easy (or at least easier)

First, HTML5 Video and Text Tracks

One of the really powerful features of HTML5 video is the ability to add text tracks, such as captions and subtitles, and have them play back automatically in sync with the video. The text tracks simply need to be in the WebVTT format, which is similar to the SRT format but with some additional functionality. Unfortunately, because the spec for WebVTT is so new, none of the browsers have implemented this functionality yet.

I wanted to be able to easily add text tracks (for a reason you will see later) to HTML5 videos. I found a jQuery script that would automatically parse and display SRT files that were referenced by a <track> element inside a <video> element. I took that concept and modified it so I could

  1. add multiple tracks instead of just a single track
  2. change where the output of the text tracks appears
  3. support WebVTT files – well, at least a basic version of WebVTT
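To give a sense of what reading "a basic version of WebVTT" involves, here is a minimal sketch. It is not the code from html5texttrack.js; the function names parseTimestamp and parseCues are hypothetical, and the sketch assumes cues are separated by blank lines. It skips the WEBVTT header and cue identifiers, drops cue settings, and ignores styling and NOTE blocks. Because it accepts either a period or a comma before the milliseconds, it also reads SRT-style timings.

```javascript
// Hypothetical helper: convert "hh:mm:ss.mmm" or "mm:ss.mmm" to seconds.
// A comma before the milliseconds (SRT style) is accepted as well.
function parseTimestamp(stamp) {
  const match = /^(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})$/.exec(stamp.trim());
  if (!match) {
    return NaN;
  }
  const hours = match[1] ? parseInt(match[1], 10) : 0;
  const minutes = parseInt(match[2], 10);
  const seconds = parseInt(match[3], 10);
  const millis = parseInt(match[4], 10);
  return hours * 3600 + minutes * 60 + seconds + millis / 1000;
}

// Hypothetical helper: split the text of a WebVTT (or SRT) file into cues.
function parseCues(fileText) {
  const cues = [];
  // Normalize line endings, then split into blocks separated by blank lines.
  const blocks = fileText.replace(/\r\n?/g, '\n').split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n').filter((line) => line.trim() !== '');
    // The timing line is the one containing "-->"; anything before it
    // (the WEBVTT header or a cue identifier) is skipped.
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) {
      continue;
    }
    const parts = lines[timingIndex].split('-->');
    const start = parseTimestamp(parts[0]);
    // Drop any cue settings that follow the end time.
    const end = parseTimestamp(parts[1].trim().split(/\s+/)[0]);
    if (Number.isNaN(start) || Number.isNaN(end)) {
      continue;
    }
    cues.push({ start, end, text: lines.slice(timingIndex + 1).join('\n') });
  }
  return cues;
}
```

Each cue comes back as an object with start and end times in seconds and the cue text, which is all the later steps need.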

Here are the basics.

1. Add these two lines to the <head> section.

<script src=""></script>
<script src="html5texttrack.js"></script>

2. Define a video

<video id="vid" width="480" height="270" controls tabindex="0">
     <source src="myvideo.mp4" type="video/mp4" />
     <source src="myvideo.webm" type="video/webm" />
     <source src="myvideo.ogg" type="video/ogg" />

     <track kind="caption" src="captions.vtt" srclang=en label="English" />

     <!-- fallback goes here for browsers that do not support the video tag -->
</video>

3. Define an area to insert the captions

<div id="captionBar"></div>

4. Add this JavaScript after the <video> element

loadTextTrack({videoId:'vid', // the ID of the video tag
               kind:'caption', // the kind of track
               srclang:'en', // the language of the source file (optional)
               targetId:'captionBar'}); // the ID of the element into which to insert the timed text
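The internals of loadTextTrack are not shown here, but the heart of synchronized playback is deciding which cue is active at the video's current time. The following is a minimal sketch of that lookup, not the script's actual code. The name findActiveCue is hypothetical, and it assumes each cue is an object with start and end times in seconds plus its text.

```javascript
// Hypothetical helper: return the text of the cue that should be visible at
// `currentTime` (in seconds), or an empty string when no cue is active.
// Assumes `cues` is an array of { start, end, text } with times in seconds.
function findActiveCue(cues, currentTime) {
  for (const cue of cues) {
    if (currentTime >= cue.start && currentTime < cue.end) {
      return cue.text;
    }
  }
  return '';
}
```

In a page, a function like this would typically be called from the video's timeupdate event with the video's currentTime, and the result written into the target element.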

5. Define some CSS

<style type="text/css">
#captionBar {
    width: 480px;
    position: absolute;
    top: 280px;
    padding: 3px 10px;
    text-align: center;
    color: #fff;
    background-color: #000;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 0.9em;
    font-weight: bold;
    min-height: 3.6em;
}
</style>

Here is the resulting video.

video with visible captions

Download the Script

Download the html5texttrack.js file to include in your own projects.

An important note – this script does not support all the features or formatting of WebVTT. It supports just enough for me to accomplish this next step.

Now, on to Audio Descriptions

If you have never heard of audio descriptions, think of them as alternative text for video. They are a separate audio track in a video that describes important information conveyed only through the video track. Here are two samples of audio described video: a clip from the movie The Miracle Worker, and a clip from an animated version of Hamlet.

Providing audio descriptions for videos is required to meet WCAG 2 Level AA conformance (Success Criterion 1.2.5) and is one of the options for meeting Level A (Success Criterion 1.2.3).

Providing audio descriptions can be challenging for a number of reasons.

  1. It is very expensive to have third-party vendors produce them – significantly more expensive than making captions.
  2. Creating audio descriptions is an art form. One of the challenges is fitting the descriptions into the gaps in the existing audio, and most videos were never produced with gaps large enough to hold additional content. You then have to balance how verbose to be against staying within those gaps.
  3. You usually have to know how to use a movie editor in order to insert the audio description track.

JavaScript + ARIA + Screen Readers = Easier Audio Descriptions

One potential solution to aid in creating audio descriptions is to leverage several technologies that are already present in Web browsers and in assistive technologies. What if we could just provide a time-stamped text file with the audio descriptions that need to be spoken, use JavaScript to parse the file, and use a text-to-speech (TTS) engine to convert the text to actual speech? One solution would be to provide some type of application, like a Flash app or browser plug-in, to supply the TTS functionality. However, most people who need the audio descriptions already have a TTS application running on their computer: their screen reader. All we need to do is take the time-stamped text file and display the audio descriptions in an ARIA Live region in sync with the video. That way the screen reader will read each audio description as it changes.
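One practical detail of this approach is that the video's timeupdate event typically fires several times per second, and rewriting the same text into a live region on every event may cause some screen readers to repeat themselves. A minimal sketch of one way to guard against that follows. The name createLiveRegionUpdater is hypothetical and this is not the script's actual code. The region argument is assumed to be any object with a textContent property, such as the descriptionBar element.

```javascript
// Hypothetical helper: returns a function that writes description text into
// an ARIA live region only when the text has changed since the last call.
// `region` is any object with a `textContent` property, such as the element
// returned by document.getElementById('descriptionBar') in a browser.
function createLiveRegionUpdater(region) {
  let lastText = null;
  return function update(text) {
    if (text === lastText) {
      return false; // same description as before, nothing new to announce
    }
    lastText = text;
    region.textContent = text; // the screen reader picks up this change
    return true;
  };
}
```

In a browser, the returned function would be called from a timeupdate listener with the text of whichever description is active at that moment. The screen reader then sees one change per description instead of a stream of identical writes.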

Here is the same video from earlier but using this technology to incorporate audio descriptions. Please note, the first audio description does not show up until about 37 seconds into the video.

video with visible captions and audio descriptions

A couple of things are being done here.

  1. There are two ARIA Live regions, one for the captions and one for the audio descriptions. The captions area is set to aria-live="off" and the audio description area is set to aria-live="assertive".
  2. There is a simple form to toggle the audio description ARIA Live region from “assertive” to “off” so that the user can choose not to have their screen reader voice the audio description changes.

Here is the pertinent code.

<video id="vid" width="569" height="320" controls tabindex="0">
    <source src="../source/headingsmap/headingsmap2.mp4" type="video/mp4" />
    <source src="../source/headingsmap/headingsmap2.webm" type="video/webm" />
    <source src="../source/headingsmap/headingsmap2.ogv" type="video/ogg" />

    <!-- the track tag is not currently implemented in any browser -->
    <track kind="caption" src="../source/headingsmap/captions.vtt" srclang=en label="English" />
    <track kind="audiodescription" src="../source/headingsmap/audiodescriptions.vtt" srclang=en label="English" />
    <!-- fallback if browser does not support the video tag -->
    <p>This browser does not support HTML 5 Video and thus cannot demonstrate this technique.</p>
</video>

<div id="captions">
    <div aria-live="off" id="captionBar"></div>
</div>

<div id="audio_description">
    <h2>Audio Description</h2>
    <div aria-live="assertive" aria-relevant="text" id="descriptionBar"></div>
    <form>
        <fieldset>
            <legend>Toggle Audio Description Voicing</legend>
            <input id="on" type="radio" name="onoff" checked="checked" value="on" onclick="$('#descriptionBar').attr('aria-live','assertive')" />
            <label for="on">On</label>
            <input id="off" type="radio" name="onoff" value="off" onclick="$('#descriptionBar').attr('aria-live','off')" />
            <label for="off">Off</label>
        </fieldset>
    </form>
</div>

<p><a href="../source/headingsmap/captions.vtt">Caption source file</a></p>
<p><a href="../source/headingsmap/audiodescriptions.vtt">Audio description source file</a></p>

<!-- calls the function to parse the text track and display it in the container -->
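<script type="text/javascript">
    loadTextTrack({videoId:'vid',kind:'caption', srclang:'en',targetId:'captionBar'});
    // assumed: the audio description track is loaded the same way into descriptionBar
    loadTextTrack({videoId:'vid',kind:'audiodescription', srclang:'en',targetId:'descriptionBar'});
</script>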

<!-- set the default value for the audio description voicing to on -->

If you don’t have a screen reader capable of voicing the audio descriptions for you, here are a couple of options.

  1. View a screen cast of JAWS reading the audio descriptions from this video at a fast voice rate.
  2. View a screen cast of JAWS reading the audio descriptions from this video at a slow voice rate.
  3. On Windows, download NVDA, a free and open source screen reader, and try it out for yourself.

This approach does have some shortcomings, like

  1. The developer does not know the speech rate at which the user’s screen reader is set. This makes it impossible to ensure that an audio description will not overlap important parts of the original audio track.
  2. It requires the user to have a screen reader that supports ARIA Live regions, which not all do yet.
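To make the first shortcoming concrete, here is a rough back-of-the-envelope sketch. The function names and the words-per-minute figures are illustrative assumptions, not measurements of any particular screen reader, and real speech engines vary with voice, punctuation, and language.

```javascript
// Hypothetical helper: estimate how many seconds a text-to-speech engine
// needs to speak `text` at `wordsPerMinute`. This is a rough guide only.
function estimateSpeechSeconds(text, wordsPerMinute) {
  const words = text.trim().split(/\s+/).filter((word) => word !== '');
  return (words.length / wordsPerMinute) * 60;
}

// Hypothetical helper: does the description fit in a gap of `gapSeconds`?
function fitsInGap(text, gapSeconds, wordsPerMinute) {
  return estimateSpeechSeconds(text, wordsPerMinute) <= gapSeconds;
}

// A made-up 12-word description and a 4-second gap in the audio track.
const description =
  'The presenter opens the browser menu and selects the headings map option.';
console.log(fitsInGap(description, 4, 150)); // false: about 4.8 seconds needed
console.log(fitsInGap(description, 4, 300)); // true: about 2.4 seconds needed
```

The same description that fits comfortably for a user listening at a fast rate runs past the gap for a user listening at a slower one, and the page has no way to know which user it has.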

Even with these disadvantages, this approach opens the door to creating and providing audio descriptions without a major investment of money or technical skill development. Audio descriptions are one of the most often neglected aspects of making accessible Web content, so hopefully this will help make them more prevalent.

Where Might This Lead?

One possibility that opens up when audio descriptions are implemented this way is a database of audio descriptions that could be dynamically pulled into videos anywhere on the Internet. Currently this script pulls in a plain text file for the audio description, but it would be fairly simple to make the source point to a database. Something like this would be similar to some of the early efforts behind collective captioning of YouTube videos, where anyone could provide captions but everyone would benefit.

A Tool to Help You Out – The Movie Time Bookmarklet

If you do want to create audio descriptions this way, one of the trickiest parts is lining up the start of the audio descriptions with the start of a blank space in the audio track. Here is a tool I have developed to more easily get exact time stamps from a movie. It is a bookmarklet, and to use it

  1. Go to and add the link on that page to your bookmarks. Note, you are not adding a bookmark to the page itself, but rather a bookmark to the link that says “Movie Time” on that page.
  2. Browse to any page with an HTML5 video tag, like either of the examples in this blog post.
  3. Click the bookmarklet from your browser’s bookmarks.
  4. Video player controls will appear above the video to help you line up precisely where you want a particular audio description to start.
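Once you have found the right moment, the video's currentTime value (in seconds) still has to be written as a WebVTT timestamp in the description file. A minimal sketch of that conversion follows. The name formatTimestamp is hypothetical and is not part of the bookmarklet, and the sketch assumes a non-negative time.

```javascript
// Hypothetical helper: format a non-negative time in seconds (such as the
// value of video.currentTime) as a WebVTT timestamp, "hh:mm:ss.mmm".
function formatTimestamp(totalSeconds) {
  // Work in whole milliseconds to avoid floating-point drift.
  const totalMillis = Math.round(totalSeconds * 1000);
  const millis = totalMillis % 1000;
  const seconds = Math.floor(totalMillis / 1000) % 60;
  const minutes = Math.floor(totalMillis / 60000) % 60;
  const hours = Math.floor(totalMillis / 3600000);
  const pad = (value, width) => String(value).padStart(width, '0');
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`;
}

console.log(formatTimestamp(37.5)); // prints "00:00:37.500"
```

Two such timestamps joined by " --> " form the timing line of a cue.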

A couple of very big disclaimers about the Movie Time Bookmarklet.

  1. This tool comes as is. So far I have only developed it far enough to handle what I need it to do.
  2. This tool will only work with HTML5 video tags.
  3. This tool will only work with the first HTML5 video tag on a page if more than one video tag exists.
  4. If you have any suggestions for the tool I’m happy to take them but cannot promise anything.

I do have some plans for this tool in the future, but I’m not sure when I’ll be able to get to them, so I decided to make it available now as it is.

This entry was posted in tools by Greg Kraus.

About Greg Kraus

I am the University IT Accessibility Coordinator at North Carolina State University. I provide leadership in creating an accessible IT infrastructure by consulting on the accessibility of campus projects, working with developers and content creators, providing training, and helping set policy.

5 thoughts on “HTML 5 Video, Text Tracks, and Audio Descriptions Made Easy (or at least easier)”

  1. Very cool proof of concept, Greg! Just to document an idea we talked about in person, one problem with closed audio description is that the description and program audio sometimes clash with one another. If we use recorded voice for description, we have some control over the volume of the recording, and media players can potentially use ducking to lower the volume of the program audio whenever audio description is present (JW Player is already experimenting with that functionality, as described on my own blog post:

    If we use text for description, we have no control over the user’s screen reader volume, and even more importantly, we have no control over the user’s rate of speech so there’s no guarantee that audio description will fit within a particular span of time. Therefore, this seems to be a perfect use case for extended audio description, which I think would be easy to implement in your demo: At the start time for a new description, pause the video, then start playing again at the end time. What do you think?

  2. Thanks Terry.

    I’ve thought about adding extended audio description with pausing as well. The limitation there is that the control for video playback would have to reside outside of the built-in browser playback because, for the reasons you pointed out like not knowing the rate of speech, the browser has no way of knowing when the TTS action has completed. It would require writing some JavaScript to pause the video each time a new audio description occurs, and then the user would have to initiate a “resume” or “continue” type action. In essence, every audio description becomes an extended audio description. If users don’t mind doing that it should be a fairly easy solution to set up.

    I agree that this is not the ideal solution, also for the reasons you mentioned. If browsers would support TTS natively, and we were able to make a callback or query when the TTS event is completed this would be even better. I have not tried doing this with the various browsers or browser plugins that enable TTS. Perhaps that can be a goal of mine this summer.
