Creating captions is a three-step process.

  1. create a text transcript of the audio portion of the multimedia
  2. add time stamps into the transcript to set when each text snippet will display on the screen
  3. incorporate the time stamped transcript back into the multimedia

Create a Text Transcript

Creating a transcript can be more of an art than a science. The guidelines to remember are:

  • include all relevant dialogue
  • include relevant non-verbal sounds
  • when appropriate, include the name of the speaker

If you want to create a transcript yourself, all you need is a basic text editor. If you are not a trained transcriptionist, the process will take longer than you expect, but for shorter videos it is a very manageable task. You can also hire a transcription service. Typical rates range from $60 to $100 per hour of video or audio. For best practices in creating transcripts and captions, follow the links at the end of this document.

A sample transcript looks like the following.

Welcome to this video tutorial on creating captions. We will be learning how to create a transcript and add time stamps to the transcript to create a caption file.

People often want to use speech recognition software to create the transcript. Except on rare occasions, speech recognition software cannot produce a transcript accurate enough to be used without significant corrections, and correcting transcription mistakes takes longer than you might expect. Studies have shown that it is usually cheaper to pay for a transcription service from the beginning than to pay for the time it takes to correct a computer generated transcript.

Creating the Time Stamped Transcript

This process can be done manually with specialized software, or it can be purchased through third-party vendors. Time stamped transcripts come in a variety of formats, but they all do basically the same thing: they tell the player when each piece of text is supposed to display on the screen. The end result will be something like the following.

00:00:00,000 --> 00:00:02,500
Welcome to this video tutorial on creating captions.

00:00:02,500 --> 00:00:04,000
We will be learning how to create a transcript

00:00:04,000 --> 00:00:07,000
and add time stamps to the transcript to create a caption file.
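As a rough illustration of what captioning software does at this step, the sketch below turns a list of (start, end, text) entries into the hours:minutes:seconds,milliseconds layout shown above. The function names and the idea of storing times as seconds are assumptions made for this example, not part of any particular captioning tool.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_timestamped_transcript(cues) -> str:
    """Join (start, end, text) entries into a time stamped transcript."""
    blocks = []
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # A blank line separates one caption from the next.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample_cues = [
        (0.0, 2.5, "Welcome to this video tutorial on creating captions."),
        (2.5, 4.0, "We will be learning how to create a transcript"),
        (4.0, 7.0, "and add time stamps to the transcript to create a caption file."),
    ]
    print(build_timestamped_transcript(sample_cues))
```

Run as a script, this prints three captions in the same layout as the example. Some caption formats add further details, such as a sequence number before each caption, so check what your media player expects before relying on output like this.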

It is important to get your time stamped transcript in a format that your media player can handle. Most captioning software and services can export to a variety of formats.

For Web-based video, WebVTT is the new standard, especially for HTML5 video.
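If your time stamped transcript uses comma-separated milliseconds and your player expects WebVTT, the conversion is often small: a WebVTT file begins with a WEBVTT line and writes milliseconds after a period rather than a comma. The sketch below shows that conversion under the assumption that every timing line follows the HH:MM:SS,mmm --> HH:MM:SS,mmm pattern; the function name is hypothetical, and most captioning tools can export WebVTT directly.

```python
import re

# Matches a timing line such as "00:00:02,500 --> 00:00:04,000".
TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)


def to_webvtt(transcript: str) -> str:
    """Convert a comma-millisecond transcript into WebVTT text."""
    converted = []
    for line in transcript.strip().splitlines():
        # Only timing lines change; commas inside caption text are kept.
        converted.append(TIMING_LINE.sub(r"\1.\2 --> \3.\4", line.strip()))
    # The file must start with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


if __name__ == "__main__":
    source = (
        "00:00:00,000 --> 00:00:02,500\n"
        "Welcome to this video tutorial on creating captions.\n"
    )
    print(to_webvtt(source))
```

Matching whole timing lines, rather than replacing every comma, keeps punctuation in the caption text untouched.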

Incorporating the Caption File into the Multimedia

This process is as varied as the multimedia formats and players that are out there. If your multimedia is being delivered through a media delivery system (e.g., Panopto, Kaltura), often all you have to do is upload the caption file to the server. If you are using a custom media player, you will need to upload the caption file to the same location as your video and tell the player, often through a configuration file, that the caption file is present.
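For a plain HTML5 video element, "telling the player" about the caption file usually means adding a track element that points to a WebVTT file. The sketch below generates that markup; the function name and the file names lesson.mp4 and lesson.vtt are placeholders for this example, and other players will use their own configuration format instead.

```python
from html import escape


def video_with_captions(video_src: str, captions_src: str,
                        lang: str = "en", label: str = "English") -> str:
    """Return HTML5 markup for a video with one caption track."""
    return (
        f'<video controls src="{escape(video_src)}">\n'
        # kind="captions" marks the track as captions; default turns it on.
        f'  <track kind="captions" src="{escape(captions_src)}" '
        f'srclang="{escape(lang)}" label="{escape(label)}" default>\n'
        f'</video>'
    )


if __name__ == "__main__":
    # Placeholder file names; both files are assumed to sit side by side.
    print(video_with_captions("lesson.mp4", "lesson.vtt"))
```

Keeping the caption file in the same location as the video, as described above, lets the src values stay as simple relative file names.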


If your video is delivered through YouTube, all you need to do is provide the transcript of your video. YouTube has built-in technology that automatically synchronizes your transcript with the video. Google provides additional information for captioning videos on YouTube.

YouTube can also create captions automatically using speech recognition, but again, the results will not be accurate enough to use as captions without a lot of editing first.

Further Reading