In a recent post, “Indigenous Protests, Wikileaks and Online Subtitles” I focused on the social and historical importance of adding subtitles to online video, especially as it relates to those who promote human rights and inclusive rural development. Most of the post is centered around a single video of an indigenous woman delivering a powerful and emotional indictment of Peruvian president Alan Garcia after the burial of her husband who was killed by soldiers. Her words were originally broadcast on television without any subtitles, incomprehensible to the vast majority of Peruvians. Thanks to the intervention of a Peruvian journalist, the video was eventually made available with Spanish subtitles, but until today it was not available in English:

This post focuses on the technology of subtitling online video: how it is done, how it has been improved, and what obstacles and opportunities lie in the road ahead as we move toward a world where information is made accessible to as many people as possible, irrespective of language, or hearing and sight impairment.

dotSUB and Universal Subtitles

There are currently two major online tools to subtitle videos, dotSUB and Universal Subtitles. dotSUB is a proprietary, commercial tool that is focused on providing subtitling and captioning solutions to commercial content producers. Their technology also powers TED’s Open Translation Project, which enables volunteer translators to make TED videos available in nearly 80 different languages. In addition to providing enterprise solutions to corporate clients, dotSUB also has a fairly active volunteer translator community that uses the website to upload, caption, translate, and share videos that they are interested in. Universal Subtitles is a new tool that was recently launched by the Participatory Culture Foundation, the group behind Miro, an open source podcasting and video client similar to iTunes. Universal Subtitles is an open source, JavaScript-based widget that allows the subtitling of videos hosted on YouTube, Blip.tv, DailyMotion, and also self-hosted videos. Unlike dotSUB, which only has a Flash-based video player, Universal Subtitles also supports HTML5 so that subtitled videos can be viewed on devices that do not support Flash, such as the iPhone and iPad.

For the user deciding between the two tools there are a few important differences to keep in mind. As of this post, Universal Subtitles is still in beta and thus less stable than dotSUB, which has been around for several years now. On the other hand, Universal Subtitles offers several features that are not possible with dotSUB. Perhaps most importantly, dotSUB forces users to first caption a video in its original language before adding subtitles in another language. While the intention is to promote quality as the video is translated into other languages, the requirement might discourage users who would like to make a video accessible to speakers of another language without taking the time-consuming step of first captioning it in its original language. Universal Subtitles allows you to instantly add subtitles in any language you would like. Also unlike dotSUB, Universal Subtitles does not require you to host the video on its website. Rather, it simply offers you a tool to place subtitles as a layer over any existing video that is hosted on YouTube, Blip.tv, or on your own website. This also implies an interesting legal distinction: a subtitled video is legally considered a derivative work, which means you must first secure the permission of the copyright holder. But one could argue that Universal Subtitles as a tool does not produce subtitled videos; rather, it facilitates a way for viewers to lay a text file over an already existing video.

Most importantly, both tools are free to use and allow users to make online video accessible in more languages. Both tools allow you to download copies of the subtitles that you have produced in various formats, but dotSUB only allows the “owner” of the video to download the subtitles while Universal Subtitles gives all users access. Neither dotSUB nor Universal Subtitles allows you to download a copy of the video with embedded subtitles. In order to watch the video with subtitles offline, we will need to go through a few more steps.

The Workflow

I used both Universal Subtitles and dotSUB to add subtitles to the video. Renata Avila helped me with the captioning of the video in Spanish on dotSUB and I then translated it into English. On Universal Subtitles I simply typed in the English translation as I watched the video.

There are two types of subtitles, hard and soft. Hard subtitles are burned into the video, meaning that they always appear whenever you watch the video. Soft subtitles give you the option to watch the video with or without subtitles/captions (as we are used to when we watch DVDs). Unfortunately most online video currently uses hard subtitles (for example, on Netflix you cannot turn subtitles on or off), but in fact many video formats including QuickTime, DivX, and Ogg allow ways to embed soft subtitles that can be turned on or off in your video player.
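To make the hard/soft distinction concrete, here is a minimal sketch of what a player does with a soft subtitle track. It assumes the subtitles were downloaded in the SubRip (.srt) format, which is my assumption and only one of several possible formats; the sample text and the names `parse_srt` and `text_at` are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A tiny SubRip (.srt) sample; the dialogue is invented for illustration.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
First line of dialogue.

2
00:00:04,000 --> 00:00:06,250
Second line,
split over two rows.
"""

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def parse_srt(data: str) -> List[Cue]:
    """Turn SRT text into a list of timed cues, skipping malformed blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        cues.append(Cue(start, end, "\n".join(lines[2:])))
    return cues


def text_at(cues: List[Cue], t: float, enabled: bool = True) -> Optional[str]:
    """Return the text to overlay at time t, or None.

    enabled=False models the viewer switching soft subtitles off,
    which is exactly what hard (burned-in) subtitles cannot offer.
    """
    if not enabled:
        return None
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue.text
    return None
```

Because the text lives in a separate track rather than in the pixels, the same video can carry several languages and the viewer chooses which one, if any, to display.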

In order to watch the above video with English and Spanish subtitles on my iPhone I use a program called iSubtitle. (Subler is another choice, an open source program that allows the embedding of subtitles into QuickTime files.) First I download the video from YouTube using Evom. Then I download the subtitles from either Universal Subtitles or dotSUB. I open iSubtitle and import both the video and the two subtitle files.

I can change the font size of the subtitles, adjust the timing, and change the metadata of the video file itself, which is especially handy for podcasters. Then I export the video with the subtitles embedded into the video file itself. Depending on the speed of your computer, a five-minute video tends to take between 10 and 15 minutes to export. Now I must upload the video once again to my blog in order for my readers to download the subtitled version of the video so that they can later view it in either Spanish or English on their mobile devices. (I still don’t have a transcript of the video in the woman’s native Awajún language.) Here is a screenshot of what it looks like on an iPhone:

If you are subscribed to the podcast of my blog then you will automatically receive the video in iTunes where you can also view the subtitles.
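The timing adjustment in the workflow above does not have to happen in a graphical tool. As a rough sketch, and again assuming the subtitles are in the SubRip (.srt) format, the following shifts every cue by a fixed number of milliseconds. The function name `shift_srt` is my own and is not part of iSubtitle or Subler.

```python
import re

# One SRT timestamp, e.g. "00:01:02,345".
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(data: str, offset_ms: int) -> str:
    """Shift every cue in SRT text by offset_ms (negative = earlier)."""

    def shift(match: "re.Match") -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = (h * 3600 + m * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # never produce a negative timestamp
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    out = []
    for line in data.splitlines(keepends=True):
        # Only touch timing lines, so timestamps quoted in dialogue survive.
        if "-->" in line:
            line = TIMESTAMP.sub(shift, line)
        out.append(line)
    return "".join(out)
```

A positive offset delays the subtitles, which helps when the downloaded video has a longer intro than the copy the subtitles were made for; a negative offset does the opposite.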

From TV to Online Video – Two Steps Back

Earlier this month I was at the Open Subtitles Design Summit in New York City, which was hosted by Universal Subtitles and funded by Open Society Institute’s Information Program. (Disclosure: I am currently consulting for OSI’s Information and Latin America programs.) The meeting brought together around 40 technologists, translators, filmmakers, and accessibility advocates to discuss the current state of online subtitling. Much of the meeting focused on taking stock of current tools, techniques, and strategies to caption and subtitle online video.

Larry Goldberg of the National Center for Accessible Media at WGBH helped provide a historical account of the rise of captions and subtitles on television. According to Goldberg, captioning of television content was initially government funded. Later, costs were split between the producer, distributor, and sponsors of a particular program. It wasn’t until SAP came along to offer Spanish dubbing of English-language programming, says Goldberg, that content producers saw the value in making their shows available in more languages to reach a larger audience. Today nearly all content on major television networks in the United States is captioned, but only a very small fraction of online video is captioned and/or subtitled.1 That will likely soon change thanks to the passage of H.R. 3101, otherwise known as The Twenty-first Century Communications and Video Accessibility Act, which was passed into law just last week after years of advocacy, much of which took place on Facebook. (A captioned video of President Obama signing the new law is available on YouTube.)

As Robert Goodwin writes in his post celebrating the bill’s passage, it will probably be some time until there is an explosion of captioned and subtitled video on the net. Not only must the Federal Communications Commission write the official regulations, but toolmakers must build captioning into their video players and content producers must develop workflows to publish captions along with videos. So far there are few guides as to how this should be done and little agreement about standards. A post published last year on the Netflix blog, for example, explains why it has taken them over a year to develop a captioning and subtitling solution for their online streaming service. Hulu, meanwhile, was one of the early online leaders in offering closed captioning but still only offers around 5% of content with captions, and that number seems to be shrinking.

Future Opportunities and Challenges

The wild west of online subtitling today presents both opportunities and challenges for the future. Let’s start with the opportunities.

Why Pay for Free?

According to Larry Goldberg’s calculations, it currently costs mainstream video producers around $600 to caption 22 minutes of pre-produced sitcom programming. Given that there are typically 2.5 words per second in most sitcoms, a 22-minute episode runs to about 3,300 words, so it would cost around $330 per episode to translate the captions into another language at the reasonable market price of 10 cents per word. In practice, most video producers actually pay around $30 per minute of subtitling, which adds up to $660 per episode, slightly more than the original captioning.
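For anyone who wants to plug in their own numbers, the arithmetic behind those figures can be sketched as follows. The inputs are the ones quoted above; the function names are mine, and prices are kept in cents where it avoids rounding surprises.

```python
def per_word_cost_dollars(minutes: float, words_per_second: float,
                          cents_per_word: int) -> float:
    """Translation cost when paying by the word."""
    words = minutes * 60 * words_per_second
    return words * cents_per_word / 100


def per_minute_cost_dollars(minutes: float, dollars_per_minute: float) -> float:
    """Subtitling cost when paying by the minute of video."""
    return minutes * dollars_per_minute


# Figures quoted in the text: a 22-minute sitcom episode.
by_word = per_word_cost_dollars(22, 2.5, 10)   # 3,300 words at 10 cents each
by_minute = per_minute_cost_dollars(22, 30)    # $30 per minute

print(f"per-word pricing:   ${by_word:.2f}")
print(f"per-minute pricing: ${by_minute:.2f}")
```

With these inputs the per-minute price comes out at exactly twice the per-word price, which is the gap a volunteer community or a smarter pricing model could close.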

The grand irony is that subtitles already exist for the most popular English-language television programs in dozens of other languages. You can find many of them at OpenSubtitles.org. One user from Finland has uploaded no fewer than 2,754 Finnish subtitles this year alone to popular movies and television shows. It is ridiculous for TV and video producers to spend tens of thousands of dollars on captioning and subtitling when their fans are already doing it for free. Some smart entrepreneur will eventually convince them of this, and will probably make a good deal of money doing so. (For example, by charging these content producers for community management and quality assurance tools.)

Make Video Search Suck Less

Searching for video — or even worse, a particular scene in a video — is terrible. Google’s rollout of automatic captions in YouTube videos was not to make them more accessible (the quality isn’t good enough), but rather to improve searchability of video content. By having access to vast amounts of video captions (and not allowing Google to index those captions), a company could easily launch a video search engine that is better than what Google could offer. I can imagine a company like Blinkx paying a subscription fee for access to captions and subtitles of video.
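To see why captions make scene-level search possible, consider this toy sketch of an inverted index built from timed captions. It is an illustration under my own assumptions (the cue tuples, video IDs, and function names are all invented), not a description of how Google or Blinkx actually index video.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float]              # (video id, start time in seconds)
CaptionCue = Tuple[str, float, str]  # (video id, start time, caption text)

# Invented sample captions.
CUES: List[CaptionCue] = [
    ("video-a", 12.0, "Preheat the oven before you begin"),
    ("video-a", 18.5, "Knead the dough for ten minutes"),
    ("video-b", 3.0, "Put the dough in the oven"),
]


def tokenize(text: str) -> List[str]:
    """Lowercase and split into words; \\w+ also matches accented letters."""
    return re.findall(r"\w+", text.lower())


def build_index(cues: Iterable[CaptionCue]) -> Dict[str, List[Hit]]:
    """Map every word to the caption cues (video, time) where it is spoken."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for video_id, start, text in cues:
        for word in set(tokenize(text)):
            index[word].append((video_id, start))
    return index


def search(index: Dict[str, List[Hit]], query: str) -> List[Hit]:
    """Return cues containing every word of the query, sorted by video and time."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)
```

Calling `search(build_index(CUES), "dough oven")` returns the one moment where both words are spoken, which lets a player jump straight to that scene instead of merely finding the video.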

Targeted (and more annoying) Advertising

According to Nielsen, online advertising fell overall in 2009, but ad spend on online videos grew 41%. Still, most advertising companies struggle to know where they should place their ads. Keyword analysis of caption and subtitle files could transform how advertisers place ads in online video. There was also some talk at the Open Subtitles Design Summit of creating hyperlinkable captions/subtitles to make video more interactive (and to fill it with spam links, of course).
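At its crudest, keyword analysis of a caption file is nothing more than counting words. The sketch below is a deliberately naive illustration with an invented stopword list and sample text; a real ad-targeting system would presumably be far more sophisticated.

```python
import re
from collections import Counter
from typing import List, Tuple

# A very small, hand-picked stopword list (an assumption for this example).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "for", "on", "with", "this", "that", "you", "your",
}


def top_keywords(caption_text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword terms in a caption transcript."""
    words = re.findall(r"\w+", caption_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Invented caption text from a hypothetical cooking video.
captions = (
    "Knead the dough. Rest the dough. Bake the dough in the oven. "
    "Keep the oven closed."
)
print(top_keywords(captions, n=2))
```

Even a count this simple would tell an advertiser that the video is about baking, something the title and tags alone may not reveal.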

Now let’s talk about some of the challenges.

Lack of Business Use Cases

Despite the clear opportunity to make money, most content producers need to be convinced that it is in their interest to caption and subtitle their videos. No rigorous research has yet measured how captions and subtitles are used online. For example, CNET TV is one of the few online-only video producers that caption all of their videos. What has been the effect? Does the bounce rate change? Has there been a social impact? Do their advertisers recognize value in the captions? Would their viewers be willing to voluntarily translate the captions into other languages?

The Quality Question

I have recently watched two subtitled videos. The first was the Garden State DVD with professionally produced Spanish-language subtitles and the second was a pirated, downloaded version of La Teta Asustada with amateur English-language subtitles from Open Subtitles. The amateur subtitles for La Teta Asustada were better quality than the professional subtitles included on the Garden State DVD (which were so bad they were funny). As workshop participants noted, it is one thing to make a small mistake on the subtitles of a Hollywood movie and quite another on an educational video for medical students. For everyone interested in best practices and quality assurance, a sub-group at the meeting has started a document on the event wiki, and I recommend Peter Kaufman’s newly published guide, “Video for Wikipedia and the Open Web: A Guide to Best Practices for Cultural Institutions.”

Helping Videos Find Subtitles

There is a good chance that subtitles already exist online for most major movies in most major languages. As an experiment I just searched for Spanish-language subtitles for the top eight most popular movies on The Pirate Bay and found them without any problems. (Just search the title of the movie followed by ‘subtitles spanish’.) In fact, there are often multiple versions of subtitles for the same video. In Italy, there were competing volunteer teams to see who could subtitle the latest episode of Lost the fastest. What doesn’t exist is a service that notifies users when subtitles are available for a particular video. Fortunately, the team at Participatory Culture Foundation is already starting to work on such a system:

We’ve been toying with the following idea: you’re watching a video and you notice that Miro’s “subtitles” button is glowing. This means there are subtitles available in a language that you speak; clicking the button pops the subtitles over your video (holding the button displays all the different languages available and subtitle versions).

Much of the protocol could be modeled on the same system that notifies readers of a web page when it can be read as an RSS file. So far the closest such lookup tool, ironically, seems to only be available for the iPhone.

Brian McConnell of Worldwide Lexicon noted that such a lookup system should not just be limited to video content, but should also apply to text and audio. Just like a single video might be published in 30 different places on the internet, a single Associated Press article is often re-published hundreds of times. If that article is translated into Portuguese just once, then I should be notified of the available translation no matter where I come across the article. I won’t get into all of the technical details of implementing such a lookup protocol, but again notes are available on the event wiki.
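To give a flavor of what an RSS-style approach could look like, here is a speculative sketch. RSS autodiscovery works by placing a `<link rel="alternate">` tag in a page's HTML; the sketch assumes, purely hypothetically, that a page could advertise subtitle files the same way. The `application/x-subrip` type, the example.org URLs, and the function names are my own inventions and do not reflect the protocol discussed at the summit.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Hypothetical MIME type a page might use to advertise subtitle files.
SUBTITLE_TYPE = "application/x-subrip"

# Invented page head, modeled on how feeds are advertised for RSS.
PAGE = """
<html><head>
  <link rel="alternate" type="application/rss+xml" href="https://example.org/feed">
  <link rel="alternate" type="application/x-subrip" hreflang="es"
        href="https://example.org/subs/video1.es.srt">
  <link rel="alternate" type="application/x-subrip" hreflang="en"
        href="https://example.org/subs/video1.en.srt">
</head><body></body></html>
"""


class SubtitleLinkFinder(HTMLParser):
    """Collects advertised subtitle URLs, grouped by language code."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if (attributes.get("rel") == "alternate"
                and attributes.get("type") == SUBTITLE_TYPE):
            lang = attributes.get("hreflang")
            href = attributes.get("href")
            if lang and href:
                self.links.setdefault(lang, []).append(href)


def find_subtitles(html: str, preferred_languages: List[str]) -> Dict[str, List[str]]:
    """Return advertised subtitle URLs for the languages the viewer speaks."""
    finder = SubtitleLinkFinder()
    finder.feed(html)
    return {
        lang: finder.links[lang]
        for lang in preferred_languages
        if lang in finder.links
    }
```

A player that found a non-empty result for the viewer's languages would know to light up its subtitles button. The same tag could just as easily point to a translation of an article, which is the generalization McConnell was arguing for.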

Easy Wins

The first thing we need is to get more people subtitling more videos. It’s a great way to improve your language skills. Go to YouTube, find a video you’re passionate about, and start translating it using Universal Subtitles. If you need suggestions, Atenco: Romper el Cerco is an important video about human rights abuses in Mexico that is available in Spanish and English, but should also be made available in other languages.

Al Jazeera is a great candidate to make use of Universal Subtitles. Already they publish all of their content under flexible Creative Commons licenses, but they are still burning hard subtitles into their videos rather than using soft subtitle overlays. In fact, their video player doesn’t even allow captions or subtitles. Last month they published a fascinating documentary on the resurgence of violence and displacement in Colombia, but sadly the video is not even accessible to Colombians themselves because there is no easy way to translate the subtitles into Spanish.

iTunes U is one of the most valuable and under-rated resources on the net. It boggles my mind that anyone pays for higher education with so many lectures available for free online. Why go to MIT for creative writing, for example, when you can download a video of a lecture by Junot Diaz to your iPod or TV? And, better yet, the video is captioned, as are thousands of others on iTunes U.2 Hopefully future versions of Universal Subtitles will automatically detect those embedded captions and enable users to make educational lectures available to speakers of other languages. iTunes U is too valuable a resource to be limited to just English speakers.

  1. Caption Action 2 found only five major content providers that offer closed captioned content online — and seventy-seven that don’t. Only ABC, CNET, Fox, Hulu, and NBC offer cc content online.
  2. A guide to searching for captioned content from iTunes U is available at the University of South Florida website.