[Series] Open Software, Open Content, Open Translation Part V

In the last post of this series, we looked at how producers of open source software and open content are usually faced with three types of translation: software, website interface, and content. This post will take a closer look at some of the processes and obstacles regarding the third type, content. As an example, I’ll translate a blog post by Daniel Sempértegui of Cochabamba, Bolivia. This is what I would frequently do when I was Latin America Editor at Global Voices. I would read a lot of Latin American blogs in Spanish, find a couple of interesting posts, translate them into English, and publish them on the GV site. The blog post I’ll translate today is titled Denuncias de autos robados en twitter or “Reports of stolen cars on twitter.”

Copyright, Translation, and Derivative Works

The first thing I do is look for what kind of license Daniel publishes his content under. In this case, he uses an attribution, non-commercial, no derivatives Creative Commons license. Now, here’s where the law gets tricky. Section 101 of the US Copyright Act defines a “derivative work” as “a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.” Ironically, if a webpage (or anything else) is translated with Google Translate, it is considered fair use whereas if a human translates the same piece of content, it’s copyright violation. John Sieman of UNC’s school of law explains the four factors behind why Google Translate’s machine translation is considered fair use.

In this case, US copyright law makes it legal for Google to offer faulty machine translations, but illegal for non-commercial, volunteer translators to translate content out of good will. However, which jurisdiction’s law should I be bound by? Colombia, where I am translating from? The U.S., where my server is? Or Bolivia, where the content (we assume) was originally written and published? These are the confusing issues that the World Trade Organization tries to work out. And, as Georgia recently pointed out, their rulings are often made on a case-by-case basis. But while Antigua wins ‘piracy rights’, a French 16-year-old spent a night in jail for translating Harry Potter and, in Poland, clandestine movie sub-titlers face two years in jail for daring to make movies accessible in more languages.

(As an aside, I find it strange that Lorelle – who actually allows derivative works under the CC license she publishes under – doesn’t want others to translate her content, but does link to faulty Google translation versions of each of her posts.)

But, back to Daniel. Because he doesn’t allow for derivative works, I must send him an email asking permission to translate his post, Denuncias de autos robados en twitter. So I go to his contact page and find his email (Luke @ Soy Tu Padre dot Com!!!) and ask for his permission:

Hola Aeromental,

Me gustaría traducir tu post “Denuncias de autos robados en twitter” pero como tu licencia de Creative Commons no permite obras derivadas, necesito tu permiso.

Publicaría la traducción en mi blog personal (sin lucros o anuncios), http://el-oso.net/blog.

Espero que tengas una muy feliz navidad.



I’ve never once been turned down by any blogger whose post I proposed to translate (in fact, usually they are grateful and find it strange that I even felt compelled to ask their permission), but it’s better to be safe than sorry. If Daniel doesn’t write me back, then it’s my personal policy to go ahead and translate his post anyway. Sure, it’s against the law, but if he asks me to take it down, I’ll gladly do so. In this case, he responded within an hour:

Hola David,

Con mucho gusto puedes traducir este post u otros posts con la única condición de que pongas un link al post original de Aeromental indicando que esa es la fuente original. (Source: …., etc)

Un saludo y suerte con tu blog, desde ya me gustó mucho el background de uvas/viñedo que tiene.

Saludos !!!

The Actual Translation

Every content translator has his or her own process. Here is how I do it. First, I go to the original permalink of the post. Then, in my bookmark bar, I have Google’s Spanish to English translation browser button. I simply click this bookmark and a new browser page pops up with an automated machine translation of the page. Up until a couple of months ago, all online translators used the same translation engine, SYSTRAN. But in October, Google Translate split from SYSTRAN and now uses its own custom statistical machine translation engine. One of the new features they introduced allows users to suggest improvements to the translation. When you hover your mouse over any paragraph of the automated translation, a javascript bubble will pop up with the original Spanish text and with the option to suggest an improved translation.

(Another aside: my guess is that it was this feature that was responsible for the ‘sarkozy sarkozy sarkozy = Blair defends Bush’ blooper.)

Now on my screen I have two browser windows: one with the original Spanish text and one with the automated machine translation. I open my blog editor and get ready to start typing away. I work almost completely from the original Spanish text. After finishing a paragraph I will look at the corresponding paragraph of the Google translation to see if there are any changes in definition or syntax that I’d like to make. Not to toot my own horn, but it is very rare that Google Translate does a better job with a sentence than I do. In this case I only used the Google Translation for one word. One of the original lines in Spanish was “Si hay personas que estan leyendo los twitts mediante el celular y llegan a ver el auto, este podría ser recuperado gracias a twitter.” The literal translation of ‘recuperar’ is recuperate. Like, ‘recuperar las fuerzas.’ But recuperating a car doesn’t sound right and I couldn’t think of the correct translation off the top of my head. So I looked at the Google Translation and found it – recover.

The other tool that every content translator needs is a dictionary. Even the most skilled and fluent translators always have a dictionary nearby. Personally, I use WordReference.com. Not only is it a pretty great dictionary, but it also has a wonderful community (at least the Spanish-English community) of forum posters who discuss the various contexts of how phrases and words should be translated. If there is a word or phrase that you’re unsure about, you can post your question to the forum and it will likely be answered in a matter of hours if not minutes. I have set up a hot key trigger using Quicksilver so that with the press of just two keys, I am able to look up any word or phrase either from Spanish – English or English – Spanish.

The last tool I use is Instant Messenger. That’s right, if I’m unsure of a translation and can’t find it with WordReference.com or Google, then I message a friend.

Images, Embedded Code, and Links

In total, it took me 7 minutes to translate Daniel’s post. One of the reasons it was so fast and easy is because the post doesn’t contain any images or embedded code. Translating text in images is almost always impossible because you need the original file which created the image, not just the .gif or .jpg.

For example, you’ll notice that on the Bulgarian WordPress site, nearly all the text has been translated, but the image still says “Download Version 2.3.1” because it is so much more difficult and time-consuming to translate text within image files.

It is also important to look at the original source code of the web content you are translating. For example, when you hover your mouse over these words, a dialogue box will pop up with a message. You don’t see that message in the published text, but it’s there in the source code. That text should also be translated. Similarly, when we publish image files on the web, we use the following HTML code:

<img src="graphics/with_cat.gif" width=75 height=75 alt="Me & my cat">

The text in the alt attribute (i.e. ‘me and my cat’) describes the photo. This allows users with slow internet connections to have some idea of what the image will reveal as it starts downloading. It’s also important for search engines which otherwise aren’t able to identify the content of an image. And, lastly, it’s important for the visually impaired who use software that reads the content of websites. By describing the content of the photograph, they’ll have a better idea of how it relates to the post.

A good translation should change “me and my cat” to “mi gato y yo”.
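To avoid missing this kind of hidden text, a small script can list it before you start translating. The following is only a rough sketch in Python using the standard library’s HTML parser. The choice of attributes to look for (alt and title) and the names HiddenTextFinder and find_hidden_text are my own assumptions for illustration, not part of any particular translation tool.

```python
from html.parser import HTMLParser

# Attributes whose values are shown or read aloud to visitors.
# This short list is an assumption for illustration, not an exhaustive set.
TRANSLATABLE_ATTRS = ("alt", "title")


class HiddenTextFinder(HTMLParser):
    """Collects (tag, attribute, text) triples for text hidden in attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in TRANSLATABLE_ATTRS and value:
                self.found.append((tag, name, value))


def find_hidden_text(html):
    """Return every alt/title value found in the given HTML string."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    snippet = '<img src="graphics/with_cat.gif" width=75 height=75 alt="Me & my cat">'
    for tag, attr, text in find_hidden_text(snippet):
        print(f"<{tag}> {attr}: {text}")
```

Run against the image code above, it reports the alt text of the img element, which is exactly the string a translator would otherwise never see on the published page.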

Then there’s the issue of links. Many bloggers will link to Wikipedia or other reference sites as a way to give more context to what they are writing about. When I translate a post from Spanish to English, if they link to a Spanish Wikipedia article, most often I’ll change the link to the English Wikipedia article. Sometimes, however, the information in the two articles doesn’t exactly match up and so the significance of the reference is changed.
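Swapping Wikipedia links by hand gets tedious in a long post, so here is one way it might be sketched in Python. Article titles usually differ between languages, so the sketch relies on a small lookup table that the translator fills in by hand. The table TITLE_MAP, its two entries, and the function name to_english_link are all hypothetical. Any link without a known English equivalent is left untouched, since pointing readers at the wrong article is worse than leaving the Spanish one.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Hypothetical, hand-made lookup from Spanish article titles to English ones.
# A translator would fill this in after checking that the two articles match.
TITLE_MAP = {
    "Traducción": "Translation",
    "Gato": "Cat",
}


def to_english_link(url, title_map=TITLE_MAP):
    """Return the English Wikipedia URL for a Spanish one, or the URL unchanged."""
    parts = urlsplit(url)
    if parts.netloc != "es.wikipedia.org" or not parts.path.startswith("/wiki/"):
        return url  # not a Spanish Wikipedia article link

    # Titles in URLs are percent-encoded (e.g. Traducci%C3%B3n), so decode first.
    spanish_title = unquote(parts.path[len("/wiki/"):])
    english_title = title_map.get(spanish_title)
    if english_title is None:
        return url  # no known equivalent: keep the original link

    # Query strings and #section anchors are dropped on purpose,
    # because section names rarely line up across languages.
    new_path = "/wiki/" + quote(english_title)
    return urlunsplit((parts.scheme, "en.wikipedia.org", new_path, "", ""))


if __name__ == "__main__":
    print(to_english_link("https://es.wikipedia.org/wiki/Traducci%C3%B3n"))
    print(to_english_link("https://es.wikipedia.org/wiki/Cochabamba"))
```

The lookup table keeps the human in the loop: the script only saves typing, and the decision about whether two articles really say the same thing stays with the translator.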

Translation Versus Contextualization

The issue of links brings us to the always difficult balance of faithful translations versus useful contextualization. Global Voices isn’t just focused on translating content from one language to another. We also realize that not all audiences are the same and that when a blogger is writing in Kuwait, he’s likely not thinking of someone in Peru who might be reading his post. So he likely will not write out common acronyms or explain cultural idiosyncrasies. This is where translation becomes an art – how much extra context do we offer as translators so that the text can be understood by a global audience just as clearly as a local audience?

In the case of Daniel’s post, I added two small descriptions that weren’t in the original text. First, I clarified that Las Condes is a neighborhood. (Otherwise, readers could interpret it as a shopping mall, a city, a building, just about anything.) However, I stopped short of describing it as the wealthy neighborhood of Santiago even though that’s the first thing that would come to the mind of many Santiago readers. I also translated Carabineros as ‘federal police’ because most people outside of Chile don’t know what the Carabineros are. My other option would have been to leave it as ‘Carabineros’ with a link to the Wikipedia entry. But this goes to show that translators must know not only the language of the work they are translating, but also the culture in which it was written.

Not all languages are as well documented as Spanish and English

As an amateur translator of Spanish and English, I’m very fortunate. Not only are the two languages reasonably similar, they are also very well-documented and there is a lot of online support. Try translating something from Aymara to Chinese, though – you’re pretty much left on your own. (Which is why many of our GV translators use English as a brokerage language – i.e. Aymara to English and then English to Chinese.)

While I use Google Translate and WordReference.com to do the majority of my translation work, those two services aren’t available in all languages. So here’s how a few other GV translators do their work:

Rezwan: Rezwan translates content from Global Voices into Bangla/Bengali. He uses two online Bengali – English dictionaries to help him out: Bangladict.org and Samsad Bengali-English Dictionary. There is also the open source Anubadok Online dictionary to translate from English to Bengali. And there is another open source tool that he uses to make sure that all of the text is encoded with Unicode so that it shows up in all browsers. Obviously, there are quite a few more steps to translate from English into Bengali than English into Spanish.

Lova: Lova translates Global Voices content into Malagasy. He does it nearly all by hand, but uses the Encyclopedia of Madagascar and Malagasy Dictionary to double-check certain words.

Hanako: Hanako says she never uses Google’s English – Japanese translator because it’s useless for anything other than comic relief. Instead, she translates by hand and double-checks words she’s unsure about with the ALC dictionary and Jim Breen’s WWWJDIC.

Chinese: The Chinese team of translators tends not to use any online dictionaries or translators, but they work collaboratively on all of the translations. For each translation, they create a wiki page where they can all offer suggestions and improvements on the translation.

As you can see, each translator has his or her own way of translating and the toolsets used by each vary greatly. Because it’s difficult to find the best tools for each situation, the barrier to entry for volunteer translation is still high. One of the goals of the Open Translation Tools 2007 conference is to lower that barrier.

Next post we’ll look at the processes and obstacles for translators of open source software and website interfaces.
