Categories
Other tools release translation memory

Version 0.2 of XLIFF Translator released

I’ve just released version 0.2 of XLIFF Translator.

You can download the latest version here.

This is still beta software, so please use it for evaluation only.

Here are the main improvements to this version:

  • Support for opening single XLIFF files

    When you open an XLIFF file, the file’s directory is opened as the current project, and the XLIFF file is selected in the translation view.

  • Wide range of XLIFF files passing test suite.

    An XLIFF Translator user kindly supplied me with a variety of XLIFF files, and XLIFF Translator can handle all of them.

  • Japanese localization finished.

    There were a few strings that didn’t have Japanese localizations. I’ve added those translations in this version.

Thanks to user ysavourel who provided great feedback in the Felix forums.

About XLIFF Translator

XLIFF Translator is a free (MIT license) Windows desktop application for translating XLIFF files. XLIFF is a standard XML format for translation.
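
If you've never looked inside an XLIFF file, here is a rough sketch of what one contains and how the source/target pairs can be read out of it. It assumes the XLIFF 1.2 namespace and element names (the post doesn't say which versions XLIFF Translator handles), and the sample document and the `read_units` helper are made up for illustration. This is not XLIFF Translator's own code.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace (an assumption; the post doesn't name a version).
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# A tiny made-up XLIFF document: one translated unit, one untranslated.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>I like ice cream.</source>
        <target>Me gusta el helado.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return (id, source, target) per trans-unit; target is "" if missing."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        target = unit.findtext("x:target", default="", namespaces=NS)
        units.append((unit.get("id"), source, target))
    return units


for unit in read_units(SAMPLE):
    print(unit)
```

Each `trans-unit` holds one segment to translate, so an editor for XLIFF mostly comes down to filling in the missing `target` elements.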

Categories
Felix translation memory

Stand-alone versus embedded CAT tools: trade-offs

Computer-assisted translation (CAT) tools need to provide an editor in which to perform the actual translation. There are basically two ways to accomplish this:

  1. Providing a stand-alone editor
  2. Providing a plug-in (add-in) to an existing editor

Both approaches are used by various CAT tools. Felix actually has both: Felix itself comes with interfaces for MS Office, and Tag Assist is a stand-alone editor for HTML.

Each approach has its strengths and weaknesses.

Providing a stand-alone editor

Stand-alone editors have two main advantages. Firstly, they’re more stable because they don’t have to worry about maintaining compatibility with multiple versions of the editor they’re plugging into. Secondly, since they can design the editor environment specifically for translation, they can make a smoother workflow.

The main weakness of the stand-alone approach is that you’ve got to re-implement all that word/document-processing functionality. Things like spell checking, word counts, and rich formatting take a lot of time to implement, and you would basically get them for free by piggy-backing on an existing editor.

Another weakness of the stand-alone approach is support for document formats. With MS Office in particular (pre-2007), a closed, binary format was used, making perfect conversion between the MS Office format and the stand-alone editor quite difficult. Even the Macintosh and Windows versions of MS Word have notorious interoperability issues.

Providing a plug-in

Using a plug-in approach with an existing editor also has two major advantages. The biggie is that you can leverage all the non-trivial work that has gone into developing that editor. Office is Microsoft’s killer app, and several lifetimes’ worth of programmer hours have gone into developing it.

Secondly, this approach allows you to use the same file formats as your client. When using a stand-alone editor, there’s generally some sort of filter process going on, first to import the Office document into the editor environment, and then to put the translation back into the original format. When you’re dealing with very simple documents this isn’t usually a problem, but things soon break down. This may be alleviated somewhat in the future, as Microsoft Office moves to its new, open, XML-based format. Time will tell; but even so, expect a few years at least until most consumers of translation move away from the older Office formats.

Another benefit of the integrated approach is user familiarity: if your users are used to MS Word, then having your tool integrated into MS Word should have a shallower learning curve than making them learn a new editor.

The biggest weakness of the integrated approach is that you’ve got to support a foreign interface, usually across multiple versions. This greatly multiplies the failure points of the software. One example of this in the case of Felix is PowerPoint 2007. The first Felix interface from PowerPoint was developed for PowerPoint 2000, and it worked fine with PowerPoint XP and 2003. But when PowerPoint 2007 was released, a change in the code caused PowerPoint to melt down and crash hard (requiring a reinstall) if the Felix add-in was installed. I scrambled and patched the interface as quickly as possible, but due to problems with my then-distributor, it took several months for the new version to be released, and during that time I had to advise users to not use Felix with PowerPoint 2007.

With the stand-alone approach, you don’t have these compatibility issues.

Conclusion

Choosing a stand-alone versus an integrated interface is about making trade-offs. With the stand-alone approach you get greater stability and customization at the expense of feature richness, while with the integrated approach you get a rich feature set, document compatibility, and user familiarity at the expense of greater fragility.

Categories
translation memory

Productivity gains from translation memory

There’s no doubt that under the right circumstances, translation memory can give you huge productivity gains. To give one example, I’ve had many users report that with the right text, a Felix license can pay for itself in a day or two.

So what is the “right” kind of text? To get the greatest productivity gain from translation memory, the text should:

  1. Be repetitive, and
  2. Have sentences that are relatively independent of context

Repetitive

The text should be repetitive so that you can recycle lots of translated segments — this is the big productivity win of translation memory. An example would be translating a product manual, then the manual for a new model of the same product the next year, with very little of the text changed.
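
One rough way to get a feel for how repetitive a text is: count how many segments exactly repeat an earlier one. The sketch below is only an illustration. The `repetition_rate` helper and the sample sentences are hypothetical, and it looks only at exact repeats (ignoring case and spacing), not at the near-matches a real tool would also find.

```python
def repetition_rate(segments):
    """Fraction of segments that exactly repeat an earlier segment."""
    seen = set()
    repeats = 0
    for segment in segments:
        # Normalize whitespace and case so trivial differences don't count.
        key = " ".join(segment.split()).lower()
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(segments) if segments else 0.0


manual = [
    "Press the power button.",
    "Insert the battery.",
    "Press the power button.",
    "Close the cover.",
]
print(repetition_rate(manual))
```

In this made-up manual, one segment in four is a repeat. The higher that fraction, the more a translation memory has to recycle.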

Independent of context

If the same word or sentence needs to be translated differently in different contexts, it’s going to slow you down. For example, the Japanese word マイコン (maikon) can be variously translated as “microcontroller,” “microprocessor,” or “microcomputer” in English. The need to determine which translation to use each time is more time consuming than when the term or sentence can generally take the same translation. And taking more time to complete the translation means lower productivity.

What if the text isn’t repetitive?

If the text isn’t repetitive or is highly context dependent, then you can still benefit from translation memory. Translation memory can improve consistency through terminology and concordance features. It can also help you avoid missing whole phrases or sentences in your translation, because you’re generally overwriting the original, and can refer to it as you do your translation.

But in my experience, translation memory isn’t going to help you translate much faster in this case. As the developer of a CAT tool, you might think it would behoove me to claim otherwise. But not only would that not be true, as a translator I believe it’s actually counterproductive. Some unscrupulous vendors of CAT tools make unrealistic claims of improved productivity (and hence reduced costs) to translation purchasers, who then turn around and place unrealistic expectations on us translators.

Avoid getting burned

I’ve heard a few stories of translators getting started with TM, providing a steep discount on their first job, and later finding that the tool didn’t help their productivity at all, or actually slowed them down. So they were now out the $1,000 or more that they paid for the tool, as well as the huge discount they provided to the client.

So while translation memory can give tremendous productivity benefits, it’s important to be realistic about how much it can do. If you’re new to translation memory or are considering moving to a new tool, I highly recommend trying out the trial version of your tool of choice and verifying for yourself just what kinds of gains TM can give you.

Categories
translation memory

Three types of translation memory search

When you’re relatively new to computer aided translation (CAT), the terminology can get a little confusing. In this post I want to describe the three main types of search used by translation memory managers (TMMs): memory search, glossary search, and concordance search.

Memory Search

This is the bread-and-butter feature of TMMs. Remember that a “translation memory” is basically just a database, where each entry is a string of source text, paired with its corresponding translation. The translation memory (TM) might have some other information, like who created each entry, how reliable the translation is, and so on, but the source-translation pair is the only essential part.

A memory search compares the sentence (or segment of text) that you’re currently translating against each source segment in your TM. If the TMM finds a match, then it displays the corresponding translation so you can insert it into your text without doing the translation again. This is the key function of a translation memory tool.

There are two kinds of matches that the TMM might find. The first is a perfect match: this is when there’s a source segment in the TM that’s identical to the sentence you’re currently translating. The other type of match is a fuzzy match: this is when there’s a source segment that’s similar but not identical to the sentence you’re translating. In either case, you need to decide whether to use the suggested translation (editing as needed), or ignore it and translate the sentence from scratch. Even with a perfect match, the translation might not fit in the current context; you’ve got to decide on a case-by-case basis. Which means that unfortunately, you can’t disconnect your brain when using translation memory.
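
To make perfect and fuzzy matches concrete, here is a minimal sketch of a memory search. It stores the TM as a list of (source, translation) pairs and scores similarity with `difflib.SequenceMatcher` from the Python standard library. That scoring method and the 0.75 threshold are stand-ins chosen for illustration. Real TMMs, Felix included, may score matches quite differently, and the `memory_search` name and sample entries are hypothetical.

```python
from difflib import SequenceMatcher


def memory_search(query, memory, threshold=0.75):
    """Return (score, source, translation) hits at or above threshold, best first."""
    hits = []
    for source, translation in memory:
        # ratio() is 1.0 for identical strings and lower for less similar ones.
        score = SequenceMatcher(None, query, source).ratio()
        if score >= threshold:
            hits.append((score, source, translation))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# A tiny made-up translation memory: source segment paired with translation.
memory = [
    ("Press the power button.", "電源ボタンを押します。"),
    ("Press the reset button.", "リセットボタンを押します。"),
    ("Close the cover.", "カバーを閉じます。"),
]

for score, source, translation in memory_search("Press the power button.", memory):
    kind = "perfect" if score == 1.0 else "fuzzy"
    print(f"{kind} match ({score:.0%}): {source} -> {translation}")
```

The identical segment comes back as a perfect match and the "reset button" segment as a fuzzy one, while the unrelated segment falls below the threshold. Either way, the tool only makes suggestions, and deciding whether one fits the context is still up to the translator.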

Glossary Search

Another name for this is “terminology search.” Most TMMs these days have a glossary search function, although it might be an add-on with some. A glossary search matches source terms in a glossary against the sentence you’re translating; if it finds a matching term, the TMM suggests the term’s translation to you.

For example, if you’re translating the sentence “I like ice cream,” and your glossary has the entry [“ice cream” = “helado”], then your TMM will suggest the term “helado” to you.

It’s also possible to use fuzzy matching with glossary searches, although this isn’t as common a feature as with memory searches (Felix does have it).
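
The “ice cream” example can be sketched in a few lines. This is a bare-bones illustration that uses plain case-insensitive substring matching (no fuzzy matching and no word-boundary checks), so it is much cruder than what a real TMM does. The `glossary_search` helper and the glossary entries are made up for the example.

```python
def glossary_search(sentence, glossary):
    """Return (term, translation) pairs for glossary terms found in the sentence."""
    lowered = sentence.lower()
    return [
        (term, translation)
        for term, translation in glossary.items()
        if term.lower() in lowered
    ]


glossary = {"ice cream": "helado", "cake": "pastel"}
print(glossary_search("I like ice cream.", glossary))
```

Note the direction of the lookup: the glossary terms are checked against the sentence being translated, not the other way around.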

Concordance Search

A concordance search (also called “context search”) is where you search your TM for occurrences of a particular term. It’s kind of the opposite of a glossary search: with a glossary search, the TMM scans the sentence you’re translating for matches in your glossary. With a concordance search, the TMM scans your translation memory for matches with the term you supply.

This feature can be really useful if you know you’ve translated a certain term before, but no matches are appearing in your memory search, and the term isn’t in your glossary. You can use a concordance search to find all the instances where you translated it before, the idea being that this should help you translate the term this time (and since you took the trouble to do a concordance search, maybe stick it into your glossary for the next time around).

For example, say you’re translating the sentence “I love ham for breakfast.” You don’t have any matches, but you know you’ve translated “ham” before (but can’t remember how). You could do a concordance search for “ham,” and might get matches like:

  • We roast a ham every Christmas.
  • That actor is a real ham.

(Which shows that it’s important to take the term’s context into account.) You could presumably then look at the translation for the first sentence, and see how you translated the edible type of ham in the past.
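
Here is the “ham” example as a rough sketch. It simply scans the source side of every TM entry for the term, ignoring case. The `concordance_search` helper is hypothetical, and the Spanish translations are illustrative placeholders, not taken from any real memory.

```python
def concordance_search(term, memory):
    """Return every (source, translation) pair whose source contains the term."""
    needle = term.lower()
    return [
        (source, translation)
        for source, translation in memory
        if needle in source.lower()
    ]


# A tiny made-up translation memory.
memory = [
    ("We roast a ham every Christmas.", "Asamos un jamón cada Navidad."),
    ("That actor is a real ham.", "Ese actor es un exagerado."),
    ("I like ice cream.", "Me gusta el helado."),
]

for source, translation in concordance_search("ham", memory):
    print(f"{source} -> {translation}")
```

Both “ham” sentences come back, each with its own translation, and it's up to you to pick the one that fits the breakfast context.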


Those are the three main types of search performed by translation memory software. Now the next time you see these terms mentioned, you’ll know what they refer to. You can also see the Retrieval section of the Wikipedia article on translation memory.

Categories
translation memory

Charge for 100% matches

Translation memory can certainly improve productivity. Many purchasers of translation are of course aware of this, and want some of these productivity benefits passed on to them. This usually takes the form of offering discounts for sentences/segments already in the translation memory, and sometimes for fuzzy matches.

But this can be taken too far. Some clients will ask not to pay for 100% matches or repetitions at all.

On the face of it, it sounds reasonable — it’s already in the translation memory, so why should they pay for it twice? But there are two problems with this. First, it assumes that inserting the 100% matches/repetitions into the translation involves no work by the translator. This isn’t true; even 100% matches need to be checked for context in each situation.

To give a very simple example, Japanese doesn’t have a capital/lower case distinction. So if the same Japanese sentence is used in a title and the body of a section, you need two different English translations for it. One will be in Title Caps and follow English conventions for titles (e.g. leaving out articles), and one will be in sentence caps and follow normal English grammar conventions.

Also, Japanese very frequently elides the subject and/or object of the sentence, and verbs lack conjugation for person and number. On this basis alone, the same sentence can have many different translations depending on the context. He/she/it/they/we [will] put/puts it/them/him/her/us/the widgets on the list/in the box/over there…

The other problem stems from this dependence on context. Because context is so important to a translation, especially between languages like Japanese and English that are very different syntactically, you’ve got to pay a lot of attention to those 100% matches just to keep up with the context. Furthermore, if you’re using a translation memory created by someone else you have to pay even more attention in order to conform your translation style to the memory; otherwise, you’re liable to end up with some unreadable Frankensteinian hodgepodge of different styles and terminology.

It also follows from the above that simply leaving out the 100% matches, and sending you only the “new” parts, is even worse. You’re then left with a disjointed list of sentences and no idea of how the sentences fit together.

So sure, offer a discount for 100% matches. But think very carefully before offering to insert those 10,000 words of perfect matches for no charge.

Categories
translation memory

Translation memory with non-repetitive texts

Often people who translate texts that aren’t very repetitive will wonder if they can really benefit from using translation memory. Of course, since I market my own translation memory program, and use it myself whenever possible, I’m just a tad biased. Even so, I don’t assume that translation memory is the right match for everybody. In this post, I want to explore who can benefit from using it.

At a minimum, your text has to be in electronic format. If the text to be translated is in paper format or a scanned image, it still might be worthwhile to convert it to electronic format (e.g. using OCR). Even if you don’t use translation memory, it will make future searching easier.

Naturally, the more repetitive your text is, the more useful translation memory will be. I’ve seen many cases where a single job would give enough of a productivity boost to more than pay for a Felix license.

But even if the text isn’t very repetitive, there are other benefits of translation memory, assuming your text is in electronic format:

  • Concordance searches
  • Avoid missing entire phrases/sentences in your translation
  • Automatic glossary lookup and management
  • Easier review

Let me go into each of these benefits in detail.

Concordance searches

A concordance search is used to find words or phrases in your translation memory (and their corresponding translation/source). This is useful to find out how you translated a certain term in the past. For example, say you’re dealing with a tricky phrase, and you’re pretty sure you’ve translated it before. You could use a concordance search to find all the places in your translation memory where you’ve translated that phrase in the past. You could then use one of your prior translations, or use it to brainstorm a new one.

Incidentally, Felix allows concordance searches for both source and translation, but some other tools apparently only allow them for the source. To get concordance for a translation, select the text in the Felix memory window, and press Ctrl + Alt + C (Alt + C for source concordance).

Avoid missing entire phrases/sentences in your translation

Dropped phrases, and even entire sentences and paragraphs, are the bane of the translator. Japanese has a rather charming term — 訳漏れ, or “translation leaks” — to refer to this pernicious problem. The problem with translation leaks is that our eyes tend to jump over them when we review our translation. A careful review will catch them, but it would be nice to avoid them in the first place.

Since translation memory is generally used by translating each segment (e.g. sentence) in turn (Felix does this by overwriting the source file), it’s much less likely that you’ll miss out entire sentences or paragraphs. Of course, the problem of missing phrases is still there, especially with very long sentences (or translating several sentences as a single unit). One trick I use to avoid missing phrases is the register glossary entries feature. When I register parts of the source and translation as glossary entries, I can pretty quickly spot when there are missing bits. As an added bonus, I build up my glossary at the same time.

Automatic glossary lookup and management

Here’s an area where you can benefit even if your text doesn’t contain a lot of repetition. By importing your glossaries into your translation memory tool, and creating your own glossaries, you can automatically look up the glossary matches every time you translate a sentence. This is especially useful when your client gives you a massive terminology list that they want you to follow.

Here’s an example of where this feature can help out. I was doing a translation that included a lot of Chinese place names. I’m pretty bad at reading all but the most common of these names, but I found a page on the Internet with the Japanese and English names of all the Chinese provinces and many of their cities. I used the handy Internet Explorer feature to dump this data into MS Excel, and added that glossary to Felix from Excel. Then when I translated the document, every place name was looked up for me automatically.

Easier review

With a review mode, it’s much easier to check each translation against its source segment. Felix also performs a glossary lookup, so you can make sure you’re using glossary terms correctly/consistently.

Conclusion

As I’ve described above, there are several benefits of translation memory even if the text to translate isn’t very repetitive. It remains to be seen, however, whether these benefits are worth the cost of a commercial system. That’s something that every individual translator will have to answer for himself or herself. Even if your work is mostly non-repetitive, however, I recommend trying out translation memory and seeing if it works for you. Most of the commercial translation memory systems have trial versions, and there are free programs available as well.