Felix resources

Tanaka Corpus available in Felix TM and TMX formats

I converted the Tanaka Corpus of aligned Japanese and English sentences into Felix translation memory (TM) and TMX formats.

The Tanaka Corpus is a collection of around 150,000 Japanese-English sentence translation pairs, compiled over several years by university students, with later cleanup and correction by Jim Breen and his colleagues.
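
For readers curious what a conversion like this involves, here is a minimal sketch of writing sentence pairs out as TMX. It is not the converter I actually used. The function name `pairs_to_tmx`, the tool name in the header, and the input shape (a list of source/target tuples) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml:lang attribute with the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang="ja", tgt_lang="en"):
    """Build a TMX 1.4 document (as a string) from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-converter",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per sentence pair, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(pairs_to_tmx([("ありがとう。", "Thank you.")]))
```

A real converter would also need to read the corpus's own file format and handle its metadata, which this sketch leaves out.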

Download the Felix/TMX versions of the Tanaka Corpus here.

Memory Serves release

Version 1.5 of Memory Serves released

I’ve just released version 1.5 of Memory Serves.

Download the latest version here.

Below is a list of the fixes and improvements in this version:

  • Runs as single instance: if you try to run Memory Serves and it is already running, it will open a web page to the Memory Serves site and exit. This prevents errors due to two instances of Memory Serves competing for the same port.
  • System tray icon. This lets you easily see if Memory Serves is running. You can also right click the icon to launch Memory Serves in your browser, or quit the server. Display of the system tray icon can be controlled in the preferences.
  • Rudimentary statistical information is displayed on the view memory/glossary page (% of records validated and reliability rating stats).
  • The memory footprint was reduced slightly.
  • Other minor tweaks to the view pages.
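
The single-instance behavior in the first item can be sketched as follows. This is only an illustration of one common approach (treating a failed port bind as a sign that another instance is running), not the actual Memory Serves code. The function names and the port number are hypothetical.

```python
import socket
import webbrowser


def acquire_port(port, host="127.0.0.1"):
    """Try to bind the port. Return the listening socket, or None if taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        # Something (assumed to be another instance) already holds the port.
        sock.close()
        return None
    sock.listen(1)
    return sock


def start_or_redirect(port, open_url=webbrowser.open):
    """Return a server socket, or open the running site and return None."""
    sock = acquire_port(port)
    if sock is None:
        open_url(f"http://127.0.0.1:{port}/")
        return None
    return sock


if __name__ == "__main__":
    server = start_or_redirect(8765)  # hypothetical port number
    if server is None:
        raise SystemExit(0)  # another instance is running, so exit quietly
    # ... hand the socket to the real server loop here ...
    server.close()
```

One caveat with this approach is that any program holding the port looks like a running instance, so a production version would probably want a stronger check.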

About Memory Serves

Memory Serves is a free application that lets you share Felix translation memories (TMs) and glossaries over your local network or VPN.


Analyze Assist version 1.2 released

I’ve just released version 1.2 of Analyze Assist.

Download the latest version here.

Here are the main improvements in this release:

  • The new Felix translation memory (TM) extension (*.ftm) is supported.
  • You can now drag and drop files into the file lists on the Analyze wizard.

Felix resources

Felix glossaries compiled from Wiktionary

I’ve just added 1,388 new glossaries from 43 language pairs, compiled from the Wiktionary project.

Go to Felix Wiktionary glossaries page

Wiktionary is a community-contributed dictionary site that is a spin-off of Wikipedia. There are hundreds of languages on Wiktionary, but I narrowed this down to 43 using this list of the 50 most widely spoken languages in the world.

The glossaries were compiled from a site snapshot taken on November 12, 2008. I scanned through the XML site download, created lists of all translation pairs, and then compiled Felix glossaries from them.
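
The scan-and-collect step described above might look roughly like the sketch below. It is not the script I actually ran. It assumes a MediaWiki XML export with page, title, and text elements, and it assumes translations are marked up with templates of the form {{t|fr|chien}}, which may not match the markup in the 2008 snapshot. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed markup: {{t|fr|chien}}, {{t+|de|Hund|m}}, {{t-|es|perro}}
TRANSLATION = re.compile(r"\{\{t[+-]?\|([a-z-]+)\|([^|}]+)")


def local_name(tag):
    """Strip any XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def translation_pairs(xml_file):
    """Yield (headword, language_code, translation) from a wiki XML export."""
    title = None
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text" and title and elem.text:
            for lang, word in TRANSLATION.findall(elem.text):
                yield title, lang, word.strip()
        elif name == "page":
            title = None
            elem.clear()  # release the finished page's subtree


def build_glossaries(pairs):
    """Group (headword, translation) entries by target language code."""
    glossaries = defaultdict(list)
    for headword, lang, word in pairs:
        glossaries[lang].append((headword, word))
    return dict(glossaries)
```

Streaming the file with `iterparse` rather than loading it whole matters here, because a full site download is far too large to parse comfortably in one go.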

Wiktionary is licensed under the GNU Free Documentation License, and so are the Felix glossaries compiled from it.

Felix translation memory

Stand-alone versus embedded CAT tools: trade-offs

Computer assisted translation (CAT) tools need to provide an editor in which to perform the actual translation. There are basically two ways to accomplish this:

  1. Providing a stand-alone editor
  2. Providing a plug-in (add-in) to an existing editor

Both approaches are used by various CAT tools. Felix actually has both: Felix itself comes with interfaces for MS Office, and Tag Assist is a stand-alone editor for HTML.

Each approach has its strengths and weaknesses.

Providing a stand-alone editor

Stand-alone editors have two main advantages. Firstly, they’re more stable because they don’t have to worry about maintaining compatibility with multiple versions of the editor they’re plugging into. Secondly, since they can design the editor environment specifically for translation, they can make a smoother workflow.

The main weakness of the stand-alone approach is that you’ve got to re-implement all that word/document-processing functionality. Things like spell checking, word counts, and rich formatting take a lot of time to implement, and you would basically get them for free by piggy-backing on an existing editor.

Another weakness of the stand-alone approach is support for document formats. With MS Office in particular (pre-2007), a closed, binary format was used, making perfect conversion between the MS Office format and the stand-alone editor quite difficult. Even the Macintosh and Windows versions of MS Word have notorious interoperability issues.

Providing a plug-in

Using a plug-in approach with an existing editor also has two major advantages. The biggie is that you can leverage all the non-trivial work that has gone into developing that editor. Office is Microsoft’s killer app, and several lifetimes worth of programmer hours have gone into developing it.

Secondly, this approach allows you to use the same file formats as your client. When using a stand-alone editor, there’s generally some sort of filter process going on, first to import the Office document into the editor environment, and then to put the translation back into the original format. When you’re dealing with very simple documents this isn’t usually a problem, but things soon break down. This may be alleviated somewhat in the future, as Microsoft Office moves to its new, open, XML-based format. Time will tell; but even so, expect a few years at least until most consumers of translation move away from the older Office formats.

Another benefit of the integrated approach is user familiarity: if your users are used to MS Word, then having your tool integrated into MS Word should have a shallower learning curve than making them learn a new editor.

The biggest weakness of the integrated approach is that you’ve got to support a foreign interface, usually across multiple versions. This greatly multiplies the failure points of the software. One example of this in the case of Felix is PowerPoint 2007. The first Felix interface for PowerPoint was developed for PowerPoint 2000, and it worked fine with PowerPoint XP and 2003. But when PowerPoint 2007 was released, a change in the code caused PowerPoint to melt down and crash hard (requiring a reinstall) if the Felix add-in was installed. I scrambled and patched the interface as quickly as possible, but due to problems with my then-distributor, it took several months for the new version to be released, and during that time I had to advise users not to use Felix with PowerPoint 2007.

With the stand-alone approach, you don’t have these compatibility issues.


Choosing a stand-alone versus an integrated interface is about making trade-offs. With the stand-alone approach you get greater stability and customization at the expense of feature richness, while with the integrated approach you get a rich feature set, document compatibility, and user familiarity at the expense of greater fragility.

translation memory

Productivity gains from translation memory

There’s no doubt that under the right circumstances, translation memory can give you huge productivity gains. To give one example, I’ve had many users report that with the right text, a Felix license can pay for itself in a day or two.

So what is the “right” kind of text? To get the greatest productivity gain from translation memory, the text should:

  1. Be repetitive, and
  2. Have sentences that are relatively independent of context


The text should be repetitive so that you can recycle lots of translated segments — this is the big productivity win of translation memory. An example would be translating a product manual, then the manual for a new model of the same product the next year, with very little of the text changed.
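
To get a rough feel for how repetitive a job is before you start, you could measure what share of its segments you have already translated. The sketch below is a simple illustration and not a Felix feature. It counts only exact repeats after normalizing case and whitespace, and the function names are made up for this example.

```python
def normalize(segment):
    """Collapse whitespace and ignore case so trivial differences don't count."""
    return " ".join(segment.split()).casefold()


def repetition_rate(new_segments, memory_segments=()):
    """Fraction of new segments already in memory or repeated within the job."""
    seen = {normalize(s) for s in memory_segments}
    repeated = 0
    for segment in new_segments:
        key = normalize(segment)
        if key in seen:
            repeated += 1
        else:
            seen.add(key)
    return repeated / len(new_segments) if new_segments else 0.0


if __name__ == "__main__":
    last_year = ["Press the power button.", "Insert the battery.",
                 "Close the cover.", "Charge for two hours."]
    this_year = ["Press the power button.", "Insert the battery.",
                 "Close the cover.", "Charge for three hours."]
    print(f"{repetition_rate(this_year, last_year):.0%} of segments repeat")
```

A fuller estimate would also count near matches, which is where fuzzy matching comes in.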

Independent of context

If the same word or sentence needs to be translated differently in different contexts, it’s going to slow you down. For example, the Japanese word マイコン (maikon) can be variously translated as “microcontroller,” “microprocessor,” or “microcomputer” in English. Having to decide which translation to use each time is more time-consuming than when the term or sentence can generally take the same translation. And taking more time to complete the translation means lower productivity.

What if the text isn’t repetitive?

If the text isn’t repetitive or is highly context dependent, then you can still benefit from translation memory. Translation memory can improve consistency through terminology and concordance features. It can also help you avoid missing whole phrases or sentences in your translation, because you’re generally overwriting the original, and can refer to it as you do your translation.

But in my experience, translation memory isn’t going to help you translate much faster in this case. Since I develop a CAT tool, you might think it would behoove me to claim otherwise. But not only would that be untrue; as a translator, I believe it’s actually counterproductive. Some unscrupulous vendors of CAT tools make unrealistic claims of improved productivity (and hence reduced costs) to translation purchasers, who then turn around and place unrealistic expectations on us translators.

Avoid getting burned

I’ve heard a few stories of translators getting started with TM, providing a steep discount on their first job, and later finding that the tool didn’t help their productivity at all, or actually slowed them down. So they were now out the $1,000 or more that they paid for the tool, as well as the huge discount they provided to the client.

So while translation memory can give tremendous productivity benefits, it’s important to be realistic about how much it can do. If you’re new to translation memory or are considering moving to a new tool, I highly recommend trying out the trial version of your tool of choice and verifying for yourself just what kinds of gains TM can give you.


New Felix resource added: TM and glossary of legal terms (J-E)

I’ve converted the “Standard Bilingual Dictionary” into a Felix translation memory (TM) and glossary, and posted them to the Felix website:

Felix TM and glossary of Japanese-English legal terms

These should be of use to anyone who has to translate Japanese laws into English.

About the Standard Bilingual Dictionary

The Standard Bilingual Dictionary is a glossary of official translations of terms from Japanese law. It’s part of a major effort by the Japanese government to translate its laws into English.

Felix release

Felix version 1.3 released

Version 1.3 of Felix has been released. This release contains a large number of bug fixes and usability enhancements. Two of the main enhancements in this new version are Properties dialogs for the PowerPoint and Excel interfaces. I also took the opportunity to squash a bunch of small bugs that have been building up over the past few months.

You can see a full list of improvements here.

You can download the latest version of Felix here.

I’m planning the next release (version 1.4) for around mid-October. This will be the release that includes translation history files (similar to “bilingual files” in Trados-speak, but saved as separate files in a transparent format).

In the meantime, I’m planning on adding improvements to three existing tools: Analyze Assist (analyzing files against TMs), Count Anything (which provides word and character counts for many different file types), and Align Assist (which is currently retired), as well as a first stab at a macro interface for Open Office Writer.

Felix misc

Comparison of Felix and Trados matching algorithms

The About Translation blog posted a while ago about quirks in the matching algorithm employed by Trados.

The post gives the following match results:

Trados matches for “LEAD DESIGN -”

Source        Fuzzy Score
Lead Design   67%

The author points out that this seems unintuitive and less than useful.

Felix seems to do a better job at assigning scores:

Felix matches for “LEAD DESIGN -”

Source        Fuzzy Score
Lead Design   85%

[Figure: Fuzzy match results for LEAD DESIGN]

Note that the scores may differ slightly depending on your Felix search settings. For example, you can set it to ignore case, wide/narrow characters, assign penalties for formatting mismatches, and the like. The scores above are with the “ignore case” setting. The part highlighted in red is the part that Felix recognizes as differing between the two strings.
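
To make the idea of a fuzzy score concrete, here is a sketch of one common way to compute it: the character-level edit distance between the two strings, scaled by the length of the longer one. This is an illustration only, and I'm not presenting it as the exact algorithm used by Felix or Trados. The function names and the `ignore_case` parameter are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_score(query, source, ignore_case=True):
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    if ignore_case:
        query, source = query.casefold(), source.casefold()
    longest = max(len(query), len(source))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query, source) / longest


if __name__ == "__main__":
    print(f"{fuzzy_score('LEAD DESIGN -', 'Lead Design'):.0%}")
```

With case ignored, the two strings differ only by the trailing space and hyphen, so this formula gives 1 - 2/13, or about 85%. That happens to agree with the Felix score above, but the agreement on one example doesn't establish how either tool computes its scores.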


Felix TM specifications

I believe in being open about Felix, because I think it lets users make informed decisions. That’s why I publish my development road map. I also recently published technical specifications for Felix, including translation memories (TMs), glossaries, and other features. Although it’s very hard to come up with a balanced comparison with other CAT tools, I also tried to do this here.