Archive for June, 2009

The power of automatic saving

Jun. 27th 2009

I’m working on the next version of Memory Serves now, so I’m doing a lot of dogfooding with it. I’ve thus been using Memory Serves pretty much exclusively over the past month for my own translation work.

Over the course of using Memory Serves intensively, I’ve uncovered quite a few areas needing improvement, which is good, because knowing about the problems makes it possible to fix them. 🙂

One feature that really saved my bacon, however, was the fact that Memory Serves keeps the database up-to-date at all times. I had been working on a fairly large translation, and went out with my family for dinner. Okinawa was experiencing some intense electrical storms, and when we got back, I found that my neighborhood had had a blackout, and my computers had all shut down.

Since Memory Serves uses the SQLite database to store the translation memories, all changes to the TMs are saved to disk immediately. So none of my work was lost, and I was able to carry on translating.
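To make the idea concrete, here is a minimal sketch of the commit-on-every-change pattern, using Python’s standard sqlite3 module. The table name, column names, and helper functions (open_tm, add_record) are made up for illustration; this is not the actual Memory Serves schema or code.

```python
import sqlite3


def open_tm(path):
    """Open (or create) a translation-memory database at `path`.

    The `records` table and its columns are hypothetical.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, target TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def add_record(conn, source, target):
    """Insert one entry and commit it before returning."""
    # Using the connection as a context manager commits on success
    # and rolls back if the statement raises.
    with conn:
        cur = conn.execute(
            "INSERT INTO records (source, target) VALUES (?, ?)",
            (source, target),
        )
    return cur.lastrowid
```

Because each entry is committed as soon as it is added, there is never a batch of unsaved entries sitting in memory; a crash can cost at most the entry being written at that moment.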

With Felix, your TMs aren’t saved automatically; you have to save them much as you would a Word document. Felix will prompt you to save if you exit the program with unsaved changes, but if your computer (or Felix) crashes, you lose all the translation entries you’ve made since your last save.

This happened to a Felix user a few months back: she had been working on a translation for about six hours when her computer crashed, and she hadn’t saved her TM even once. She asked me if there was some way to recover her translations, but the only way was to use Align Assist to recreate her translation memory — the original TM was lost.

I added a ticket to my Felix issue tracker to add automatic background saving of TMs, but until now I’ve given higher priority to other development. Seeing first hand how this feature saved my own bacon with Memory Serves, however, I’ve decided to give it higher priority for Felix as well. I hope to have it included in Felix by the next release (version 1.5), or at the latest by the version after that (1.5.1).

The next version of Memory Serves will be released over the next few days, and it’ll have a lot of improvements as well. In particular, it’s much faster, fixes some issues with correcting/editing translations, and will have a new search and replace feature. The new search and replace will serve as a prototype of the improved search and replace I’m adding to Felix.

Posted by Ryan Ginstrom | in Felix, Memory Serves | No Comments »

How glossary matching works in Felix

Jun. 16th 2009

Built-in glossary searching is one of the key features of Felix. In this post, I want to describe how the glossary searching algorithm works, and how results are displayed.

Finding Matches

Felix has two modes for glossary searches. If you set a minimum score below 100% (Tools >> Preferences >> Glossary >> “Minimum fuzzy score”), then Felix will do fuzzy matching based on the Levenshtein (edit) distance.

If you select a score of 100%, it will only count perfect matches.

To give an idea of what this means, consider this glossary entry:


Now, say you’re translating this sentence:

Put the aaCaa in the box.

If you’re not using fuzzy matching, then no match will be found for aaCaa. If you set the fuzzy threshold to around 80%, then the glossary entry will be retrieved as a candidate.
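As a rough illustration of fuzzy matching by edit distance, here is a short Python sketch. It rests on several assumptions: the glossary term “aaBaa” is a made-up stand-in entry, the score formula (length of the longer string minus the edit distance, divided by that length) is one common way to turn an edit distance into a percentage, and sliding a same-length window across the sentence is a simplification. None of this is taken from Felix’s own code.

```python
def levenshtein(a, b):
    """Edit distance: the minimum number of single-character
    insertions, deletions, and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_score(a, b):
    """Similarity between 0.0 and 1.0 (assumed formula, see lead-in)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return (longest - levenshtein(a, b)) / longest


def best_window_score(term, sentence):
    """Best score of `term` against any same-length slice of `sentence`."""
    n = len(term)
    if n == 0 or len(sentence) < n:
        return fuzzy_score(term, sentence)
    return max(fuzzy_score(term, sentence[i:i + n])
               for i in range(len(sentence) - n + 1))
```

With the stand-in term “aaBaa” and the sentence “Put the aaCaa in the box.”, the best window is “aaCaa”: one substitution over five characters, for a score of 0.8. That clears an 80% threshold but fails a 100% one.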

You can also set whether to ignore case, wide/narrow character distinctions, and distinctions between Hiragana and Katakana.

Case: “aaa” is the same as “AAA”
Wide/narrow: “１２３” (full-width) is the same as “123” (half-width)
Hiragana/Katakana: “いろは” is the same as “イロハ”
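One way to implement these three options is to normalize both strings before comparing them. The Python sketch below is an assumption about how this could be done, not Felix’s actual code: the normalize function and its parameter names are made up, and Unicode NFKC normalization folds a few more compatibility forms than just wide/narrow characters.

```python
import unicodedata

# Hiragana U+3041..U+3096 line up with Katakana U+30A1..U+30F6,
# so adding 0x60 to the code point converts one to the other.
_HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def normalize(text, ignore_case=True, ignore_width=True, ignore_kana=True):
    """Return a comparison key for `text` with the chosen distinctions removed."""
    if ignore_width:
        # NFKC maps full-width Latin letters and digits to ASCII,
        # and half-width Katakana to full-width Katakana.
        text = unicodedata.normalize("NFKC", text)
    if ignore_kana:
        text = text.translate(_HIRAGANA_TO_KATAKANA)
    if ignore_case:
        text = text.casefold()
    return text
```

Two strings then count as the same when their normalized forms are equal, for example normalize("いろは") == normalize("イロハ").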

Displaying Results

All the glossary matches for the current sentence are displayed in the glossary window. The matches are sorted by reference count, then string length, then score. That is, the match with the highest reference count is shown first in the list of matches; if two matches have the same reference count, then the longer match goes first; and if they are also the same length, the match with the higher score goes first.

Reference count: The number of times the translation has been retrieved by the user
String length: How long the source word/phrase is
Score: If you use fuzzy glossary matching, how close the match is.
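The ordering described above amounts to a sort on a three-part key. Here is a minimal Python sketch; the GlossaryMatch class and its field names are hypothetical, chosen only to mirror the three criteria.

```python
from dataclasses import dataclass


@dataclass
class GlossaryMatch:
    source: str       # source word/phrase
    translation: str
    refcount: int     # times the user has retrieved this translation
    score: float      # fuzzy score, 1.0 for a perfect match


def sort_matches(matches):
    """Order by reference count, then source length, then score, highest first."""
    return sorted(
        matches,
        key=lambda m: (m.refcount, len(m.source), m.score),
        reverse=True,
    )
```

Because the key is a tuple, string length only matters when reference counts tie, and score only matters when both of the others tie.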

Room for Improvement

There are several ways in which the glossary matching algorithm could be improved. Felix user Steven Venti proposed a search algorithm that I would characterize as based on “closeness” or “stickiness,” and gave the program Jamming (Japanese) as an example of a program that does dictionary searches very well.

Another feature I’ve been thinking about for a while is the ability to create rule-based glossary entries, using wildcards or regular expressions. For example, you could do this to create translations for dates, or product names consisting of set patterns.
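To show what a rule-based entry for dates might look like, here is a small Python sketch using a regular expression. This is only an illustration of the idea, not a planned design: the pattern, the translate_dates function, and the output format are all assumptions.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical rule: a Japanese date such as 2009年6月27日.
DATE_RULE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")


def translate_dates(text):
    """Replace each date matching DATE_RULE with an English rendering."""
    def render(match):
        year, month, day = (int(g) for g in match.groups())
        if not 1 <= month <= 12:
            return match.group(0)  # not a valid month; leave the text alone
        return f"{MONTHS[month - 1]} {day}, {year}"
    return DATE_RULE.sub(render, text)
```

A single rule like this would cover every date in that format, where an ordinary glossary would need one entry per date.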

The way that matches are displayed can also be improved. I could make it possible for users to determine the sort criteria (what order matches are displayed in), both through preferences and dynamically. I’m also planning to make it possible to easily show and hide details about glossary matches — for example, click “details” to show all the information about the match, such as creator and date created, and “minimal” to show just the source and translation (thus allowing more matches to be shown at once).

In a way, being able to specify the order in which matches are displayed could make up for the “feast or famine” problem that Steven mentions: getting either too few or too many matches. If you set the match score low enough that you get lots of matches, but could arrange so that the matches you want are shown first, I think that would go a long way toward improving usability.

Posted by Ryan Ginstrom | in Felix | 3 Comments »

Tip: Getting word counts from Excel files

Jun. 2nd 2009

Getting word counts from Microsoft® Excel files is a common and frustrating task for translators and writers in general.

One common approach is to save the worksheet as a text file, then open that in Word and use the Word Count feature. This approach has some problems, though: you can only save one worksheet at a time, and text in text boxes isn’t saved, so you could end up with a word count that’s too low. Not to mention the time and hassle involved.

About a year ago, I did a huge translation that literally consisted of hundreds of Excel files and thousands of worksheets. Counting the words in each file using the MS Word method would have driven me batty.

If you use Windows and often need to get word counts from Excel files, I recommend my free program, Count Anything. Just click the “Count” button, drag and drop your Excel files into the dialog, and click OK.

Drag and drop Excel files into the dialog box

You’ll end up with a nicely formatted report that you can drill down on, print, or save as an HTML or text file.

Results of Excel file word count

Click here to download the free Count Anything program.

Posted by Ryan Ginstrom | in Felix | 16 Comments »