How glossary matching works in Felix

16/06/09 12:16 PM

Built-in glossary searching is one of the key features of Felix. In this post, I want to describe how the glossary searching algorithm works, and how results are displayed.

Finding Matches

Felix has two choices for glossary searches. If you set the minimum score below 100% (Tools >> Preferences >> Glossary >> “Minimum fuzzy score”), Felix does fuzzy matching based on the Levenshtein (edit) distance.

If you set it to 100%, Felix counts only exact matches.

To give an idea of what this means, consider this glossary entry:

aaBaa

Now, say you’re translating this sentence:

Put the aaCaa in the box.

If you’re not using fuzzy matching, no match will be found for aaCaa. If you set the fuzzy threshold to around 80%, aaBaa will be retrieved as a candidate match.
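To make the 80% figure concrete, here is a minimal Python sketch of a Levenshtein-based similarity score. The post doesn't spell out the exact formula Felix uses, so this assumes the score is one minus the edit distance divided by the length of the longer string; the function names levenshtein and similarity are my own. Under that assumption, aaBaa and aaCaa differ by a single substitution out of five characters, which gives 80%. The sketch only compares two strings; it doesn't show how Felix decides which part of the sentence to compare against each entry.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(f"{similarity('aaBaa', 'aaCaa'):.0%}")  # 80%
```

Normalizing by the longer string keeps the score between 0 and 1 whichever string is passed first.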

You can also set whether to ignore case, wide/narrow character distinctions, and distinctions between Hiragana and Katakana.

Ignore…
Case: “aaa” is the same as “AAA”
Wide/narrow: “１２３” (full-width) is the same as “123” (half-width)
Hiragana/Katakana: “いろは” is the same as “イロハ”
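One way to implement these options is to normalize both the glossary entry and the sentence before comparing them. The Python sketch below is an illustration, not Felix's actual code: it assumes NFKC normalization to fold width, a fixed 0x60 code-point offset to map Hiragana onto Katakana, and str.casefold for case. The normalize function and its keyword names are hypothetical. Note that NFKC also folds other compatibility characters (for example, circled digits become plain digits), which may be more than a real implementation wants.

```python
import unicodedata

# Hiragana range whose Katakana counterparts sit exactly 0x60 code points higher
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KANA_OFFSET = 0x60


def hiragana_to_katakana(text: str) -> str:
    """Replace each Hiragana character with the Katakana character for the same sound."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_START <= ord(ch) <= HIRAGANA_END else ch
        for ch in text
    )


def normalize(text: str, ignore_case=True, ignore_width=True, ignore_kana=True) -> str:
    """Fold away the distinctions the user has chosen to ignore (hypothetical options)."""
    if ignore_width:
        # NFKC maps full-width ASCII to half-width and half-width Katakana to full-width
        text = unicodedata.normalize("NFKC", text)
    if ignore_kana:
        text = hiragana_to_katakana(text)
    if ignore_case:
        text = text.casefold()
    return text


print(normalize("AAA") == normalize("aaa"))        # True
print(normalize("１２３") == normalize("123"))      # True
print(normalize("いろは") == normalize("イロハ"))    # True
```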

Displaying Results

All the glossary matches for the current sentence are displayed in the glossary window, sorted by reference count, then string length, then score. That is, the match with the highest reference count is shown first; if two matches have the same reference count, the longer match goes first; and if they also have the same length, the match with the higher score goes first.

Reference count: The number of times the translation has been retrieved by the user
String length: How long the source word/phrase is
Score: If you use fuzzy glossary matching, how close the match is.
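Sorting on several keys like this amounts to comparing tuples. Here is a minimal Python sketch; the GlossaryMatch class and its field names are hypothetical stand-ins for however Felix actually stores matches, and the sample entries are made up.

```python
from dataclasses import dataclass


@dataclass
class GlossaryMatch:
    # Hypothetical fields mirroring the three sort criteria
    source: str
    trans: str
    ref_count: int  # times the user has retrieved this translation
    score: float    # 1.0 for exact matches, lower for fuzzy ones


def sort_matches(matches):
    """Order by reference count, then source length, then score, each descending."""
    return sorted(matches, key=lambda m: (-m.ref_count, -len(m.source), -m.score))


matches = [
    GlossaryMatch("box", "箱", ref_count=2, score=1.0),
    GlossaryMatch("aaBaa", "(translation)", ref_count=5, score=0.8),
    GlossaryMatch("cardboard box", "段ボール箱", ref_count=2, score=1.0),
]
print([m.source for m in sort_matches(matches)])
# ['aaBaa', 'cardboard box', 'box']
```

Negating each value turns Python's ascending sort into a descending one for every key at once.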

Room for Improvement

There are several ways in which the glossary matching algorithm could be improved. Felix user Steven Venti proposed a search algorithm that I would characterize as based on “closeness” or “stickiness,” and pointed to Jamming (Japanese) as an example of a program that does dictionary searches very well.

Another feature I’ve been thinking about for a while is the ability to create rule-based glossary entries, using wildcards or regular expressions. For example, you could use these to create translations for dates, or for product names that follow set patterns.
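Since this feature doesn't exist yet, the following is purely a hypothetical Python sketch of what rule-based entries could look like: each rule pairs a regular expression with a function that builds the translation from the captured groups. The rules themselves, a Japanese date format and a made-up “AB-123型” product-code pattern, are illustrations I chose, not anything Felix supports.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical rule list: (compiled pattern, function from match object to translation)
RULES = [
    # Dates like 2009年6月16日; the month group only accepts 1 to 12
    (re.compile(r"([0-9]{4})年(1[0-2]|0?[1-9])月([0-9]{1,2})日"),
     lambda m: f"{MONTHS[int(m.group(2)) - 1]} {int(m.group(3))}, {m.group(1)}"),
    # Made-up product codes like AB-123型
    (re.compile(r"([A-Z]{2}-[0-9]{3})型"),
     lambda m: f"Model {m.group(1)}"),
]


def rule_matches(sentence):
    """Return (matched source text, generated translation) for every rule hit."""
    hits = []
    for pattern, make_translation in RULES:
        for m in pattern.finditer(sentence):
            hits.append((m.group(0), make_translation(m)))
    return hits


print(rule_matches("2009年6月16日にAB-123型を出荷しました。"))
# [('2009年6月16日', 'June 16, 2009'), ('AB-123型', 'Model AB-123')]
```

Generated hits like these could then be shown alongside ordinary glossary matches.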

The way that matches are displayed can also be improved. I could make it possible for users to determine the sort criteria (what order matches are displayed in), both through preferences and dynamically. I’m also planning to make it possible to easily show and hide details about glossary matches — for example, click “details” to show all the information about the match, such as creator and date created, and “minimal” to show just the source and translation (thus allowing more matches to be shown at once).
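As a rough illustration of user-defined sort criteria, here is a hypothetical Python sketch in which the preference is just an ordered list of criterion names. It relies on Python's sort being stable (including with reverse=True): sorting by the least significant criterion first and the most significant last gives the combined order. The criterion names, the CRITERIA table, and the GlossaryMatch class are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GlossaryMatch:
    # Hypothetical match record
    source: str
    trans: str
    ref_count: int
    score: float


# Hypothetical criteria a preferences screen could offer, each mapped to a sort key
CRITERIA = {
    "refs": lambda m: m.ref_count,
    "length": lambda m: len(m.source),
    "score": lambda m: m.score,
}


def sort_matches(matches, order=("refs", "length", "score")):
    """Sort descending by each named criterion; earlier names take priority."""
    result = list(matches)
    # Stable sorts compose: apply the least significant criterion first
    for name in reversed(order):
        result.sort(key=CRITERIA[name], reverse=True)
    return result


matches = [
    GlossaryMatch("box", "箱", ref_count=9, score=0.9),
    GlossaryMatch("cardboard box", "段ボール箱", ref_count=1, score=1.0),
]
print([m.source for m in sort_matches(matches)])                          # ['box', 'cardboard box']
print([m.source for m in sort_matches(matches, order=("score", "refs"))])  # ['cardboard box', 'box']
```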

In a way, being able to specify the order in which matches are displayed could help solve the “feast or famine” problem that Steven mentions: getting either too few or too many matches. If you could set the minimum score low enough to get lots of matches, but arrange them so that the ones you want are shown first, I think that would go a long way toward improving usability.

Posted by Ryan Ginstrom in Felix

3 Comments on “How glossary matching works in Felix”

  1. Gururaj Says:

    Ryan, thanks for these new features. I will have to try them out. I have been using WF these days mainly because of the very convenient on-the-fly registration of glossary entries, quick substitution, option of naming each glossary separately, and QC. Have similar features been incorporated in Felix too?

  2. Ryan Ginstrom Says:

    @Gururaj:

    You could always name each glossary separately.

    Registering glossary entries is fairly easy (here are the manual instructions), but I agree that WF is faster if a bit less flexible. I’m working on a way to add glossary entries very quickly.

    Felix doesn’t have any QC features, and although I have some planned, they’re three major releases away. As of this writing, the current release is 1.4.7, and the roadmap is:

    Version 1.5 – Improved search and replace
    Version 1.6 – Improved memory and glossary management
    Version 2.0 – Plugin system, which will include QC features among others.

  3. Gururaj Says:

    Great! I look forward with anticipation to the improved features. Especially the QC function. Keep up the good work!
