Felix glossaries compiled from Wiktionary

I’ve just added 1,388 new glossaries covering 43 languages, compiled from the Wiktionary project.

Wiktionary is a community-contributed dictionary site and a spin-off of Wikipedia. There are hundreds of languages on Wiktionary, but I narrowed this down to 43 using this list of the 50 most widely spoken languages in the world.

The glossaries were compiled from a site snapshot taken on November 12, 2008. I scanned through the XML site download, created lists of all translation pairs, and then compiled Felix glossaries from them.
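A scan like this can be sketched in a few lines of Python. This is an illustration only, not the actual script: it assumes the dump uses MediaWiki-style `<page>`, `<title>`, and `<text>` elements, and that translations appear in wikitext templates such as `{{t|fr|chien}}`. The function names `translation_pairs` and `group_by_language` are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: translations are marked up as {{t|fr|chien}}, {{t+|de|Hund|m}}, etc.
TRANSLATION = re.compile(r"\{\{t[+-]?\|([a-z-]+)\|([^|}]+)")


def local_name(tag):
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def translation_pairs(xml_source):
    """Yield (language_code, headword, translation) from a dump file or file object."""
    title = None
    # iterparse streams the file, so a very large dump never has to fit in memory.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            for lang, term in TRANSLATION.findall(elem.text or ""):
                yield lang, title, term.strip()
        elif name == "page":
            elem.clear()  # release the finished page's content


def group_by_language(pairs):
    """Collect (headword, translation) lists keyed by target language code."""
    glossaries = defaultdict(list)
    for lang, source, target in pairs:
        glossaries[lang].append((source, target))
    return glossaries
```

Each list in the result holds the entries for one target language; writing those lists out in a format Felix can import is a separate step not shown here.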

Wiktionary is licensed under the GNU Free Documentation License, and so are the Felix glossaries compiled from it.

Using Microsoft Excel as a glossary-conversion tool

As translators, we get glossaries in all sorts of formats: XML, HTML, tab-delimited text, comma-separated value (CSV), …

A good example is the Microsoft terminology glossary: a monstrous CSV file of terminology used for localizing Microsoft user interfaces.

We often need to convert these glossaries into other formats, especially to get them into a terminology management program. Microsoft Excel is a great tool for this. It can open all the formats listed above, and more. If you use Felix, you can then import the glossary directly; if you use some other tool, you can save the glossary in one of many popular formats, such as tab-delimited text or CSV. Chances are your terminology manager will support one of them.
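If you would rather script the conversion than click through Excel, the same CSV to tab-delimited step might look roughly like this in Python. It is a minimal sketch: the function name `csv_to_tab_delimited` is invented for the example, and it assumes the source term and translation sit in the first two columns, which will not be true of every glossary.

```python
import csv
import io


def csv_to_tab_delimited(csv_text, source_column=0, target_column=1):
    """Convert CSV glossary text to tab-delimited source/target lines."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Skip rows that are too short to hold both columns.
        if len(row) <= max(source_column, target_column):
            continue
        source = row[source_column].strip()
        target = row[target_column].strip()
        # Skip rows where either side is empty.
        if source and target:
            writer.writerow([source, target])
    return out.getvalue()


sample = 'File,Fichier\n"Save As, copy",Enregistrer\n,\n'
print(csv_to_tab_delimited(sample))
```

The `csv` module handles quoted fields that contain commas, which is where a naive split on "," usually goes wrong.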

Another cool trick with Excel is loading glossaries from the Internet. When Excel is installed, the context menu in Internet Explorer gets an “Export to Microsoft Excel” command. So when you have a glossary in a table on a website, you can simply right-click on it, export it to Excel, and from there save it in any of a number of formats.

Of course, there are limitations to using Excel as an intermediary for glossary conversion. The main one is that some terminology managers use special formats that Excel can’t interpret in a meaningful way. In that case, you can often work around the problem by using one of the generic “save as” file options of your terminology manager.


EDICT dictionary files available as Felix glossaries

I’ve converted the EDICT and ENAMDICT dictionary files created by Jim Breen into Felix format. The converted glossary files are available from the Felix website.

The EDICT file is multilingual (Japanese/English/French/German/Russian), and I’ve converted it into 20 Felix glossaries, one for each source/translation language combination. Of course, since Japanese is the central language, language pairs that don’t have Japanese as the source or translation language may be less useful.
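The figure of 20 follows from counting ordered pairs of the five languages (5 × 4), since each direction gets its own glossary. A quick Python check, using only the language names listed above:

```python
from itertools import permutations

LANGUAGES = ["Japanese", "English", "French", "German", "Russian"]

# Every (source, translation) combination, counting each direction separately.
pairs = list(permutations(LANGUAGES, 2))

# Pairs that have Japanese on one side or the other.
with_japanese = [pair for pair in pairs if "Japanese" in pair]

print(len(pairs), len(with_japanese))
```

By this count, 8 of the 20 glossaries have Japanese as the source or translation language; the remaining 12 are the pairs that may be less useful.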

The ENAMDICT file is a dictionary of proper names. Altogether, the file was humongous, so I broke it into several smaller glossaries by category (personal names, place names, organizations, and so on). Of course, you’re free to load them all into your Felix glossary window, since the number of glossaries you can have open is limited only by how much memory your computer has.