TRANSLATION MEMORY

Translators rarely work on documents that are unrelated to any text they have seen before. More often, a translator recalls having seen a translation of an idiom or domain-specific expression but cannot locate the source material to confirm it.

In these situations, a memory of good translations that is easily searchable by keyword would be an ideal aid to the translator. CRL's ``Translation Memory'' tool provides this capability. In most Translation Memory schemes, however, getting examples into the database can be difficult. CRL's ``XAlign'' tool automatically pairs sentences or passages from translated documents with high accuracy. Translations can then be stored in Translation Memory directly from XAlign, where they are immediately available for searching.

Operation

XAlign and Translation Memory are separate windows that work together. The first step is getting translations into Translation Memory. Once translations are in, you can then search for examples of past usages and quickly scan the examples for the most appropriate ones.

You use XAlign to get translated texts from the Tipster Document Manager. XAlign displays the texts side by side and applies a segmentation strategy that chunks them according to a user-specified scheme. Segmentation may be by punctuation, or it may be by SGML or HTML markup. After segmentation, you can perform automatic alignment of the segments. This is not foolproof, but it is often very helpful in getting an initial pairing of translated segments. You can then manually correct any wrong pairings and send the results to Translation Memory to be stored in an existing or new database.
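
As a rough illustration of the two segmentation styles mentioned above, the following Python sketch splits a text either on sentence-final punctuation or on block-level markup tags. It is not XAlign's implementation: the function names, the regular expressions, and the tag list are assumptions chosen for the example.

```python
import re

# Hypothetical set of block-level tags treated as segment boundaries
# (an assumption for this sketch, not XAlign's actual list).
BLOCK_TAGS = ("p", "br", "li", "h1", "h2", "h3", "td")

def segment_by_punctuation(text, terminators=".!?"):
    """Split text at whitespace that follows sentence-final punctuation."""
    pattern = r"(?<=[" + re.escape(terminators) + r"])\s+"
    return [s.strip() for s in re.split(pattern, text) if s.strip()]

def segment_by_markup(text, tags=BLOCK_TAGS):
    """Split text at block-level tags, then drop any remaining inline markup."""
    boundary = r"</?(?:" + "|".join(tags) + r")\b[^>]*>"
    pieces = re.split(boundary, text, flags=re.IGNORECASE)
    cleaned = (re.sub(r"<[^>]+>", "", p).strip() for p in pieces)
    return [p for p in cleaned if p]
```

The `terminators` and `tags` parameters stand in for a user-specified scheme: a different language or document type would simply pass different values.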

Highlights

Import/Export of Tipster documents.
User-customizable segmentation of documents.
XAlign remembers user-specified segmentation schemes.
Automatic parallel text alignment using clues from sentence length, character 4-gram comparisons and heuristics about how matches can be made.
Full editing of texts, including manual modification of alignments.
Documents can be saved as pairs back to the document manager for continued work at a later time.
Adds translation segments to a new or existing Translation Memory database, either a whole document at a time or segment by segment.
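
To make the alignment clues listed above more concrete, here is a minimal Python sketch that combines a length ratio with character 4-gram overlap and uses dynamic programming to choose among 1-1, 2-1, 1-2 and unmatched pairings. This is only an illustration under stated assumptions, not XAlign's algorithm: the scoring weights, the penalty values, and all function names are invented for the example.

```python
def char_ngrams(text, n=4):
    """Return the set of character n-grams of the lowercased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    if not t:
        return set()
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}

def similarity(src, tgt, n=4):
    """Score a candidate pairing in [0, 1] from length ratio and n-gram overlap."""
    if not src or not tgt:
        return 0.0
    length_score = min(len(src), len(tgt)) / max(len(src), len(tgt))
    a, b = char_ngrams(src, n), char_ngrams(tgt, n)
    dice = 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0
    return 0.5 * length_score + 0.5 * dice  # equal weights are an assumption

# Allowed pairings (source count, target count) with a penalty for each.
# These moves and penalties are illustrative heuristics only.
MOVES = {(1, 1): 0.0, (1, 0): 0.6, (0, 1): 0.6, (2, 1): 0.1, (1, 2): 0.1}

def align(src_segs, tgt_segs):
    """Return the lowest-cost list of (source_indices, target_indices) pairings."""
    n, m = len(src_segs), len(tgt_segs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in MOVES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = " ".join(src_segs[i:ni])
                t = " ".join(tgt_segs[j:nj])
                c = cost[i][j] + (1.0 - similarity(s, t)) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the chosen pairings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]
```

Character n-grams are useful here because cognates in related languages share substrings, while the length ratio still gives a signal when no n-grams match. As with any automatic alignment, the output is a starting point for manual correction.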

After you have created a Translation Memory database, you can then use the TM tool to search for examples. Some features of TM are:

Multilingual search capabilities.
Multiple TM databases.
Search of either the source or the target language of the translations.
Direct additions to TM databases from XAlign tool.
Display of ranked list of relevant example texts.
Display of ``snapshots'' of ranked examples for easy skimming.
Fuzzy matching of search and text terms to capture cognates and morphological variants.
Fuzzy highlighting of found terms, including cognates in the parallel text.
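
The fuzzy matching and ranking features above can be sketched in a few lines of Python. The example below uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The TM tool's actual matching method is not described here, so the threshold, the in-memory list of pairs, and the function names are all assumptions.

```python
from difflib import SequenceMatcher

def fuzzy_hits(term, text, threshold=0.75):
    """Return the words in text whose spelling is close to term."""
    term = term.lower()
    hits = []
    for word in text.split():
        bare = word.strip(".,;:!?\"'()").lower()
        if bare and SequenceMatcher(None, term, bare).ratio() >= threshold:
            hits.append(word)
    return hits

def search(memory, query, threshold=0.75):
    """Rank (source, target) pairs by how many query terms match either side.

    memory is a plain list of (source, target) strings, a stand-in for a
    real Translation Memory database.
    """
    results = []
    for source, target in memory:
        matched = {}
        for term in query.split():
            found = (fuzzy_hits(term, source, threshold)
                     + fuzzy_hits(term, target, threshold))
            if found:
                matched[term] = found  # words that could be highlighted
        if matched:
            results.append((len(matched), source, target, matched))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Because the comparison is on spelling rather than exact equality, a query for "translations" still finds a segment containing "translation", and a query for "government" can also pick up the cognate "gouvernement" on the other side of the pair.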

In combination, XAlign and Translation Memory provide you with the tools to manage translations and make them available for future use.

Configuration

XAlign segmentation schemes can be designed by the user to meet specific segmentation needs. A scheme that splits up documents based on HTML markup may not be appropriate for free text, for example, and sentence-splitting punctuation may differ between languages. In XAlign, segmentation schemes are saved transparently and remain available to each user from session to session.

From within XAlign, you can also create new Translation Memory databases and then add translations to them. The databases are all managed by CRL's NDS server, possibly on a remote computer. This relieves the local host computer of processing and indexing the translation texts. The location of the Translation Memory databases is specified in the NDS configuration file.

Status

XAlign and Translation Memory are integrated components of Oleada. They make use of the Tipster Document Manager (TDM) and Norm Data Server (NDS) for fully distributed text computing. To use XAlign and Translation Memory, you must import or create translated documents within Oleada. You can then load them into XAlign for alignment and save them to Translation Memory.

The automatic alignment algorithm used by XAlign was developed to cope with real-world documents, including documents with different markup schemes. At present, the algorithm is best suited for French and Spanish, although XAlign is fully multilingual and alignments can be prepared in any of the supported Oleada languages.

Last Modified: 12:54pm MDT, July 25, 1996