Background
Written language, as a means of communication, should be as clear and
concise as possible. Because natural language has no limits,
language engineering must work within the limits defined by
the machine.
The idea of simplifying natural language is not
new. In the past it was envisaged as a way to communicate complex
concepts with the help of simple expressions. More recently, the
concept of controlled language has been developed for the purposes of
computational linguistics.
Fast and coherent machine translation is highly desirable, but present
technology does not allow it. Natural language carries many
ambiguities, particularly in the domain of translation.
There is a need to define rules that match the technological capacity of
machine translation systems in order to obtain acceptable translations.
These rules define a controlled language. The approach results from the
choice to adapt language to the machine's performance and not the reverse,
which is impossible today. This allows efficient communication,
thanks to the "right word", and eliminates unnecessary information.
Controlled language is a set of rules that guides writers. These
rules are integrated into authoring tools. Authoring tools give the
writer information on the syntax, semantics and structure of documents.
Authoring tools are also often integrated into the documentation
chain for the creation of text to be translated by the machine. The tool
gives useful indications for preparing a source text in order
to:
- Avoid any semantic ambiguities
- Simplify the syntax
- Partially or totally eliminate post-editing
Authoring tools should be adapted, in particular, to the choice
of vocabulary. Some lexicons are determined by the
domain or the organization (medicine, architecture,
aeronautics...).
Industry requirements
The use of authoring tools is requested by industry in domains
such as technical documentation, user manuals and maintenance manuals.
Several organizations, such as AECMA (Association Européenne des
Constructeurs de Matériel Aérospatial) or GIFAS (Groupement des
Industries Françaises Aéronautiques et Spatiales), have published writers'
guides or specifications of controlled language in order to facilitate
the reading and translation of technical documentation.
In fact, the use of controlled language ensures coherent
communication inside a company, and is particularly effective in
domains such as security.
In machine translation, the use of authoring tools based on
controlled language allows a substantial reduction of the post-editing
phase. Faced with the growing number of languages in use in Europe, this
process is very useful. The creation of documentation has changed; more
and more documents need to be translated into other languages.
In the scope of written communication, the controlled language
concept is paradoxical. Although it represents a simplification of the
language, it brings more powerful expression, owing to its inherent
precision and concision.
Lingaware goals
Lingaware is developing applications in the domain of controlled
language. The goal is to adapt emerging language engineering technologies
to present needs.
Controlled language tools developed by Lingaware have the following
advantages:
- They are modular and can be set up to the customer's requirements.
Based on the Typed Features Logic formalism and statistical processing
(embedded with DicoBase), Lingaware's tools can be easily adapted to
different environments. It is also very easy to adapt the rules used for
controlled language to suit users' requirements.
- The controlled language tools can be interfaced with major
documentation editing tools (OS/2, UNIX, and Windows) and are easy to
manipulate.
- They are interactive; users can start analysis on request and control
output during editing. Dictionaries can be updated with specific terms,
manually or from the terminological database management system DicoBase.
- In the context of machine translation, they can drastically reduce the
task of post-editing.
Principal features of Controlled Languages.
The rules defining a controlled language are integrated into authoring
tools. The analyzer component detects structures that machine
translation cannot handle and suggests possible solutions.
The rules concern the structure of the document, syntax,
semantics and the lexicon.
Dictionaries
- Selected terms dictionary
- Approved list of technical terms
- List of words adapted to the production process
Terminological choice
- Use the selected word in the right context
- Keep the initial meaning
- Only use approved derivations of verbs
- Identify technical words clearly.
- A technical word can only be used as a substantive.
- Use the official word as far as possible.
- Do not use different technical words for the same thing.
- If you have a choice, use the simplest word
- Be specific when writing text.
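
The rule "do not use different technical words for the same thing" lends itself to a dictionary lookup. The following is a minimal sketch in Python, not Lingaware's implementation: the table APPROVED_TERMS and the function find_unapproved_terms are hypothetical names, and the entries are invented for illustration.

```python
import re

# Hypothetical table (illustrative entries only): each non-approved
# variant maps to the single approved technical term.
APPROVED_TERMS = {
    "turn off": "stop",
    "switch off": "stop",
    "shut down": "stop",
    "spanner": "wrench",
}


def find_unapproved_terms(text):
    """Return (variant, approved_term, position) for each non-approved variant."""
    findings = []
    for variant, approved in APPROVED_TERMS.items():
        # Whole-word, case-insensitive match of the variant.
        pattern = r"\b" + re.escape(variant) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((variant, approved, match.start()))
    # Report findings in the order they appear in the text.
    return sorted(findings, key=lambda item: item[2])


if __name__ == "__main__":
    sample = "Switch off the pump. Use a spanner to remove the cover."
    for variant, approved, position in find_unapproved_terms(sample):
        print(f"position {position}: replace '{variant}' with '{approved}'")
```

A real tool would load the table from a terminology database rather than hard-code it, and would need morphological analysis to catch inflected forms.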
Grouping terms
- Avoid nominal groups composed of more than 3 terms.
- If possible, use a substantive with an article or a demonstrative
adjective.
Verbs
- Use the following tenses: present, future, perfect
- Use the preterit after the verb "be" or after one of the authorized
verbs.
Active voice
- Use the active voice in preference to the passive voice
- Avoid the future tense if it can be replaced by the present tense.
Sentence length
- Write short sentences: maximum 20 words
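
The 20-word limit is simple to check mechanically. The sketch below, in Python, assumes a naive sentence splitter (a break after ".", "!" or "?" followed by whitespace) and counts whitespace-separated tokens as words; the names split_sentences and long_sentences are illustrative and do not come from any existing tool.

```python
import re

MAX_WORDS = 20  # limit stated in the rule above


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) for each sentence over the limit."""
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())  # whitespace-separated tokens
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    text = "Remove the cover. " + " ".join(["word"] * 21) + "."
    for count, sentence in long_sentences(text):
        print(f"{count} words: {sentence}")
```

Abbreviations such as "e.g." would fool this splitter; a production analyzer would rely on a proper tokenizer.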
Short sentences
- One subject per sentence
- Use coordinating conjunctions
- Use enumerations
- Adapt the length to the structure of the sentence
Instructions
- Only one instruction per sentence
- Use the imperative mood
- Write enumerations
- Put several instructions in one sentence only if they describe
simultaneous actions.
- If the instruction starts with a description, use a comma to separate
it from the rest of the sentence.
Description
- Use paragraphs to show how the text is articulated
- One idea per paragraph
- Connect the first sentence of a paragraph to the rest with the help of
an introductory sentence.
- Maximum length of a paragraph: 6 sentences
- Introduce complex data or new information gradually
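
The limit of 6 sentences per paragraph can be checked in the same mechanical way. This Python sketch assumes that paragraphs are separated by blank lines and that a sentence ends with ".", "!" or "?" followed by whitespace or the end of the text; both assumptions, and the function names, are illustrative only.

```python
import re

MAX_SENTENCES = 6  # limit stated in the rule above


def count_sentences(paragraph):
    """Count end-of-sentence marks followed by whitespace or end of text."""
    return len(re.findall(r"[.!?](?:\s|$)", paragraph.strip()))


def long_paragraphs(text, max_sentences=MAX_SENTENCES):
    """Return (paragraph_number, sentence_count) for each paragraph over the limit."""
    flagged = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for index, paragraph in enumerate(paragraphs, start=1):
        count = count_sentences(paragraph)
        if count > max_sentences:
            flagged.append((index, count))
    return flagged


if __name__ == "__main__":
    long_one = " ".join(f"Step {i} is done." for i in range(7))
    document = "Open the panel. Check the fuse.\n\n" + long_one
    print(long_paragraphs(document))
```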
Warning messages
- Start with a clear and concise instruction
- Give a short explanation that conveys the potential risk clearly
Punctuation
- Colons and hyphens are counted as words in the sentence length
- Use a colon and a dash for enumerations
- Use a hyphen to join words
- In compound technical words, use a hyphen to show the relation
- Text in parentheses is counted as a new sentence
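
The two counting rules above affect how sentence length is measured. The Python sketch below shows one possible reading, under stated assumptions: each colon and each hyphen adds one to the word count, and text in parentheses is taken out of the sentence and counted as a unit of its own. The function names are hypothetical, and nested parentheses are not handled.

```python
import re


def split_parentheticals(sentence):
    """Return the sentence without its parenthetical text, followed by
    each parenthetical as a separate unit."""
    inner = re.findall(r"\(([^()]*)\)", sentence)
    outer = re.sub(r"\s*\([^()]*\)", "", sentence).strip()
    units = [outer] if outer else []
    units.extend(part.strip() for part in inner if part.strip())
    return units


def count_words(unit):
    """Count words, adding one for each colon and each hyphen (assumed reading)."""
    words = re.findall(r"[A-Za-z0-9']+", unit)
    marks = re.findall(r"[:\-]", unit)
    return len(words) + len(marks)


if __name__ == "__main__":
    sentence = "Open the valve (see figure 2) slowly."
    for unit in split_parentheticals(sentence):
        print(count_words(unit), unit)
```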
Conclusion
Controlled language tools are, for Europe, a solution for developing
international communication. They provide an advanced solution for
transferring ideas from one language to another.
Annex 2 gives an
example of controlled language tools applied to the translation of a
document originating from the European Parliament, with the help of MAXit Editor.
The quality of translation can be compared for:
- Original text in English
- French translation by a human
- Language-controlled text in English
- Machine translation into French with the Systran machine
- Machine translation into German with the Systran machine
References
Simplified English, AECMA/AIA, 1989 and following.
Internationalization Localization of the Offer, BULL SA ILO group, 1992.
Learning to use Simplified English, a preliminary study, University of
Central Florida, 1992.
Simplified English and Machine Translation, Peter J. Pym, Professional
Translator & Interpreter, No. 2, 1991.