Introduction to Controlled Language

 

Background

Written language, as a means of communication, should be as clear and concise as possible. Because natural language is unbounded, language engineering must work within the limits set by the machine.

The idea of simplifying natural language is not new. In the past, it was envisaged as a way to communicate complex concepts through simple expressions. More recently, the concept of controlled language has been developed for the purposes of computational linguistics.

Fast and coherent machine translation is highly desirable, but present technology does not allow it. Natural language carries many ambiguities, particularly where translation is concerned.

Rules therefore need to be defined according to the technological capacity of machine translation systems in order to obtain acceptable translations. These rules define a controlled language. The approach results from the choice to adapt the language to the performance of the machine rather than the reverse, which is impossible today. It allows efficient communication through the use of the "right word" and eliminates unnecessary information.

A controlled language is a set of rules that guides writers. These rules are integrated into authoring tools, which give writers information on the syntax, semantics, and structure of documents.

Authoring tools are also often integrated into the documentation chain that produces text for machine translation. The tool gives useful guidance for preparing a source text in order to:

- Avoid semantic ambiguity
- Simplify the syntax
- Partially or totally eliminate post-editing

Authoring tools should be adapted, in particular, to the choice of vocabulary. Some lexicons are determined by the domain or the organization (medicine, architecture, aeronautics...).

Industry requirements

The use of authoring tools is requested by industry in domains such as technical documentation, user manuals, and maintenance manuals. Several organizations, such as AECMA (Association Européenne des Constructeurs de Matériel Aérospatial) or GIFAS (Groupement des Industries Françaises Aéronautiques et Spatiales), have published writers' guides or controlled language specifications in order to make technical documentation easier to read and to translate.

In fact, the use of controlled language ensures coherent communication inside a company and is particularly effective in domains such as security.

In machine translation, authoring tools based on controlled language substantially reduce the post-editing phase. Given the growing number of languages used in Europe, this process is very useful. The way documentation is created has changed: more and more documents need to be translated into other languages.

In the scope of written communication, the controlled language concept is paradoxical. Although it represents a simplification of the language, it brings more powerful expression because of its inherent precision and concision.

Lingaware goals

Lingaware is developing applications in the domain of controlled language. The goal is to adapt emerging language engineering technologies to present needs.

Controlled language tools developed by Lingaware have the following advantages:

- They are modular and can be set up to the customer's requirements. Based on the Typed Features Logic formalism and statistical processing (embedded with DicoBase), Lingaware's tools can easily be adapted to different environments. The rules used for the controlled language are also very easy to adapt to users' requirements.
- The controlled language tools can be interfaced with major documentation editing tools (OS/2, UNIX, and Windows) and are easy to use.
- They are interactive: users can start an analysis on request and control the output during editing. Dictionaries can be updated with specific terms, manually or from the terminology database management system DicoBase.
- In the context of machine translation, they can drastically reduce the task of post-editing.

Principal features of controlled languages

The rules that define a controlled language are integrated into authoring tools. The analyzer detects structures that machine translation cannot process and suggests possible solutions.

The rules concern the structure of the document, its syntax, its semantics, and its lexicon.
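The analyzer described above can be pictured as a list of rules applied to each sentence, where every rule reports what it detected and a possible solution. The following is only a minimal sketch under stated assumptions: the names `Finding`, `analyze`, and `RULES` are hypothetical, and the two rules are crude pattern-based heuristics chosen for illustration. They do not represent the Typed Features Logic analysis used in Lingaware's tools.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    rule: str        # name of the rule that fired
    message: str     # what the analyzer detected
    suggestion: str  # possible solution shown to the writer


# A rule inspects one sentence and returns zero or more findings.
Rule = Callable[[str], List[Finding]]


def future_tense_rule(sentence: str) -> List[Finding]:
    # Crude heuristic (assumption): the auxiliary "will" signals a future tense.
    if re.search(r"\bwill\b", sentence, flags=re.IGNORECASE):
        return [Finding("future-tense", "future tense found",
                        "Use the present tense if the meaning allows it.")]
    return []


def passive_voice_rule(sentence: str) -> List[Finding]:
    # Crude heuristic (assumption): a form of "be" followed by a word
    # ending in "ed". Irregular participles are not detected.
    pattern = r"\b(is|are|was|were|be|been|being)\s+\w+ed\b"
    if re.search(pattern, sentence, flags=re.IGNORECASE):
        return [Finding("passive-voice", "possible passive voice",
                        "Rewrite the sentence in the active voice.")]
    return []


def analyze(sentence: str, rules: List[Rule]) -> List[Finding]:
    """Run every rule on the sentence and collect the findings in rule order."""
    findings: List[Finding] = []
    for rule in rules:
        findings.extend(rule(sentence))
    return findings


RULES: List[Rule] = [future_tense_rule, passive_voice_rule]

for finding in analyze("The valve will be closed by the operator.", RULES):
    print(f"[{finding.rule}] {finding.message}: {finding.suggestion}")
```

Keeping each rule as an independent function makes it possible to add, remove, or adjust rules to suit a given user without touching the rest of the analyzer.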

Dictionaries

- Selected terms dictionary
- Approved list of technical terms
- List of words adapted to the production process

Terminological choice

- Use the selected word in the right context
- Keep the initial meaning
- Only use approved derivations of verbs
- Identify technical words clearly
- Use a technical word only as a noun
- Use the official word as far as possible
- Do not use different technical words for the same thing
- If you have a choice, use the simplest word
- Be specific when writing text
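The rule "do not use different technical words for the same thing" lends itself to a simple lexicon lookup. The sketch below assumes a small, invented table that maps non-approved words to the approved term. The entries and the names `NON_APPROVED` and `check_terms` are hypothetical and do not come from any published specification or from DicoBase.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: each non-approved word maps to its approved term.
NON_APPROVED: Dict[str, str] = {
    "motor": "engine",
    "powerplant": "engine",
    "utilize": "use",
    "commence": "start",
}


def check_terms(text: str,
                non_approved: Dict[str, str] = NON_APPROVED) -> List[Tuple[str, str]]:
    """Return (word found, approved term) pairs in order of appearance."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        approved = non_approved.get(word.lower())
        if approved is not None:
            hits.append((word, approved))
    return hits


for word, approved in check_terms("Commence the test, then utilize the motor."):
    print(f"Replace '{word}' with '{approved}'")
```

A real lexicon would be tied to a domain or an organization, and would need to handle inflected forms and multi-word terms, which this sketch ignores.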

Grouping terms

- Avoid noun groups composed of more than 3 terms
- If possible, use a noun with an article or a demonstrative adjective

Verbs

- Use the following tenses: present, future, perfect
- Use the preterite after the verb "be" or after a verb from the authorized list

Active voice

- Prefer the active voice to the passive voice
- Avoid the future tense if it can be replaced by the present tense

Sentence length

- Write short sentences: maximum 20 words
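The 20-word limit is easy to check mechanically. The following sketch assumes a naive sentence splitter (a sentence ends at ".", "!" or "?" followed by whitespace) and counts whitespace-separated tokens as words. The function names are hypothetical.

```python
import re
from typing import List, Tuple

MAX_WORDS = 20  # limit stated in the rule above


def split_sentences(text: str) -> List[str]:
    # Naive splitter: a sentence ends at ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text: str, limit: int = MAX_WORDS) -> List[Tuple[int, str]]:
    """Return (word count, sentence) for every sentence above the limit."""
    result = []
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count > limit:
            result.append((count, sentence))
    return result


sample = "Stop the engine. " + " ".join(["word"] * 21) + "."
for count, sentence in long_sentences(sample):
    print(f"{count} words: {sentence}")
```

Abbreviations that contain a period would be split incorrectly by this splitter, so a production tool would need a more careful segmentation step.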

Short sentences

- One subject per sentence
- Use coordinating conjunctions
- Use enumerations
- Adapt the length to the structure of the sentence

Instructions

- Only one instruction per sentence
- Use the imperative mood
- Write enumerations
- Put several instructions in one sentence only if they describe simultaneous actions
- If an instruction starts with a description, use a comma to separate it from the rest of the sentence

Description

- Use paragraphs to show the articulation of the text
- One idea per paragraph
- Use an introductory sentence to connect the start of a paragraph to the rest of the text
- Maximum length of a paragraph: 6 sentences
- Introduce complex data or new information gradually
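The limit of 6 sentences per paragraph can be checked in the same spirit. This sketch assumes that paragraphs are separated by blank lines and that a sentence ends with ".", "!" or "?" followed by whitespace or the end of the paragraph. The function names are hypothetical.

```python
import re
from typing import List, Tuple

MAX_SENTENCES = 6  # limit stated in the list above


def count_sentences(paragraph: str) -> int:
    # Naive count: sentence-final punctuation followed by whitespace or the end.
    return len(re.findall(r"[.!?]+(?=\s|$)", paragraph.strip()))


def long_paragraphs(text: str, limit: int = MAX_SENTENCES) -> List[Tuple[int, int]]:
    """Return (paragraph number, sentence count) for paragraphs above the limit.

    Paragraphs are assumed to be separated by one or more blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for number, paragraph in enumerate(paragraphs, start=1):
        count = count_sentences(paragraph)
        if count > limit:
            result.append((number, count))
    return result


sample = "One. Two. Three.\n\nA. B. C. D. E. F. G."
for number, count in long_paragraphs(sample):
    print(f"Paragraph {number} has {count} sentences")
```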

Warning messages

- Start with a clear and concise instruction
- Give a short explanation that makes the potential risk clear

Punctuation

- Colons and hyphens are counted as words in the sentence length
- Use a colon and dashes for enumerations
- Use a hyphen to join words
- In compound technical words, use a hyphen to show the relation
- Text between parentheses is counted as a new sentence
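The punctuation rules affect how sentence length is measured. The sketch below shows one possible reading, and the reading itself is an assumption: a colon or a dash standing alone counts as one word, a hyphenated compound counts as one word, and text between parentheses is measured as a separate sentence. The function names are hypothetical, and nested parentheses are not handled.

```python
import re
from typing import List


def split_parentheses(sentence: str) -> List[str]:
    """Return the main sentence followed by each parenthetical text."""
    inner = re.findall(r"\(([^()]*)\)", sentence)
    outer = re.sub(r"\([^()]*\)", " ", sentence)
    return [outer] + inner


def count_words(segment: str) -> int:
    # Assumption: a word may contain internal hyphens or apostrophes
    # ("shut-off", "operator's"); a colon or a lone dash counts as one word.
    tokens = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*|:|-", segment)
    return len(tokens)


def segment_counts(sentence: str) -> List[int]:
    """Word count of the main sentence, then of each parenthetical text."""
    return [count_words(segment) for segment in split_parentheses(sentence)]


text = "Close the shut-off valve (see the safety notice): then stop the pump."
print(segment_counts(text))
```

Under a different reading of the rules, for example one where each part of a hyphenated compound counts separately, only the pattern in `count_words` would need to change.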

Conclusion

For Europe, controlled language tools are a solution for developing international communication. They provide an advanced solution for transferring ideas from one language to another.

Annex 2 gives an example of controlled language tools applied to the translation of a document originating from the European Parliament, with the help of MAXit Editor.

Quality of translation can be compared for:

- Original text in English
- French translation by a human translator
- Controlled-language text in English
- Machine translation into French with the Systran system
- Machine translation into German with the Systran system

References

Simplified English, AECMA/AIA, 1989 and following
Internationalization Localization of the Offer, BULL SA ILO group, 1992
Learning to use Simplified English, a preliminary study, University of Central Florida, 1992
Simplified English and Machine Translation, Peter J. Pym, Professional Translator & Interpreter, No. 2, 1991


©1999, Linga s.a.r.l., All Rights Reserved