
The structure of a text node in ILEX

General characteristics of a text-node

A piece of text is represented in ILEX as an object of type text-node. A text-node takes two roles:

There are several types of text-node. In the present section, we will be concerned with the main type, which is termed body-node. (Other types of text-node, such as title and label, are specified by other parts of the artifact-schema and are constructed according to simpler principles.)

body-nodes, RST structure and syntax

A body-node is specified along two dimensions.

The two independent dimensions are needed because RST analysis can sometimes descend within a single sentence (for instance, a complex sentence can have RST structure), while at other times a complex sentence can be an atomic RST unit (as is the case, for instance, with restrictive subordinate clauses). Thus we can draw the following table of possibilities for objects of type body-node:

 
                        rst-node    not-rst-structured
multisentential-node    x
syntactic-node          x           x

Table 1: The possible types of body-node

 

Note that there's no such thing as a multisentential-node which is not-rst-structured.
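The combinations in Table 1 can be written down as a small check. The following is only an illustrative sketch in Python, not ILEX's own representation: the type names are taken from the table, but the encoding as pairs of strings and the names `is_possible` and `POSSIBLE_BODY_NODES` are assumptions made for this example.

```python
from itertools import product

# Labels copied from Table 1. Encoding them as plain strings is an
# assumption of this sketch, not something ILEX is known to do.
SYNTAX_DIMENSION = ("multisentential-node", "syntactic-node")
RST_DIMENSION = ("rst-node", "not-rst-structured")


def is_possible(syntax_type: str, rst_type: str) -> bool:
    """Return True if the pair corresponds to a filled cell of Table 1."""
    if syntax_type not in SYNTAX_DIMENSION:
        raise ValueError(f"unknown syntax type: {syntax_type}")
    if rst_type not in RST_DIMENSION:
        raise ValueError(f"unknown RST type: {rst_type}")
    # The only empty cell: a multisentential-node that is not-rst-structured.
    return not (
        syntax_type == "multisentential-node"
        and rst_type == "not-rst-structured"
    )


# Enumerate the filled cells of the table.
POSSIBLE_BODY_NODES = [
    pair for pair in product(SYNTAX_DIMENSION, RST_DIMENSION) if is_possible(*pair)
]

for syntax_type, rst_type in POSSIBLE_BODY_NODES:
    print(syntax_type, "+", rst_type)
```

Three of the four pairs survive the check, matching the three marked cells of the table.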

The present section is concerned with rst-nodes, whether they comprise a single (complex) sentence or a number of separate sentences. The section on the clause grammar deals with syntactic-nodes, whether they are rst-structured or not.

Nuclearity for rst-nodes

An rst-node represents a complex text span, specifying (a) its constituent text spans, and (b) the way in which these spans are combined. Each constituent text span is a unit of type body-node; in other words, either another rst-node or a not-rst-structured node. The former case allows for the recursive specification of rst-nodes of arbitrary size; the latter case is where rst-nodes bottom out into simple text spans.
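The recursive definition above can be sketched as a small data type. This is a minimal illustration in Python under stated assumptions, not the ILEX implementation: the class names `RstNode` and `SimpleSpan`, the `relation` and `constituents` fields, the relation labels, and the example text are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SimpleSpan:
    """A not-rst-structured node: where the recursion bottoms out."""
    text: str


@dataclass
class RstNode:
    """A complex span: its constituent spans plus how they are combined."""
    relation: str                     # hypothetical label for the combination
    constituents: List["BodyNode"]    # each constituent is itself a body-node


# A body-node is either another rst-node or a simple span.
BodyNode = Union[RstNode, SimpleSpan]


def simple_spans(node: BodyNode) -> List[str]:
    """Collect the simple text spans of a tree, left to right."""
    if isinstance(node, SimpleSpan):
        return [node.text]
    spans: List[str] = []
    for constituent in node.constituents:
        spans.extend(simple_spans(constituent))
    return spans


# An invented example: an rst-node containing a span and another rst-node.
tree = RstNode(
    relation="relation-1",
    constituents=[
        SimpleSpan("This jewel is a brooch."),
        RstNode(
            relation="relation-2",
            constituents=[
                SimpleSpan("It is made of silver,"),
                SimpleSpan("and it has an enamel inlay."),
            ],
        ),
    ],
)

print(simple_spans(tree))
```

Because a constituent may itself be an `RstNode`, trees of arbitrary depth can be built, while every path through the tree ends in a `SimpleSpan`.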

There are three subtypes of rst-node, corresponding to the three structural types of relation in RST. Each type specifies different roles for its constituent spans.

Example of an RST tree

Figure 7 shows a portion of an RST tree in ILEX.

[Figure 7: The structure of an RST tree in ILEX]

Firstly, note three different types of body-node.

Each rst-node has slots for its subconstituents such as nuclei and satellites. For instance, node (a) has :nuc and :sat roles.

Each syntactic-node has a :syn slot, which points to a unit that specifies its syntactic structure. For instance, node (c) has a slot of this kind. Note that node (b), being both an rst-node and a syntactic-node, has both types of slot, and hence can be used either to recover RST structure or syntactic structure.
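The slot arrangement described here can be mimicked with optional fields. The sketch below is in Python and is not ILEX code: the field names mirror the :nuc, :sat and :syn slots, but the class `BodyNode`, the helper `available_structures`, the node labels, and the links between the example nodes are assumptions, and the example does not reproduce the actual tree in Figure 7.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BodyNode:
    label: str
    nuc: Optional["BodyNode"] = None   # stands in for the :nuc slot
    sat: Optional["BodyNode"] = None   # stands in for the :sat slot
    syn: Optional[str] = None          # stands in for the unit the :syn slot points to

    def is_rst_node(self) -> bool:
        # In this sketch, a node counts as an rst-node if an RST slot is filled.
        return self.nuc is not None or self.sat is not None

    def is_syntactic_node(self) -> bool:
        return self.syn is not None


def available_structures(node: BodyNode) -> List[str]:
    """List which kinds of structure can be recovered from a node."""
    structures = []
    if node.is_rst_node():
        structures.append("rst")
    if node.is_syntactic_node():
        structures.append("syntax")
    return structures


# Hypothetical nodes; the links between them are invented for illustration.
leaf_x = BodyNode("x", syn="syntactic unit for x")
leaf_y = BodyNode("y", syn="syntactic unit for y")
leaf_z = BodyNode("z", syn="syntactic unit for z")

# Like node (b): both an rst-node and a syntactic-node.
both = BodyNode("both", nuc=leaf_x, sat=leaf_y, syn="syntactic unit for the sentence")

# Like node (a): RST slots only.
rst_only = BodyNode("rst-only", nuc=both, sat=leaf_z)

print(available_structures(both))
print(available_structures(rst_only))
print(available_structures(leaf_x))
```

A node carrying both kinds of slot can be read either way, which is the property the text attributes to node (b).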

It might seem odd for syntactic-nodes (such as node (i)) to have constituents that are not themselves syntactic-nodes. But in fact this conforms to the general format for syntactic-nodes in ILEX: the intermediary nodes (node (c) in this case) provide an extra level of representation to allow every syntactic-node to be specified in terms of :syn, :sem and :orth. (This general format is described in more detail in a separate section.)

