Aravind Joshi
University of Pennsylvania, Philadelphia, Pennsylvania, USA
Parsing a sentence means computing the structural description (or descriptions) that a grammar assigns to the sentence, assuming, of course, that the sentence is well-formed. Mathematical work on parsing consists of at least the activities discussed below.
The structural descriptions provided by a grammar depend on the grammar formalism to which the grammar belongs. For the well-known context-free grammar (CFG) the structural description is, of course, the conventional phrase structure tree (or trees) associated with the sentence. The parse tree describes the structure of the sentence. It is also the record of the history of derivation of the sentence. Thus, in this case the structural description and the history of the derivation are the same object. Later, we will comment on other grammar formalisms and the structural descriptions and histories of derivation associated with them.
For CFGs we have the well-known algorithms by Cocke, Kasami, and Younger
(CKY) [Kas65,You67]
and the Earley algorithm [Ear70].
All CFG algorithms are related to these two in one way or another. As regards the complexity of these algorithms, it is well known that the worst case complexity is O(n^3), where n is the length of the sentence. There is a multiplicative factor which depends on the size of the grammar; it is O(|G|^2), where |G| is the size of the grammar (expressed appropriately in terms of the number of rules and the number of nonterminals). There are results which show improvements in the exponents of both n and |G|, but these are not significant for our purpose. Of
course, these complexity results are mostly worst case results (upper bounds) and are therefore not directly useful. They do, however, establish polynomial parsing of these grammars. There are no mathematical average case results; all average case results reported are empirical. In practice, most algorithms for CFGs run much better than the worst case, and the real limiting factor in practice is the size of the grammar. For
a general discussion of parsing strategies, see [Lee93,Ned94,Sik94].
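The CKY recognizer mentioned above can be sketched in a few lines of Python. The grammar below is a toy example of my choosing, assumed to be in Chomsky normal form; it is only an illustration of the technique, not an implementation from the literature.

```python
from itertools import product

def cky(words, lexical, binary):
    """CKY recognition for a CFG in Chomsky normal form.
    lexical maps a word to the nonterminals deriving it; binary maps a
    pair (B, C) to the set of A with a rule A -> B C.  The three nested
    loops over spans and split points give the O(n^3) behavior."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((B, C), set())
    return chart

# Toy CNF grammar (illustrative only): S -> NP VP, VP -> V NP.
lexical = {'John': {'NP'}, 'Mary': {'NP'}, 'likes': {'V'}}
binary = {('NP', 'VP'): {'S'}, ('V', 'NP'): {'VP'}}
```

The chart entry for the full span then tells us whether the start symbol derives the sentence; the multiplicative grammar factor shows up in the lookups over rule pairs.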
During the past decade or so a number of new grammar formalisms have been introduced for a variety of reasons, for example, eliminating transformations in a grammar, accounting for linguistic structures beyond the reach of context-free grammars, and integrating syntax and semantics directly. Among these new
formalisms, there is one class of grammars called mildly context-sensitive
grammars that has been mathematically investigated very
actively. In particular, it has been shown that a number of grammar formalisms
belonging to this class are weakly equivalent, i.e., they generate the same set
of string languages. Specifically, tree-adjoining grammars (TAG), combinatory
categorial grammars (CCG), linear indexed grammars (LIG), and head grammars (HG)
are weakly equivalent. From the perspective of parsing, weak equivalence by itself is not very interesting, because it alone cannot guarantee that a parsing technique developed for one class of grammars can be extended to the other classes, or that a uniform parsing procedure can be developed for all these equivalent grammars. Fortunately, it has been shown that
indeed it is possible to extend a recognition algorithm for CFGs (the CKY
algorithm) for parsing linear indexed grammars (LIG). Then this parser can be
adapted for parsing TAGs, HGs, as well as CCGs. This new algorithm is polynomial, the complexity being O(n^6). The key mathematical notion behind the development of this general algorithm is that in all these grammars what can happen in a derivation depends only on which of a finite set of states the derivation is in. For a CFG these states can be the nonterminal symbols. This property, called the context-freeness property, is crucial because it allows one to keep only a limited amount of
context during the parsing process, thus resulting in a polynomial time
algorithm. For CFGs this property holds trivially. The significant result here
is that this property also extends to the grammars more powerful than CFGs,
mentioned above. An Earley type algorithm has also been developed for tree-adjoining grammars, and its complexity has also been shown to be O(n^6). For
further information on these results, see [JVSW91,SJ88,VSW93].
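To make the linear indexed machinery concrete, here is a minimal sketch of a recognizer for an LIG generating {a^n b^n c^n}; the grammar is an illustrative toy of my choosing, and the recognizer makes the index stack explicit (as a depth counter, since only one index symbol occurs) rather than using the shared-stack representation that underlies the polynomial CKY-style algorithm.

```python
from functools import lru_cache

# Toy linear indexed grammar for {a^n b^n c^n} (illustrative only):
#   S[..]  -> a S[..f] c     (push index f)
#   S[..]  -> T[..]
#   T[..f] -> b T[..]        (pop index f)
#   T[]    -> epsilon

def recognizes(w):
    @lru_cache(maxsize=None)
    def S(i, j, depth):          # does S with stack f^depth derive w[i:j]?
        if j - i >= 2 and w[i] == 'a' and w[j-1] == 'c' and S(i+1, j-1, depth+1):
            return True
        return T(i, j, depth)

    @lru_cache(maxsize=None)
    def T(i, j, depth):          # does T with stack f^depth derive w[i:j]?
        if depth == 0:
            return i == j
        return i < j and w[i] == 'b' and T(i+1, j, depth-1)

    return S(0, len(w), 0)
```

The memoized items (nonterminal, span, stack) illustrate the context-freeness property at work: what can happen next depends only on the item, not on how it was derived.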
Although the grammars mentioned above are weakly equivalent and uniform parsing strategies have been developed for them, it should be noted that the notions of what constitutes a parse are quite different for each of these grammars. Thus in a TAG the real parse of a sentence is the so-called derivation tree, which is a record of how the elementary trees of a TAG are put together by the operations of substitution and adjoining in order to obtain the derived tree whose yield is the string being parsed. The nodes of the derivation tree are labeled by the names of elementary trees, and each edge is labeled by the address, in the tree labeling the parent node, at which the tree labeling the daughter node is substituted or adjoined. This derivation tree is unlike the derivation tree for a CFG, for which the notions of derivation tree and derived tree coincide; for TAG they are distinct. For HG, which deals with headed strings and the operations of concatenation and wrapping (both string operations), there is no notion of a derived tree as such. There is only the notion of a derivation tree, which is a record of how the elementary strings are put together and what operations were used in the process. The terminal nodes are labeled by elementary (headed) strings, and each other node is labeled both by the operation used for combining the strings labeling its daughter nodes and by the string resulting from performing this operation. Thus this derivation tree is quite unlike the standard phrase structure tree, especially when the operation labeling a node is wrapping (wrapping one string around another, to the right or left of its head), since a non-standard constituent structure can then be defined for the resultant tree.
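The TAG operations of substitution and adjunction can be sketched concretely. Below is a minimal illustration, with toy elementary trees of my choosing, of trees represented as (label, children) pairs addressed by Gorn-style tuples; the derivation tree is then simply the record of which elementary tree was inserted at which address.

```python
# A tree is (label, [children]); a leaf has an empty child list.
def get(tree, addr):
    """Return the subtree at a Gorn-style address (a tuple of child indices)."""
    for i in addr:
        tree = tree[1][i]
    return tree

def replace(tree, addr, new):
    if not addr:
        return new
    label, kids = tree
    i = addr[0]
    return (label, kids[:i] + [replace(kids[i], addr[1:], new)] + kids[i+1:])

def substitute(tree, addr, initial):
    """Substitution: plug an initial tree in at a substitution node (a leaf)."""
    return replace(tree, addr, initial)

def adjoin(tree, addr, aux, foot_addr):
    """Adjunction: excise the subtree at addr, insert the auxiliary tree,
    and reattach the excised subtree at the auxiliary tree's foot node."""
    excised = get(tree, addr)
    return replace(tree, addr, replace(aux, foot_addr, excised))

def yield_of(tree):
    label, kids = tree
    return [label] if not kids else [w for k in kids for w in yield_of(k)]

# Toy elementary trees (illustrative only); NP! marks a substitution node,
# VP* marks the foot node of the auxiliary tree.
alpha1 = ('S', [('NP!', []), ('VP', [('V', [('sleeps', [])])])])
alpha2 = ('NP', [('John', [])])
beta   = ('VP', [('VP*', []), ('Adv', [('soundly', [])])])

derived = adjoin(substitute(alpha1, (0,), alpha2), (1,), beta, (0,))
# Derivation tree (recorded by hand here): alpha1, with alpha2 substituted
# at address (0,) and beta adjoined at address (1,).
```

Note that the derived tree and the derivation record are distinct objects, as the text emphasizes.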
For a CCG the parse of a sentence is the proof tree of the derivation. It is like a phrase structure tree in the sense that the nodes are labeled by CCG categories; however, each node must also state the name of the operation used in making the reduction (for example, function application or function composition). In this sense the proof trees are like the derivation trees of HG. The derivation trees of LIG are like phrase structure trees, except that each node also states the contents of the stack associated with it. Given this wide divergence in what constitutes a structural description, the significance of the equivalence result and the existence of a general polynomial parsing strategy can be better appreciated.
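A CCG proof tree of the kind just described can be sketched with a tiny CKY-style chart over categories. The lexicon below is a toy of my choosing, and only forward/backward application and forward composition are implemented; each chart entry records the category together with the rule name used at that node.

```python
from itertools import product

# Categories: atoms are strings; ('/', X, Y) is X/Y (seeks Y to the right),
# ('\\', X, Y) is X\Y (seeks Y to the left).
def combine(l, r):
    """Try CCG reductions on adjacent categories; return (result, rule) pairs."""
    out = []
    if isinstance(l, tuple) and l[0] == '/' and l[2] == r:
        out.append((l[1], '>'))                      # forward application
    if isinstance(r, tuple) and r[0] == '\\' and r[2] == l:
        out.append((r[1], '<'))                      # backward application
    if (isinstance(l, tuple) and l[0] == '/' and
            isinstance(r, tuple) and r[0] == '/' and l[2] == r[1]):
        out.append((('/', l[1], r[2]), '>B'))        # forward composition
    return out

def parse(words, lexicon):
    n = len(words)
    chart = {}  # (i, j) -> list of (category, proof tree)
    for i, w in enumerate(words):
        chart[i, i+1] = [(c, (c, w)) for c in lexicon[w]]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = []
            for k in range(i + 1, j):
                for (lc, lt), (rc, rt) in product(chart[i, k], chart[k, j]):
                    for cat, rule in combine(lc, rc):
                        cell.append((cat, (cat, rule, lt, rt)))
            chart[i, j] = cell
    return [t for c, t in chart[0, n] if c == 'S']

lexicon = {'John': ['NP'],
           'likes': [('/', ('\\', 'S', 'NP'), 'NP')],   # (S\NP)/NP
           'Mary': ['NP']}
```

Each proof tree node carries both its category and the rule that licensed it, which is exactly the extra annotation the text notes CCG parses require.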
Almost all computational grammars incorporate feature structures (attribute-value structures), the category label being a special attribute singled out for linguistic convenience and not for any formal reason. These feature structures are manipulated by the operation of unification, hence the term unification-based grammars. CFGs or any of the grammars mentioned above can serve as the backbone of a unification-based grammar, CFGs being the most common [Shi88]. As soon as feature structures and unification are added to a CFG, the resulting grammars are Turing equivalent and no longer polynomially parsable. In practice, conditions are often placed on the possible feature structures which allow polynomial parsability to be restored. The main reason for the excessive power of unification-based grammars is that recursion can be encoded in the feature structures. For certain grammars in the class of mildly context-sensitive grammars, in particular for TAGs, recursion is factored away from the statement of so-called long-distance dependencies. The feature structures can then be free of recursion, thus preserving the polynomial parsability of the TAG backbone.
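As an illustration of the unification operation on feature structures, here is a minimal sketch using nested Python dicts. It deliberately omits reentrancy (shared substructures) and disjunction, which real unification-based formalisms support, so it is only a sketch of the core idea.

```python
FAIL = object()  # sentinel for unification failure

def unify(f, g):
    """Unify two feature structures given as nested dicts, with strings as
    atomic values.  Returns the merged structure, or FAIL on a clash.
    Simplification: reentrancy is not modeled."""
    if isinstance(f, dict) and isinstance(g, dict):
        out = dict(f)
        for attr, val in g.items():
            if attr in out:
                sub = unify(out[attr], val)
                if sub is FAIL:
                    return FAIL
                out[attr] = sub
            else:
                out[attr] = val
        return out
    return f if f == g else FAIL

merged = unify({'cat': 'NP', 'agr': {'num': 'sg'}},
               {'agr': {'num': 'sg', 'per': '3'}})
# merged == {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
clash = unify({'agr': {'num': 'sg'}}, {'agr': {'num': 'pl'}})  # FAIL
```

Note that nothing in this operation prevents a structure from encoding unbounded recursion under a feature path, which is the source of the excessive generative power discussed above.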
Some unification-based grammar formalisms, for example lexical-functional grammar (LFG), are very explicit in assigning both a phrase structure and a feature-structure-based functional structure to the sentence being parsed. Formal properties of the interface between these two components have been studied recently; in particular, the computational complexity of this interface has been studied independently of the complexity of the phrase structure component and the feature structure component. A number of properties of different interface strategies have been studied that can be exploited for computational advantage. A surprising result here is that under certain circumstances an interface strategy that does no pruning in the interface performs significantly better than one that does. For an interesting discussion of these results, see [MK93].
Parsing can be viewed as a deductive process, as is the case in CCG mentioned above. The Lambek Calculus (LC) is a very early formulation of parsing as deduction [Lam58]. The relationship between LC and CFG was an open question for over thirty years. Very recently it has been shown that LC and CFG are weakly equivalent [Pen93]. However, the proof of this equivalence does not seem to suggest a construction of a polynomial parsing algorithm for LC, and this remains an important open question. The framework of parsing as deduction allows modular separation of the logical aspects of the grammar and the proof search procedure, thus providing a framework for investigating a wide range of parsing algorithms. Such theoretical investigations have led to the development of a program for rapid prototyping of and experimentation with new parsing algorithms, which has also been used in the development of algorithms for CCGs, TAGs, and lexicalized CFGs. For further details, see [SSP94].
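The parsing-as-deduction view can be made concrete with a generic forward-chaining engine that closes a set of axiom items under inference rules; the grammar and item shapes below are a toy instantiation of my choosing, recovering a simple bottom-up CFG recognizer from the generic engine.

```python
def deduce(axioms, rules):
    """Close a set of items under binary inference rules.  Each rule maps an
    ordered pair of items to a (possibly empty) list of derived items."""
    chart, agenda = set(axioms), list(axioms)
    while agenda:
        trigger = agenda.pop()
        for item in list(chart):
            for rule in rules:
                for new in rule(trigger, item) + rule(item, trigger):
                    if new not in chart:
                        chart.add(new)
                        agenda.append(new)
    return chart

# CFG completion rule for items (A, i, j): from (B, i, k) and (C, k, j)
# and a rule A -> B C, deduce (A, i, j).  Toy grammar: S -> A B, A -> A A.
BINARY = {('A', 'A'): 'A', ('A', 'B'): 'S'}

def complete(x, y):
    (X, i, k), (Y, k2, j) = x, y
    if k == k2 and (X, Y) in BINARY:
        return [(BINARY[X, Y], i, j)]
    return []

# Axioms scan the input "a a b": (A,0,1), (A,1,2), (B,2,3).
items = deduce([('A', 0, 1), ('A', 1, 2), ('B', 2, 3)], [complete])
```

Swapping in different item shapes and inference rules (Earley items, TAG items, CCG reductions) changes the algorithm while the search engine stays fixed, which is the modular separation the text describes.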
Left-to-right, rightmost derivation (LR) parsing was introduced initially for efficient parsing of languages recognized by deterministic pushdown automata and has proven useful for compilers. For natural language parsing, plain LR parsers are not powerful enough; in generalized LR parsing, conflicts between multiple choices are handled by pseudo-parallelism [Lan74,Tom87]. Johnson and Kipps independently noted that the Tomita method is not bounded by any polynomial in the length of the input string and the size of the grammar. Kipps also shows how to repair this problem. These results are presented in [Tom91]. LR parsing has also been applied to non-context-free languages in the context of natural language parsing [SV90].
Finite state devices have always played a key role in natural language processing. There is renewed interest in these devices because of their successful use in morphological analysis, representing very large dictionaries by finite state automata (FSA) and representing two-level rules and lexical information with finite state transducers (FST). FSAs have also been used for parsing, as well as for approximating CFGs. A main drawback of using FSAs for parsing is the difficulty of representing hierarchical structure, which yields incomplete parses in some sense. Recently, there has been theoretical work on FSTs for use in parsing, one approach being the use of an FST and the computation of the parse as a fixed point of the transduction. Such parsers can be very efficient and are well suited for large, highly lexicalized grammars. For further details on these issues, see [KKZ92,PRW91,Roc94].
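As a minimal illustration of dictionary lookup with a finite state acceptor, here is a trie-shaped FSA in Python. Production systems minimize the automaton so that shared suffixes are represented only once; this sketch, with a toy word list of my choosing, does not minimize.

```python
def build_fsa(words):
    """Build a trie-shaped acceptor: each state is a dict from input symbol
    to the next state; the empty-string key marks a final state."""
    start = {}
    for w in words:
        state = start
        for ch in w:
            state = state.setdefault(ch, {})
        state[''] = True
    return start

def accepts(fsa, w):
    """Run the automaton over w; accept iff we end in a final state."""
    state = fsa
    for ch in w:
        if ch not in state:
            return False
        state = state[ch]
    return '' in state

lexicon_fsa = build_fsa(['cat', 'cats', 'catalog', 'dog'])
```

Lookup time depends only on the length of the query word, not on the size of the dictionary, which is what makes such representations attractive for very large lexicons.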
We have not discussed various related mathematical topics, for example mathematical properties of grammars, unless they are directly relevant to parsing, nor the mathematical results in the application of statistical techniques to parsing. This latter topic is the subject of another contribution in this chapter. It is worth pointing out that recent mathematical investigations of lexicalized grammars are of great significance for the use of statistical techniques in parsing, as these grammars allow a very direct representation of the appropriate lexical dependencies.
Some of the key research problems are: (1) techniques for improving the efficiency of parsing systems by exploiting lexical dependencies; (2) techniques for exploiting regularities in specific domains, e.g., particular sentence patterns tend to appear more often in specific domains; (3) systematic techniques for computing partial parses (analyses less complete than full parses); (4) applying finite state technology to parsing, in particular for partial parsing; (5) systematic techniques for integrating parsing with semantic interpretation and translation; and (6) investigating parallel processing techniques for parsing and experimenting with large grammars.
Small workshops, held periodically, where both theoretical and experimental results can be informally discussed, would be the best way to exchange information in this area. Specific initiatives in parsing technology, for example parallel processing technology for parsing, are needed for rapid progress in this area.