For each complete state
and each state in set j,
, that has Y to the right of the dot,
add the state
(move the dot over the current nonterminal). A state produced by completion
is called a completed state.
Each completion corresponds to the
end of a nonterminal expansion started by a matching prediction step.
For each input symbol and corresponding state set, an Earley parser performs all three operations exhaustively, i.e., until no new states are generated. One crucial insight into the working of the algorithm is that, although both prediction and completion feed themselves, there are only a finite number of states that can possibly be produced. Therefore recursive prediction and completion at each position have to terminate eventually, and the parser can proceed to the next input via scanning.
To complete the description we need only specify the initial and final states. The parser starts out with
where S is the sentence nonterminal (note the empty left-hand side). After processing the last symbol, the parser verifies that
has been produced (among possibly others), where l is the length of the input x. If at any intermediate stage a state set remains empty (because no states from the previous stage permit scanning) the parse can be aborted because an impossible prefix has been detected.
States with empty LHS such as those above are useful in other contexts, as will be shown in Section 5.4. We will collectively refer to them as dummy states. Dummy states enter the chart only as a result of initialization, as opposed to being derived from grammar productions.
It is easy to see that Earley parser operations are correct, in the sense that each chain of transitions (predictions, scanning steps, completions) corresponds to a possible (partial) derivation. Intuitively, it is also true that a parser that performs these transitions exhaustively is complete, i.e., it finds all possible derivations. Formal proofs of these properties are given in the literature, e.g., Aho:72. The relationship between Earley transitions and derivations will be stated more formally in the next section.
The parse trees for sentences can be reconstructed from the chart contents. We will illustrate this in Section 5 when discussing Viterbi parses.
Table 1 shows a simple grammar and a trace of Earley parser operation on a sample sentence.
Earley's parser can deal with any type of context-free rule format, even with
null or
-productions, i.e., those that replace
a nonterminal with the empty string. Such productions do however require special
attention, and make the algorithm and its description more complicated than
otherwise necessary. In the following sections we assume that no null
productions have to be dealt with, and then summarize the necessary changes in
Section 4.7.
One might chose to simply preprocess the grammar to eliminate null productions,
a process which is also described.
Table: (a) Example grammar for a tiny fragment of
English. (b) Earley parser processing the sentence a circle touches a
triangle.
Andreas Stolcke