Earley's Algorithm
- Left-to-right, "predictive", top-down;
- General (works for any context-free grammar);
- O(n³) worst-case time.
- The input X1, ..., Xn is scanned left to right,
looking ahead a fixed number k of symbols.
- As each Xi is read, a set of states Si
is constructed, representing the condition of the recognition process at this
stage. Each state in the set represents:
- a rule of the grammar;
- a point in the rule indicating how much of its rhs has been recognized
(a dot);
- an indication of the point at which recognition of the lhs began (the
state number or "origin state");
- (lookahead: a k-symbol string which is syntactically possible
following the rule).
| State | Dotted Rule  | Origin | Lookahead |
| 0     | S -> . NP VP | 0      |           |
| 2     | VP -> V . NP | 1      |           |
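Such a state can be sketched as a small data structure (Python; the class name
and fields are illustrative, and the optional lookahead is omitted):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """One Earley state: a dotted rule plus its origin.

    lhs/rhs give the grammar rule, dot is the position of the dot
    within rhs, and origin is the index of the state set where
    recognition of lhs began.
    """
    lhs: str
    rhs: tuple
    dot: int
    origin: int

    def next_symbol(self):
        """Symbol immediately after the dot, or None if the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

# The first row of the table above: S -> . NP VP, origin 0
item = Item("S", ("NP", "VP"), 0, 0)
```

Which of the three operations applies to a state is determined by
`next_symbol()`: a non-terminal triggers the Predictor, a terminal the
Scanner, and `None` (dot at the end) the Completer.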
States are constructed via three operations, each applied repeatedly:
- Predictor
- In a state where there is a non-terminal to the right of the dot: add a
new state to Si for each alternative for that non-terminal.
For example, in a state with VP after the dot, Predict adds:
- VP -> . V
- VP -> . V NP
- VP -> . V NP PP
- etc.
(Repeat until no further states can be added; cf.
the closure operation in LR(k) parsing.)
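A minimal sketch of the Predictor (Python; the function name, the
tuple-based item representation, and the toy grammar are illustrative):

```python
def predict(item, state_set, grammar, i):
    """Predictor: if a non-terminal B follows the dot, add an item
    B -> . beta (origin i) to the current state set S_i for every
    alternative beta of B. Items are (lhs, rhs, dot, origin) tuples."""
    B = item[1][item[2]]               # the symbol just after the dot
    for beta in grammar.get(B, []):
        state_set.add((B, beta, 0, i))

# Hypothetical toy grammar: VP has three alternatives, as in the example.
grammar = {"VP": [("V",), ("V", "NP"), ("V", "NP", "PP")]}
s1 = set()
predict(("S", ("NP", "VP"), 1, 0), s1, grammar, 1)
# s1 now holds VP -> . V, VP -> . V NP, VP -> . V NP PP, all with origin 1
```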
- Scanner
- In a state set Si where there is a terminal to the right of
the dot: compare this to the next input symbol; if they match, add to
Si+1 this configuration with the dot moved over the
terminal. For example, if S2 contains
- V -> . likes
and input X3 is likes, then add to S3:
- V -> likes .
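The Scanner can be sketched in the same style (Python; names are
illustrative):

```python
def scan(item, next_set, word):
    """Scanner: if the terminal after the dot matches the next input
    word, copy the item into S_{i+1} with the dot advanced."""
    lhs, rhs, dot, origin = item
    if dot < len(rhs) and rhs[dot] == word:
        next_set.add((lhs, rhs, dot + 1, origin))

s3 = set()
scan(("V", ("likes",), 0, 2), s3, "likes")  # match: dot moves over 'likes'
scan(("V", ("saw",), 0, 2), s3, "likes")    # no match: nothing added
```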
- Completer
- In a state set Si where the dot is at the end of a rule
whose lhs is A: the lookahead string (if any) is compared with the next
k input symbols; if they match, the algorithm copies from the "origin
state" any dotted rules with A after the dot, and adds them to
Si with the dot moved over A. For example, suppose
V -> likes . appears in S2, and its origin state is
S1:
we copy
from S1 all the rules that have V after the dot:
- VP -> . V
- VP -> . V NP
- VP -> . V NP PP
and add versions of these to S2, with the dot
moved:
- VP -> V .
- VP -> V . NP
- VP -> V . NP PP
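Putting the three operations together, a toy recognizer might look as
follows (Python; a sketch with k = 0 lookahead, an illustrative grammar,
and no epsilon rules, not the notes' exact formulation):

```python
def earley_recognize(words, grammar, start="S"):
    """Tiny Earley recognizer (k = 0 lookahead). grammar maps each
    non-terminal to a list of rhs tuples; any symbol not in grammar is a
    terminal. Items are (lhs, rhs, dot, origin) tuples."""
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                        # Predictor
                    for beta in grammar[sym]:
                        new = (sym, beta, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(words) and sym == words[i]:  # Scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                         # Completer
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    # Accept if some start rule is fully recognized spanning the input.
    return any(it == (start, rhs, len(rhs), 0)
               for it in chart[-1] for rhs in grammar[start])

# Toy grammar (illustrative, not from the notes):
g = {"S":  [("NP", "VP")],
     "NP": [("kim",), ("sandy",)],
     "VP": [("V",), ("V", "NP")],
     "V":  [("likes",), ("saw",)]}
```

For instance, `earley_recognize(["kim", "likes", "sandy"], g)` accepts,
while `earley_recognize(["likes", "kim"], g)` does not.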
doug@essex.ac.uk