A stochastic context-free grammar (SCFG) extends the standard context-free formalism by adding probabilities to each production:
$$X \rightarrow \lambda \quad [p],$$
where the rule probability p is usually written as
$P(X \rightarrow \lambda)$. This notation
to some extent hides the fact that p is a conditional probability, namely that of
production $X \rightarrow \lambda$ being chosen,
given that X is up for expansion. The probabilities of all rules with the
same nonterminal X on the LHS must therefore sum to unity.
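
The sum-to-unity requirement can be checked mechanically. The following is a minimal sketch, not part of the original formalism: the dictionary representation, the toy grammar `TOY_GRAMMAR`, and the function name `improper_nonterminals` are all assumptions made for illustration.

```python
import math

# Hypothetical toy grammar (an assumption for illustration): each LHS
# nonterminal maps to a list of (right-hand side, probability) pairs.
TOY_GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("det", "n"), 0.7), (("n",), 0.3)],
    "VP": [(("v", "NP"), 0.6), (("v",), 0.4)],
}


def improper_nonterminals(grammar, tol=1e-9):
    """Return the LHS nonterminals whose rule probabilities do not sum to 1."""
    return [
        lhs
        for lhs, rules in grammar.items()
        if not math.isclose(sum(p for _, p in rules), 1.0, abs_tol=tol)
    ]
```

Grouping rules by their LHS mirrors the conditional reading of p: each list is one distribution over the possible expansions of a single nonterminal.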
Context-freeness in a probabilistic setting translates into conditional
independence of rule choices. As a result, complete derivations have joint
probabilities that are simply the products of the rule probabilities involved.
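
As an illustration of the product rule for derivations, the sketch below multiplies the probabilities of the rules used in one derivation. The rule table `RULE_PROB`, the example derivation, and the function name `derivation_probability` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical rule probabilities, keyed by (LHS, RHS).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("det", "n")): 0.7,
    ("NP", ("n",)): 0.3,
    ("VP", ("v", "NP")): 0.6,
    ("VP", ("v",)): 0.4,
}


def derivation_probability(rules_used, rule_prob):
    """Joint probability of a derivation: the product of its rule probabilities.

    A rule applied several times appears several times in rules_used,
    so it contributes one factor per application.
    """
    return math.prod(rule_prob[rule] for rule in rules_used)


# One derivation of the terminal string "det n v n" (assumed example).
EXAMPLE_DERIVATION = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v", "NP")),
    ("NP", ("n",)),
]
```

For the example derivation the product is 1.0 × 0.7 × 0.6 × 0.3, or about 0.126. No factor depends on where in the derivation a rule is applied, which is exactly the conditional independence assumption.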
The probabilities of interest mentioned in Section 1 can now be defined formally.
In the following, we assume that the probabilities in an SCFG are
proper and consistent as defined in Booth and Thompson (1973), and that the grammar
contains no useless nonterminals (ones that can never appear in a
derivation). These restrictions ensure that all nonterminals define probability
measures over strings, i.e.,
$P(X \stackrel{*}{\Rightarrow} x)$ is a
proper distribution over x for all X. Formal definitions of these
conditions are given in Appendix A.
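
To give some intuition for why consistency is a separate requirement from properness, the sketch below estimates, for each nonterminal, the probability that its expansion terminates in a finite string. It uses a standard fixed-point iteration, which is an assumption of this example and not the formal definition from Appendix A. The grammar representation and the names `termination_probabilities` and `branching_grammar` are hypothetical.

```python
import math


def termination_probabilities(grammar, iterations=200):
    """Approximate the probability that each nonterminal derives a finite string.

    Iterates q[X] = sum over rules X -> rhs [p] of p * prod(q[Y]) for the
    nonterminals Y in rhs, starting from q = 0. Terminals contribute a
    factor of 1 because they need no further expansion.
    """
    q = {lhs: 0.0 for lhs in grammar}
    for _ in range(iterations):
        q = {
            lhs: sum(
                p * math.prod(q[sym] for sym in rhs if sym in grammar)
                for rhs, p in rules
            )
            for lhs, rules in grammar.items()
        }
    return q


def branching_grammar(p):
    """Hypothetical grammar S -> S S [p], S -> a [1 - p]."""
    return {"S": [(("S", "S"), p), (("a",), 1.0 - p)]}
```

Both `branching_grammar(0.4)` and `branching_grammar(0.6)` have rule probabilities that sum to one, yet only the first yields a termination probability of 1 for S. With p = 0.6 the iteration settles near 2/3, so about a third of the probability mass is lost to derivations that never terminate.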
Andreas Stolcke