LECTURE 38: OPTIMIZATION OF LEXICAL TREES

SHARING TAILS OF TREES

Assume a bigram language model.
A linear tail in a lexical tree is a subpath that ends in a leaf node and passes only through states with a unique successor. It is also called a single-word subpath, since it belongs to exactly one word.
LM factorization pushes forward the LM probability to the last arc of the linear tail.
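
To make the definition of a linear tail concrete, here is a minimal Python sketch that builds a small lexical tree and extracts the linear tail of each word. Everything in it is an illustrative assumption, not part of the lecture: the names `Node`, `build_tree`, and `linear_tails`, the toy lexicon and its phoneme symbols, and the assumption that no pronunciation is a prefix of another (so every word ends in a leaf). It also takes one reading of the definition, in which the tail contains only arcs that leave a state with exactly one successor.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One state of the lexical (prefix) tree; arcs are labeled with phonemes."""
    children: dict = field(default_factory=dict)  # phoneme -> Node
    word: Optional[str] = None                    # set on leaf nodes only


def build_tree(lexicon):
    """Build a prefix tree from a {word: [phoneme, ...]} lexicon."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.word = word
    return root


def linear_tails(root):
    """Return {word: phonemes on its linear tail}.

    The tail is the longest run of arcs that ends in the word's leaf and
    leaves only states with a unique successor (assumed interpretation).
    """
    tails = {}
    # Each stack entry: (node, current run of unique-successor arcs).
    stack = [(root, [])]
    while stack:
        node, run = stack.pop()
        if not node.children:          # leaf: the current run is the tail
            tails[node.word] = run
            continue
        unique = len(node.children) == 1
        for phone, child in node.children.items():
            # A branching state interrupts the run, so the tail restarts.
            stack.append((child, run + [phone] if unique else []))
    return tails


# Toy lexicon (hypothetical pronunciations, for illustration only).
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speak":  ["s", "p", "iy", "k"],
    "spell":  ["s", "p", "eh", "l"],
    "start":  ["s", "t", "aa", "r", "t"],
}
tails = linear_tails(build_tree(lexicon))
print(tails["start"])  # ['aa', 'r', 't']
```

In this toy tree, "start" has a three-arc tail, "spell" has a one-arc tail, and "speech" and "speak" have empty tails because they separate only at their final arc. Under this reading, words with long tails are the ones that contribute the most states to a tail-sharing scheme.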
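A lexical tree can be restructured so that its linear tails are shared. Consider this tree before optimization:

[Figure: lexical tree before shared-tail optimization]

and this tree after shared-tail optimization:

[Figure: lexical tree after shared-tail optimization]
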
What are the advantages of this approach?