LECTURE 37: LEXICAL TREES

COMPLEXITY OF N-GRAM SEARCH

Although it is always desirable to use as many knowledge sources as possible, there are practical problems in integrating all such information into a single time-synchronous search.
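One alternative strategy is to use a multi-pass search, in which later passes apply more detailed knowledge sources to the hypotheses retained by the first pass. However, the more accurate the first pass, the better the performance of the subsequent passes.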
One of the most critical parts of search is the tree lexicon.
In a linear lexicon, each word is represented as a linear sequence of phonemes, independent of all other words. For example, although "task" and "tasks" share the same initial phonemes, the search does not share any of their history: each word's copy of those phonemes is evaluated separately.
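As a minimal sketch of the linear-lexicon problem, the Python below assumes a toy two-word lexicon with ARPAbet-style pronunciations (the transcriptions and the helper names `build_linear_nodes` and `common_prefix_len` are illustrative, not taken from any toolkit). It gives every (word, position) pair its own search node and then measures how many of those nodes merely repeat phones the other word already models.

```python
# Toy linear lexicon (assumed ARPAbet-style phones, stress omitted).
# Each word owns its own phone sequence; nothing is shared between words.
LEXICON = {
    "task": ["t", "ae", "s", "k"],
    "tasks": ["t", "ae", "s", "k", "s"],
}


def build_linear_nodes(lexicon):
    """Create one search node per (word, position, phone): no sharing across words."""
    return [
        (word, i, phone)
        for word, phones in lexicon.items()
        for i, phone in enumerate(phones)
    ]


def common_prefix_len(a, b):
    """Length of the initial phone sequence two pronunciations have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


nodes = build_linear_nodes(LEXICON)
redundant = common_prefix_len(LEXICON["task"], LEXICON["tasks"])
print(len(nodes))   # 9 nodes evaluated for just two words
print(redundant)    # 4 of them repeat phones already modeled for the other word
```

Nearly half of the nodes in this tiny example are duplicated work, and the proportion only matters more as the vocabulary grows.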
Large lexicons introduce enormous complexity for a backoff N-gram language model, because we must "start" every word in the lexicon (enter its first phoneme) for every unique history in the current search space.
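The cost described above is a product of vocabulary size and the number of distinct histories, which the following minimal Python sketch makes explicit. It assumes (hypothetically) that a history is the tuple of the last N-1 words, as a backoff N-gram model conditions on, that hypotheses sharing a history at the same frame are recombined, and that each distinct history then starts every word once; the function name `count_word_starts` and the 20,000-word vocabulary are illustrative.

```python
def count_word_starts(vocab_size, word_end_histories):
    """Number of word-start hypotheses created with a linear lexicon.

    word_end_histories: one history tuple (the last N-1 words) per word-end
    hypothesis surviving at the current frame. Hypotheses with the same
    history are assumed to be recombined, so only distinct histories count;
    each one must start every word in the vocabulary.
    """
    distinct = set(word_end_histories)
    return len(distinct) * vocab_size


# Trigram example (assumed data): a history is the pair of preceding words.
histories = [("the", "big"), ("a", "big"), ("the", "big")]
print(count_word_starts(20_000, histories))  # 40000: 2 distinct histories x 20,000 words
```

With a few hundred distinct histories active at a single frame, the same arithmetic already reaches millions of word starts, which is the blow-up a linear lexicon suffers.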
Can we share some of the underlying phonetic structure of words in the lexicon?
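One way to share that structure is to merge common pronunciation prefixes into a prefix tree (trie) of phones, i.e. a tree lexicon. The Python below is a minimal sketch under assumed toy pronunciations; `TreeNode`, `build_lexical_tree`, and `count_nodes` are hypothetical names, and a real decoder would attach acoustic model states to each node rather than bare phone labels.

```python
class TreeNode:
    """One phone node in a lexical (prefix) tree."""

    def __init__(self, phone=None):
        self.phone = phone
        self.children = {}  # next phone -> TreeNode
        self.words = []     # words whose pronunciation ends at this node


def build_lexical_tree(lexicon):
    """Insert each pronunciation, reusing nodes for shared phone prefixes."""
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            if phone not in node.children:
                node.children[phone] = TreeNode(phone)
            node = node.children[phone]
        node.words.append(word)
    return root


def count_nodes(node):
    """Count phone nodes below `node` (the empty root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy lexicon (assumed ARPAbet-style pronunciations, stress omitted).
LEXICON = {
    "task": ["t", "ae", "s", "k"],
    "tasks": ["t", "ae", "s", "k", "s"],
    "tap": ["t", "ae", "p"],
}

tree = build_lexical_tree(LEXICON)
linear_nodes = sum(len(p) for p in LEXICON.values())
print(linear_nodes)         # 12 nodes in a linear lexicon
print(count_nodes(tree))    # 6 nodes in the tree
print(len(tree.children))   # 1 word-initial node to start per history
```

Here 12 linear-lexicon phone nodes collapse to 6 tree nodes, and only one word-initial node ("t") must be entered per history instead of three. Because word identity is known only at the node where a pronunciation ends, each node keeps a `words` list.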