Generalization of the HMM

Consider the following state diagram, showing a simple language model involving constrained digit sequences:

[state diagram not reproduced]

Note the similarities to our acoustic models. What is the probability of the sequence "zero one two three four five zero six six seven seven eight eight"? How would you find the average length of a digit sequence generated from this language model? (A numerical sketch of both calculations appears at the end of this section.)

In the terminology associated with formal language theory, this HMM is known as a finite state automaton (FSA). The word stochastic can also be applied, because the transitions and output symbols are governed by probability distributions. Further, since multiple transitions and observations can be generated at any point in time (hence, ambiguous output), this particular graph is classified as a nondeterministic automaton. In what follows, we will refer to this system as a stochastic finite state automaton (SFSA) when it is used to model linguistic information.

We can also express this system as a regular grammar:

[grammar rules not reproduced]

Note that rule probabilities are not quite the same as transition probabilities, since they must combine transition probabilities and output probabilities; consider, for example, rule p7. In general, the probability of a rule that emits a symbol while moving between two states is the product of the transition probability and the output probability for that symbol (a reconstruction of this relation appears at the end of this section). Note also that we must adjust probabilities at the terminal rules when the grammar is nondeterministic, to allow generation of a final terminal.

Hence, our transition from HMMs to stochastic formal languages is clear and well understood.

What types of language models are used?

· No Grammar (Digits)
· Sentence pattern grammars (Resource Management)
· Word Pair/Bigram (RM, Wall Street Journal)
· Word Class (WSJ, etc.)
· Trigram (WSJ, etc.)
· Back-Off Models (Merging Bigrams and Trigrams; sketched at the end of this section)
· Long Range N-Grams and Co-Occurrences (SWITCHBOARD)
· Triggers and Cache Models (WSJ)
· Link Grammars (SWITCHBOARD)

How do we deal with out-of-vocabulary (OOV) words and dysfluencies?
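Since the original state diagram is not reproduced here, the two questions above can only be sketched under an assumption. The Python sketch below assumes a hypothetical single-state digit loop: the looping state emits one digit per step, then either continues with probability P_LOOP or exits to a final state. The names P_LOOP and P_OUT and all numeric values are illustrative assumptions, not values from the original diagram.

```python
import numpy as np

DIGITS = "zero one two three four five six seven eight nine".split()
P_LOOP = 0.9                                     # assumed continue probability
P_OUT = {d: 1.0 / len(DIGITS) for d in DIGITS}   # assumed uniform output distribution

def sequence_probability(words):
    """Product over the path of (transition probability) x (output
    probability); the last transition goes to the exit state."""
    p = 1.0
    for i, w in enumerate(words):
        trans = P_LOOP if i < len(words) - 1 else 1.0 - P_LOOP
        p *= trans * P_OUT[w]
    return p

seq = "zero one two three four five zero six six seven seven eight eight"
print(sequence_probability(seq.split()))

# Average sequence length: in this toy model the length is geometric, so
# E[L] = 1 / (1 - P_LOOP). For a general SFSA, collect the transient-state
# transition probabilities into a matrix Q; the expected number of emissions
# starting from each state is t = (I - Q)^{-1} 1, the fundamental-matrix
# calculation for an absorbing Markov chain.
Q = np.array([[P_LOOP]])                         # one transient state here
t = np.linalg.solve(np.eye(len(Q)) - Q, np.ones(len(Q)))
print(t[0])                                      # 1 / (1 - 0.9) = 10.0
```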
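The equations that originally followed "consider p7" and "In general" are missing from this copy, so the specific value of p7 cannot be recovered. The general relation stated in the text can, however, be written out. The reconstruction below assumes the standard mapping from an SFSA to a stochastic regular grammar; the notation (A_i for the nonterminal of state i, a_ij for the transition probability from state i to state j, b_ij(v) for the probability of emitting symbol v on that transition, F for the final state) is introduced here, not taken from the original notes.

```latex
% Reconstruction under the assumptions stated above, not the original equations.
\[
  p(A_i \rightarrow v\, A_j) \;=\; a_{ij}\, b_{ij}(v)
  \qquad\qquad
  p(A_i \rightarrow v) \;=\; a_{iF}\, b_{iF}(v)
\]
```

The second form is the terminal-rule adjustment mentioned above: a rule that ends the string must also carry the probability of the transition into the final state, so that the grammar can generate a final terminal.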
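To make the back-off idea from the list above concrete, here is a minimal sketch of the same mechanism one order down: a discounted bigram estimate that backs off to the unigram when a bigram is unseen. The training text, the absolute-discount value D, and the function names are illustrative assumptions, not part of the original notes.

```python
from collections import Counter

text = "zero one two three four five zero six six seven seven eight eight"
words = text.split()

unigrams = Counter(words)
bigrams = Counter(zip(words, words[1:]))
N = len(words)
D = 0.5                                   # assumed absolute discount

def p_unigram(w):
    return unigrams[w] / N

def p_bigram_backoff(w2, w1):
    """Discounted bigram estimate; unseen bigrams back off to the unigram,
    scaled so the distribution over w2 still sums to one."""
    if bigrams[(w1, w2)] > 0:
        return (bigrams[(w1, w2)] - D) / unigrams[w1]
    seen = [w for w in unigrams if bigrams[(w1, w)] > 0]
    reserved = D * len(seen) / unigrams[w1]      # mass freed by discounting
    unseen_mass = sum(p_unigram(w) for w in unigrams
                      if bigrams[(w1, w)] == 0)
    return reserved * p_unigram(w2) / unseen_mass

print(p_bigram_backoff("six", "six"))     # seen bigram: discounted count
print(p_bigram_backoff("one", "six"))     # unseen bigram: backed-off estimate
```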