Formalities

The discrete observation HMM is restricted to the production of a finite set of $K$ discrete observation symbols (or sequences of them). The output distribution at any state $j$ is given by:

$b_j(k) = P(o_t = k \mid s_t = j), \quad 1 \le k \le K$

The observation probabilities are assumed to be independent of time. We can write the probability of observing a particular observation, $o_t$, as:

$P(o_t \mid s_t = j) = b_j(o_t)$

The observation probability distribution can be represented as a matrix $B = \{b_j(k)\}$ whose dimension is $K$ rows x $S$ states. We can define the observation probability vector as:

$b(o_t) = [\,b_1(o_t)\;\; b_2(o_t)\;\; \cdots \;\; b_S(o_t)\,]^{\mathsf T}$

or, equivalently, the row of $B$ selected by the symbol $o_t$.

The mathematical specification of an HMM can be summarized as:

$M = (S, K, A, B, \pi)$

where $A = \{a_{ij}\}$ is the state transition matrix and $\pi = \{\pi_i\}$ is the initial state distribution. For example, reviewing our coin-toss model:

State:  1      2      3
P(H):   P1     P2     P3
P(T):   1-P1   1-P2   1-P3

Recognition Using Discrete HMMs

Denote any partial sequence of observations in time by:

$o_{t_1}^{t_2} = \{o_{t_1}, o_{t_1+1}, \ldots, o_{t_2}\}$

The forward partial sequence of observations at time $t$ is

$o_1^t = \{o_1, o_2, \ldots, o_t\}$

The backward partial sequence of observations at time $t$ is

$o_{t+1}^T = \{o_{t+1}, o_{t+2}, \ldots, o_T\}$

A complete set of observations of length $T$ is denoted as $O = o_1^T$.

What is the likelihood of an HMM? We would like to calculate $P(M \mid O)$; however, we can't. We can (see the introductory notes) calculate $P(O \mid M)$.

Consider the brute-force method of computing this. Let $\sigma = \{s_1, s_2, \ldots, s_T\}$ denote a specific state sequence. The probability of a given observation sequence being produced by this state sequence is:

$P(O \mid \sigma, M) = b_{s_1}(o_1)\, b_{s_2}(o_2) \cdots b_{s_T}(o_T)$

The probability of the state sequence is

$P(\sigma \mid M) = \pi_{s_1}\, a_{s_1 s_2}\, a_{s_2 s_3} \cdots a_{s_{T-1} s_T}$

Therefore,

$P(O, \sigma \mid M) = P(O \mid \sigma, M)\, P(\sigma \mid M)$

To find $P(O \mid M)$, we must sum over all possible paths:

$P(O \mid M) = \sum_{\sigma} P(O, \sigma \mid M)$

This requires $O(2T \cdot S^T)$ flops. For $S = 5$ and $T = 100$, this gives about $10^{72}$ computations per HMM!

The "Any Path" Method (Forward-Backward, Baum-Welch)

The forward-backward (F-B) algorithm begins by defining a "forward-going" probability sequence:

$\alpha_t(i) = P(o_1^t, s_t = i \mid M)$

and a "backward-going" probability sequence:

$\beta_t(i) = P(o_{t+1}^T \mid s_t = i, M)$

Let us next consider the contribution to the overall sequence probability made by a single transition:

$\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})$

Summing over all possibilities for reaching state "$j$":

$\alpha_{t+1}(j) = \left[\sum_{i=1}^{S} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1})$

Baum-Welch (Continued)

The recursion is initiated by setting:

$\alpha_1(i) = \pi_i\, b_i(o_1)$

Similarly, we can derive an expression for $\beta$:

$\beta_t(i) = \sum_{j=1}^{S} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$

This recursion is initialized by:

$\beta_T(i) = 1$

We still need to find $P(O \mid M)$. Note that

$\alpha_t(i)\, \beta_t(i) = P(O, s_t = i \mid M)$

for any state $i$. Therefore,

$P(O \mid M) = \sum_{i=1}^{S} \alpha_t(i)\, \beta_t(i) \quad \text{(for any } t\text{)}$

But we also note that we should be able to compute this probability using only the forward direction. By considering $t = T$, where $\beta_T(i) = 1$, we can write:

$P(O \mid M) = \sum_{i=1}^{S} \alpha_T(i)$

These equations suggest a recursion in which, for each value of $t$, we iterate over ALL states and update $\alpha_t(j)$. When $t = T$, $P(O \mid M)$ is computed by summing over ALL states.

The complexity of this algorithm is $O(S^2 T)$; for $S = 5$ and $T = 100$, approximately 2500 flops are required (compared to about $10^{72}$ flops for the exhaustive search).

The Viterbi Algorithm

Instead of allowing any path to produce the output sequence, and hence creating the need to sum over all paths, we can simply assume only one path produced the output. We would like to find the single most likely path that could have produced the output. Calculation of this path and its probability is straightforward, using the dynamic programming algorithm previously discussed:

$v_t(j) = \max_{i}\,\left[\,v_{t-1}(i)\, a_{ij}\,\right] b_j(o_t)$

where

$\psi_t(j) = \arg\max_{i}\,\left[\,v_{t-1}(i)\, a_{ij}\,\right]$

(in other words, the predecessor node with the best score). Often, probabilities are replaced with the logarithm of the probability, which converts multiplications to summations. In this case, the HMM looks remarkably similar to our familiar DP systems.

Beam Search

In the context of the best-path method, it is easy to see that we can employ a beam search similar to what we used in DP systems:

$\log v_t(j) \ge \max_{i}\, \log v_t(i) - \delta$

In other words, for a path to survive, its score must be within a range of the best current score. This can be viewed as a time-synchronous beam search. It has the advantage that, since all hypotheses are at the same point in time, their scores can be compared directly. This is due to the fact that each hypothesis accounts for the same amount of time (the same number of frames).
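To make the forward-backward recursions above concrete, here is a minimal Python sketch (not part of the original notes; the two-state coin model and all variable names are hypothetical). It implements the $\alpha$ and $\beta$ recursions directly and checks that $\sum_i \alpha_t(i)\beta_t(i)$ yields the same $P(O \mid M)$ at every $t$:

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(o_1..o_t, s_t = i | M).
    A is S x S (a_ij); B is stored S x K here, i.e. transposed
    relative to the K-rows-by-S-states convention in the text."""
    T, S = len(obs), A.shape[0]
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]                    # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | s_t = i, M), with beta_T(i) = 1."""
    T, S = len(obs), A.shape[0]
    beta = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Hypothetical two-state, two-symbol coin model (0 = H, 1 = T).
A   = np.array([[0.7, 0.3], [0.4, 0.6]])
B   = np.array([[0.9, 0.1], [0.2, 0.8]])
pi  = np.array([0.5, 0.5])
obs = [0, 0, 1, 0]                                  # H H T H

alpha = forward(A, B, pi, obs)
beta  = backward(A, B, obs)
print(alpha[-1].sum())                  # P(O|M) = sum_i alpha_T(i)
print((alpha * beta).sum(axis=1))       # same value at every t, as derived above
```

Note that each time step costs one $S \times S$ matrix-vector product, which is the $O(S^2 T)$ complexity quoted above.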
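A corresponding sketch of the Viterbi recursion, worked in the log domain as suggested above so that multiplications become summations (again an illustrative sketch, reusing the same hypothetical coin model). Here `v` and `psi` play the roles of $v_t(j)$ and $\psi_t(j)$:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Single best path via DP in the log domain.
    v[t, j] is log v_t(j); psi[t, j] is the predecessor
    node with the best score, psi_t(j)."""
    T, S = len(obs), A.shape[0]
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    v = np.zeros((T, S))
    psi = np.zeros((T, S), dtype=int)
    v[0] = logpi + logB[:, obs[0]]            # v_1(i) = log pi_i + log b_i(o_1)
    for t in range(1, T):
        scores = v[t - 1][:, None] + logA     # scores[i, j] = log v_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)        # best predecessor for each state j
        v[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Backtrace from the best terminal state.
    path = [int(v[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, v[-1].max()                  # best state sequence and its log probability

# Same hypothetical coin model as in the previous sketch:
A, B = np.array([[0.7, 0.3], [0.4, 0.6]]), np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi(A, B, pi, [0, 0, 1, 0]))
```

The structure is identical to the forward algorithm, with the sum over predecessors replaced by a max and a backpointer.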
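Finally, a sketch of time-synchronous beam pruning layered on the Viterbi pass: after each frame, any hypothesis whose log score falls more than $\delta$ below the best current score is discarded. The beam width `delta=8.0` is an arbitrary illustrative value, not a recommendation from the notes:

```python
import numpy as np

def viterbi_beam(A, B, pi, obs, delta=8.0):
    """Time-synchronous Viterbi with beam pruning: a state survives a
    frame only if its log score is within delta of the best score."""
    logA, logB = np.log(A), np.log(B)
    v = np.log(pi) + logB[:, obs[0]]
    for t in range(1, len(obs)):
        v = (v[:, None] + logA).max(axis=0) + logB[:, obs[t]]
        v[v < v.max() - delta] = -np.inf      # discard hypotheses outside the beam
    return v.max()                            # log score of the best surviving path

A, B = np.array([[0.7, 0.3], [0.4, 0.6]]), np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi_beam(A, B, pi, [0, 0, 1, 0]))
```

Because every hypothesis at frame $t$ accounts for the same number of frames, the comparison against the single best score is meaningful, which is exactly the advantage of the time-synchronous formulation noted above.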