Institute for Signal and Information Processing
Mississippi State University
Phone: (601) 325-8335 Fax: (601) 325-3149
Perhaps the most challenging problem in state-of-the-art large
vocabulary continuous speech recognition (LVCSR) is finding the
most likely hypotheses (sequences of words) for an unknown utterance,
given the speech signal, the acoustic models, and the language model
for the task. The number of possible hypotheses is far too large for
an exhaustive search, so sub-optimal techniques are needed to
generate the most probable hypotheses with reasonable efficiency and
accuracy. This problem is referred to as search or decoding.
In this seminar we will present an overview of the decoding strategies
commonly used in LVCSR. Time-synchronous (breadth-first)
techniques such as the Viterbi algorithm, and time-asynchronous
(i.e. depth-first or best-first) methods such as the A* stack decoder
will be reviewed, along with extensions such as forward-backward
multipass search, N-best search, and other hybrid decoders.
The relative merits of each algorithm will also be discussed.
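
As a rough illustration of the time-synchronous idea, the sketch below runs a Viterbi search over a toy hidden Markov model: every surviving hypothesis is extended by one observation before any hypothesis moves on to the next. The function name, the optional beam parameter, and the two-state model are assumptions made for this example and are not taken from the seminar; an LVCSR decoder applies the same recursion to a far larger network of acoustic and language model states.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit, beam=math.inf):
    """Time-synchronous Viterbi search with optional beam pruning.

    All scores are log probabilities. Returns (best_log_prob, best_path).
    """
    # Hypotheses active at the first frame: state -> (score, path so far).
    active = {s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        # Beam pruning: drop hypotheses scoring more than `beam` below the
        # current best. This is what makes the search sub-optimal.
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        nxt = {}
        for s in states:
            # Viterbi recursion: keep only the best predecessor of each state.
            score, path = max(
                ((p_score + log_trans[p][s], p_path)
                 for p, (p_score, p_path) in active.items()),
                key=lambda cand: cand[0],
            )
            nxt[s] = (score + log_emit[s][o], path + [s])
        active = nxt
    return max(active.values(), key=lambda h: h[0])


def logs(table):
    """Convert a (possibly nested) dict of probabilities to log probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical two-state model, used only to exercise the function.
states = ["Healthy", "Fever"]
log_start = logs({"Healthy": 0.6, "Fever": 0.4})
log_trans = logs({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
                  "Fever": {"Healthy": 0.4, "Fever": 0.6}})
log_emit = logs({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
                 "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

score, path = viterbi(["normal", "cold", "dizzy"], states,
                      log_start, log_trans, log_emit)
print(path, math.exp(score))
```

With the default infinite beam the search is exact for this model. A finite beam discards low-scoring hypotheses at each frame, trading guaranteed optimality for speed, which is the kind of compromise the abstract refers to as sub-optimal search.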