4.1.1 Overview: Bayes' Rule
In this section, we describe how to use the recognition utility,
isip_recognize, to implement the search portion of the speech
recognition problem and to produce the overall most probable
transcription of the input utterance. Decoding the spoken words
requires search algorithms that narrow the space of possibilities.
Because the complexity of an optimal or exhaustive search is
prohibitive for speech recognition, suboptimal search techniques
are vital to the decoding process.
The primary search algorithm used in our software is a
time-synchronous Viterbi beam search, which is essentially a
breadth-first search algorithm. Section 4.1.2 describes search
algorithms in more detail. Note that the terms "decoding" and
"recognition" are often used interchangeably.
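To make the beam search idea concrete, here is a minimal sketch of a
time-synchronous Viterbi beam search in Python. It is not the
isip_recognize implementation: the fully connected state space, the
transition and emission scores, and the beam width are hypothetical
stand-ins for the models and parameters the decoder actually loads.

    def viterbi_beam_search(emission_logprobs, trans_logprobs, beam_width=10.0):
        """Decode the most likely state sequence through a fully connected HMM.

        emission_logprobs[t][s]: log-likelihood of frame t in state s
        trans_logprobs[i][j]:    log-probability of moving from state i to state j
        beam_width:              hypotheses scoring more than this far below the
                                 best hypothesis at a frame are pruned
        """
        num_frames = len(emission_logprobs)
        num_states = len(trans_logprobs)

        # Start with one hypothesis per state (a uniform initial distribution
        # is assumed here to keep the sketch short).
        active = {s: (emission_logprobs[0][s], [s]) for s in range(num_states)}

        for t in range(1, num_frames):
            extended = {}
            # Breadth-first: every surviving hypothesis is extended by one frame.
            for s, (score, path) in active.items():
                for s_next in range(num_states):
                    cand = (score + trans_logprobs[s][s_next]
                            + emission_logprobs[t][s_next])
                    # Viterbi recursion: keep only the best path into each state.
                    if s_next not in extended or cand > extended[s_next][0]:
                        extended[s_next] = (cand, path + [s_next])

            # Beam pruning: discard hypotheses far below the current best score.
            best = max(score for score, _ in extended.values())
            active = {s: h for s, h in extended.items() if h[0] >= best - beam_width}

        best_score, best_path = max(active.values(), key=lambda h: h[0])
        return best_path, best_score

Because every surviving hypothesis is extended frame by frame in
lockstep, all scores compared during pruning correspond to the same
amount of input, which is what makes the time-synchronous,
breadth-first organization attractive.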
The search algorithm essentially integrates the constraints imposed
by the language model with the probabilities computed by the acoustic
models, using a probabilistic framework based on Bayes' Rule:
P(W|A) = P(A|W) P(W) / P(A)

where

  P(A|W) : acoustic model (hidden Markov models, mixture of Gaussians)
  P(W)   : language model (finite state machines, N-grams)
  P(A)   : acoustics (ignored during maximization)
We can ignore the term P(A), which represents the likelihood of the
acoustic channel, because it is constant with respect to the word
sequence being searched. This reduces the task of finding the most
probable word sequence to a maximization (or optimization) of:

  W* = argmax_W P(A|W) P(W)
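The maximization above can be read directly as a scoring rule. The
following sketch, with hypothetical acoustic_logprob and
language_logprob functions standing in for the acoustic and language
models, ranks a list of candidate word sequences by
log P(A|W) + log P(W); maximizing this sum of logs is equivalent to
maximizing the product P(A|W) P(W).

    def recognize(candidates, acoustic_logprob, language_logprob):
        """Return the candidate word sequence W maximizing P(A|W) * P(W).

        P(A) is the same for every candidate, so it can safely be ignored.
        Scores are combined in log space to avoid numerical underflow.
        """
        return max(candidates,
                   key=lambda W: acoustic_logprob(W) + language_logprob(W))

In practice the set of candidate word sequences is far too large to
enumerate explicitly; the beam search sketched earlier applies this
same scoring rule incrementally instead.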
The goal of the language model, represented in the term P(W),
is to constrain the number of allowable word sequences. The
role of the language model is described in detail in
Section 6.
The acoustic model provides a way of computing a probability for
each feature vector. This is described in detail in
Section 5.
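As a rough illustration of what "a probability for each feature
vector" means, here is a sketch of the log-likelihood of a single
feature vector under a diagonal-covariance Gaussian mixture, the kind
of distribution listed above. The weights, means, and variances are
hypothetical placeholders for parameters estimated during training
(Section 5).

    import math

    def gmm_log_likelihood(x, weights, means, variances):
        """Return log p(x) for feature vector x under a diagonal-covariance GMM.

        weights[k]:   mixture weight of component k (weights sum to 1)
        means[k]:     mean vector of component k
        variances[k]: per-dimension variances of component k
        """
        log_terms = []
        for w, mu, var in zip(weights, means, variances):
            # Log of a diagonal Gaussian: sum the per-dimension log-densities.
            log_gauss = sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                            for xi, m, v in zip(x, mu, var))
            log_terms.append(math.log(w) + log_gauss)

        # Log-sum-exp over mixture components for numerical stability.
        max_term = max(log_terms)
        return max_term + math.log(sum(math.exp(t - max_term) for t in log_terms))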
The recognition component, which is often referred to as a
decoder,
implements a variety of search algorithms in an attempt to find
this optimal word sequence in the most efficient manner.
For a detailed discussion of Bayes' Rule,
see this lecture on the
noisy communication channel model
from our on-line
speech recognition course notes.
Next, let's review the
search process
in a little more detail.