SPEECH RECOGNITION REQUIRES GOOD
PATTERN RECOGNITION AND SEARCH
- Continuous speech recognition is both a pattern recognition problem
and a search problem. Why?
- The decoding process of a speech recognizer finds the most
probable sequence of words given the acoustic and language models.
Recall our basic equation for speech recognition (Bayes' rule applied
to the word sequence and the acoustic observations).
Search is the process of finding the word sequence that maximizes it.
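In the usual Bayesian formulation, the two equations referred to above can be written as follows. The symbols are assumptions, since the slide's own notation is not shown here: W is a candidate word sequence, A is the sequence of acoustic observations, P(A | W) is the acoustic model score, and P(W) is the language model score.

```latex
% Basic equation: posterior of a word sequence W given acoustics A
P(W \mid A) = \frac{P(A \mid W)\, P(W)}{P(A)}

% Search: P(A) does not depend on W, so it drops out of the maximization
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```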
- The complexity of the search algorithm depends heavily on the
nature of the search space, which in turn, depends heavily
on the language model constraints (e.g., networks vs. N-grams).
- Speech recognizers typically use a hierarchical Viterbi beam search
for decoding/recognition, and A* stack decoding
for N-best list and word graph generation.
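The idea behind Viterbi beam search can be sketched on a toy model. This is a minimal, flat (not hierarchical) illustration, not a real decoder: the two-state HMM, its probabilities, the observation symbols, and the names viterbi_beam and prune are all hypothetical. At each time step every surviving hypothesis is extended, only the best-scoring path into each state is kept (Viterbi), and any state whose log score falls more than beam_width below the current best is discarded (beam pruning).

```python
import math

def prune(active, beam_width):
    """Keep only hypotheses within beam_width (log domain) of the best score."""
    best = max(score for score, _ in active.values())
    return {s: v for s, v in active.items() if v[0] >= best - beam_width}

def viterbi_beam(obs, states, init, trans, emit, beam_width):
    """Time-synchronous Viterbi search with beam pruning.

    Returns (best_path, best_log_score). All probabilities must be nonzero,
    because this sketch takes their logarithm directly.
    """
    # active maps each surviving state to (log score, best path ending there)
    active = {
        s: (math.log(init[s]) + math.log(emit[s][obs[0]]), [s]) for s in states
    }
    active = prune(active, beam_width)

    for o in obs[1:]:
        extended = {}
        for prev, (score, path) in active.items():
            for s in states:
                cand = score + math.log(trans[prev][s]) + math.log(emit[s][o])
                # Viterbi recombination: keep the best path into each state
                if s not in extended or cand > extended[s][0]:
                    extended[s] = (cand, path + [s])
        active = prune(extended, beam_width)

    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state][1], active[best_state][0]

# Hypothetical toy model (illustrative numbers only)
states = ["s1", "s2"]
init = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {
    "s1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.3, "c": 0.6},
}

path, score = viterbi_beam(["a", "b", "c"], states, init, trans, emit, beam_width=10.0)
print(path, math.exp(score))
```

A wide beam reproduces the exact Viterbi answer; narrowing beam_width reduces the number of active hypotheses at the risk of pruning the path that would have won. A real recognizer applies the same pruning at several levels (states, phones, words), which is what makes the search hierarchical.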