Overview: An overview of the speech recognition process is shown below. The process has three main components: acoustic modeling, language modeling, and search. Search, which is often referred to as recognition, decoding, or evaluation, is the process by which the system uses a fully trained recognizer to produce a hypothesis of what was spoken. It is the main topic of this section. Acoustic modeling is described in Section 5, and language modeling is described in Section 6.

Converting the speech signal to a text message containing the spoken words is only one of the many tasks involved in automatic speech recognition. Once the acoustic and language models are built, recognition requires searching all of the possibilities generated by these models. The number of possibilities can be prohibitively large, so efficient search techniques are critical to the performance of a recognizer. Most recognition systems use the Viterbi beam search algorithm, but other algorithms may be used and are supported in the software.

Continue to Section 4.1 for additional theoretical information on search algorithms for speech recognition.

Contents: