5.1.4 Acoustic Modeling: Acoustic Model Types
The predictive power of HMMs enables them to form the basis of a machine or model that can learn the characteristics of a class of random data. This learning takes place by exposing the model to a sufficient number of examples for which the data values are known. Such a model, if trained properly, can predict data values when they are not known, as must be done for speech recognition. To initialize and train such a model for speech recognition, we record numerous samples of people speaking various words and phrases. We then label the words and phrases in these samples, thus creating "training" data from which the model can learn. We show the model each of these training samples in controlled sequences, allowing it to "learn" by reestimating its output probabilities according to what is known from the labeled data, thus yielding P(A|W).
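Once a model has been trained, P(A|W) can be evaluated with the standard forward algorithm. The sketch below uses a hypothetical two-state, discrete-output HMM with made-up probabilities purely for illustration; real acoustic models use many states and continuous output densities.

```python
# Minimal sketch (hypothetical numbers): the forward algorithm for a
# discrete-output HMM, computing P(A|W) -- the probability that a
# word model W generated the observed acoustic sequence A.
import numpy as np

# Hypothetical 2-state left-to-right model with 3 discrete output symbols.
trans = np.array([[0.6, 0.4],
                  [0.0, 1.0]])        # state-transition probabilities
emit = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])    # output (emission) probabilities
start = np.array([1.0, 0.0])          # always begin in the first state

def forward(obs):
    """Return P(obs | model), summing over all state paths."""
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

print(forward([0, 1, 2]))  # ~0.0756 for this toy model
```

Training (reestimation) repeatedly adjusts `trans` and `emit` so that quantities like this one increase over the labeled training samples.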

The acoustic unit modeled in training can be either a word or a phoneme. Both of these acoustic model types are explained in detail in Section 4.2.

Word models include each of the phonemes produced for an entire word. The model for the word "the" is shown to the right. Word models are generally used for recognition experiments involving few possible words. For example, word models can effectively be used for TI Digits recognition experiments. They are not very effective for experiments involving large vocabulary speech. For large vocabularies, phone models are more practical. Phone models represent the smallest acoustic components of a language. For example, the English language consists of about 46 phones. The image to the right shows the phones that make up one pronunciation of the word "the". Each of these acoustic model types has a specific training process associated with it.
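The distinction can be sketched with a toy lexicon. The phone symbols below follow the ARPAbet convention, and the single-entry lexicon is only illustrative: word models assign one model per vocabulary word, while phone models are shared across every word that contains those phones, so a few dozen cover an entire language.

```python
# Illustrative sketch: a tiny lexicon giving one pronunciation of "the"
# in ARPAbet-style phone symbols (hypothetical single-word vocabulary).
lexicon = {"the": ["dh", "ah"]}

# Word-model inventory: one model per word -- practical only for small
# tasks such as TI Digits, since it grows with the vocabulary.
word_units = set(lexicon)

# Phone-model inventory: shared sub-word units, so its size is bounded
# by the phone set of the language (~46 phones for English).
phone_units = {p for phones in lexicon.values() for p in phones}

print(sorted(word_units))   # ['the']
print(sorted(phone_units))  # ['ah', 'dh']
```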

The next several sections of this tutorial explain how to initialize and train acoustic models for speech recognition using our software. First, the process of training word models will be explained, followed by the process of training context independent phone models. Next, context dependent model training will be discussed. This tutorial includes examples for training both word internal context dependent models and cross-word context dependent models. The last section of this tutorial will explain parallel training.
   