TI-Digits Short: Monophones-Flat Start

Now that we have the correct files in the proper formats, we will begin training our acoustic models. Using the transcriptions and MFCC features, we can train Hidden Markov Models (HMMs) that will later be used to generate transcriptions for our testing data. The EM algorithm (in HTK, Baum-Welch re-estimation) is used to re-estimate each HMM's transition probabilities and the means and variances of the multivariate Gaussian output distribution in each state (training Gaussian mixture models will be discussed later). We begin with a flat start, which means that every model is initialized with the same parameters: the global mean and variance computed across all of our training features. The alternative to a flat start is bootstrapping, in which the initial acoustic models are taken from HMMs that were previously trained on other data. We do not do this here because we assume we are starting from scratch.
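
The flat start and the EM update can be summarized with the textbook formulas for one diagonal Gaussian per state. This is a simplified sketch, not HTK's exact computation: it assumes a single utterance of T frames o_t with components d = 1, ..., 39, writes gamma_j(t) for the probability of being in state j at frame t and xi_ij(t) for the probability of moving from state i to state j at frame t (both obtained from the forward-backward algorithm), and ignores HTK's non-emitting entry/exit states, accumulation over many utterances, pruning, and variance flooring.

```latex
% Flat start: every emitting state of every model receives the global statistics
\hat{\mu}_d = \frac{1}{T}\sum_{t=1}^{T} o_{t,d}, \qquad
\hat{\sigma}^2_d = \frac{1}{T}\sum_{t=1}^{T} \left(o_{t,d} - \hat{\mu}_d\right)^2

% One EM (Baum-Welch) iteration: occupancy-weighted re-estimates for state j
\mu_{j,d} = \frac{\sum_{t=1}^{T} \gamma_j(t)\, o_{t,d}}{\sum_{t=1}^{T} \gamma_j(t)}, \qquad
\sigma^2_{j,d} = \frac{\sum_{t=1}^{T} \gamma_j(t)\,\left(o_{t,d} - \mu_{j,d}\right)^2}{\sum_{t=1}^{T} \gamma_j(t)}, \qquad
a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}
```

With a flat start every state begins with the same mean and variance, so the first EM pass is what starts to differentiate the states and the models.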

Procedure

We need to generate lists of the files to train on and to test on. Use the Perl script list_generator.pl to create train_list.list and test_list.list:
- From the directory isip/exp/htk_tutorial/train generate the train list by typing: list_generator.pl /usr/local/isip/exp/htk_tutorial/data_preparation/data/train train_list.list
- From the directory isip/exp/htk_tutorial/train generate the test list by typing: list_generator.pl /usr/local/isip/exp/htk_tutorial/data_preparation/data/test test_list.list
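
HTK reads these lists through the -S option, which expects one file name per line. The contents of list_generator.pl are not shown here, so the following is only a minimal sketch, assuming the script simply lists every feature file under the given data directory; the function name generate_list and the optional ".mfc" extension filter are hypothetical.

```python
from pathlib import Path


def generate_list(data_dir, out_path, extension=None):
    """Hypothetical stand-in for list_generator.pl.

    Recursively collects every file under data_dir, optionally keeping only
    files whose suffix equals `extension` (e.g. ".mfc", an assumed name),
    and writes their absolute paths to out_path, one per line.
    """
    root = Path(data_dir).resolve()
    paths = sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and (extension is None or p.suffix == extension)
    )
    with open(out_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return paths
```

For example, generate_list("/usr/local/isip/exp/htk_tutorial/data_preparation/data/train", "train_list.list") would write the training list; the real script may order or filter files differently.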
In our training directory we need to create folders in which to save our HMM parameters. This experiment needs 15 folders, so in the train directory create the folders hmm0, hmm1, ..., hmm14.
- From the directory isip/exp/htk_tutorial/train type: mkdir hmm0
- Repeat the above step for the remaining folders, i.e., hmm1, hmm2, ..., hmm14.
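
Rather than typing mkdir fifteen times, the folders can be created in one loop. A minimal sketch, assuming it is run with the train directory as train_dir; make_hmm_dirs is a hypothetical helper name, and existing folders are left untouched so it is safe to rerun.

```python
from pathlib import Path


def make_hmm_dirs(train_dir=".", count=15):
    """Create hmm0 ... hmm{count-1} inside train_dir (existing folders are kept)."""
    created = []
    for i in range(count):
        d = Path(train_dir) / f"hmm{i}"
        d.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
        created.append(d)
    return created
```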
Now we need to initialize the structure of our HMM files. Since we use 39 acoustic features, each state's output distribution is a 39-dimensional Gaussian with 39 values for the mean and 39 values for the covariance matrix: we assume the acoustic features are uncorrelated, so we keep only the diagonal of the covariance matrix (i.e., a vector of 39 variances). The prototype model definition file, "proto", defines this format for our models.
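
The tutorial's own proto (in define_proto/) is the authoritative file; the following is only a sketch of what such a prototype looks like in HTK's model definition syntax. It assumes a 5-state left-to-right topology (states 1 and 5 are HTK's non-emitting entry and exit states) and the parameter kind MFCC_0_D_A, i.e. 13 static coefficients plus their deltas and accelerations; both are assumptions, and the parameter kind must match the features described by config0. make_proto is a hypothetical helper.

```python
def make_proto(vec_size=39, num_states=5, param_kind="MFCC_0_D_A", name="proto"):
    """Return the text of an HTK-style prototype HMM with diagonal covariances.

    Only states 2 .. num_states-1 emit observations, so only they get a
    mean vector (all 0.0) and a variance vector (all 1.0); HCompV later
    replaces these with the global statistics of the training data.
    """
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [
        f"~o <VecSize> {vec_size} <{param_kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    for state in range(2, num_states):
        lines += [f"<State> {state}", f"<Mean> {vec_size}", " " + zeros,
                  f"<Variance> {vec_size}", " " + ones]
    # Left-to-right transitions: the entry state always moves to state 2;
    # each emitting state stays put (0.6) or advances one state (0.4);
    # the exit state's row is all zeros.
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0
        elif i < num_states - 1:
            row[i] = 0.6
            row[i + 1] = 0.4
        lines.append(" " + " ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"
```

The initial transition probabilities matter little here, since HERest re-estimates them; what the prototype really fixes is the topology and the vector size.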

We then average across the data using the tool HCompV to initialize the model. config0 specifies the parameters of the features so that HCompV performs the correct calculation, while train_list.list indicates which data to use for this step. The -m option tells HCompV to update the means as well as the variances, and -f 0.01 makes it also output variance floors equal to 0.01 times the global variance. We store the result as our new "proto" in the hmm0 folder. Finally, we copy this initialized model for each monophone model, so every model is initialized identically before we use the EM algorithm to re-estimate it.
- Create the model: Copy proto from htk_tutorial/data_preparation/define_proto/ to htk_tutorial/train/hmm0
- Initialize the model - From the directory isip/exp/htk_tutorial/train type: HCompV -C config0 -f 0.01 -m -S train_list.list -M hmm0 hmm0/proto
- Make hmmdefs (i.e., copy the initial model to each monophone model) - From the directory isip/exp/htk_tutorial/train type: make_hmmdef.pl monophones0 hmm0/proto hmm0/hmmdefs
- Make macros - From the directory isip/exp/htk_tutorial/train type: make_macro.pl hmm0/vfloor hmm0/proto hmm0/macros
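
The two Perl scripts are not listed here, so this is a sketch of the usual HTK recipe they appear to follow, under stated assumptions: hmmdefs repeats the body of the initialized proto once per phone, renamed from "proto" to the phone's name, and macros combines the global options header of the proto (everything before its ~h line) with the variance floor definitions written by HCompV. The function names are hypothetical, and the real scripts may format their output differently.

```python
def split_proto(proto_text):
    """Split an HMM definition into (global options header, body after the ~h line)."""
    lines = proto_text.splitlines()
    idx = next(i for i, line in enumerate(lines) if line.startswith("~h"))
    return lines[:idx], lines[idx + 1:]


def make_hmmdefs(proto_text, phones):
    """Hypothetical equivalent of make_hmmdef.pl: one identical copy per phone."""
    _, body = split_proto(proto_text)
    out = []
    for phone in phones:
        out.append(f'~h "{phone}"')  # the model name replaces "proto"
        out.extend(body)             # <BeginHMM> ... <EndHMM>, unchanged
    return "\n".join(out) + "\n"


def make_macros(proto_text, vfloor_text):
    """Hypothetical equivalent of make_macro.pl: global options plus variance floors."""
    header, _ = split_proto(proto_text)
    return "\n".join(header) + "\n" + vfloor_text.rstrip("\n") + "\n"
```

The phone list would come from monophones0, one phone per line, e.g. phones = [p.strip() for p in open("monophones0") if p.strip()].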
With everything initialized, we can use the EM algorithm to re-estimate each monophone's model using HERest. To accomplish this we pass the monophone labels (train_trans_phones0.mlf), the previous acoustic models for each monophone (i.e., the previous macros and hmmdefs) and the list of training data (train_list.list). config0, again, specifies the format of the data (further details on the significance of each line can be found in the HTK book). The option -t 250.0 150.0 3000.0 sets the pruning threshold for the forward-backward pass: it starts at 250.0 and, if an utterance cannot be aligned, is raised by 150.0 and retried, up to a limit of 3000.0. The new hmmdefs are stored in the next hmm folder as binary files; they are stored as plain text if the -B argument is omitted, as it is in the last of the four commands, so hmm4/hmmdefs is written as plain text. At this point we run 4 re-estimations.
- From the directory isip/exp/htk_tutorial/train type each of the following commands on a single line:
  
  HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones0.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
  
  HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones0.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm1/macros -H hmm1/hmmdefs -M hmm2 monophones0
  
  HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones0.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm2/macros -H hmm2/hmmdefs -M hmm3 monophones0
  
  HERest -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones0.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm3/macros -H hmm3/hmmdefs -M hmm4 monophones0
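
The four HERest calls differ only in their input and output folders and in whether -B is given, so they can be generated rather than typed. A minimal sketch, assuming HERest is on the PATH and the script is run from the train directory; herest_commands and run_all are hypothetical helpers, and by default the last pass omits -B exactly as in the commands above.

```python
import subprocess

MLF = "../data_preparation/trans/train_trans_phones0.mlf"


def herest_commands(num_passes=4, start=0, text_output_on_last=True):
    """Build HERest argument lists for hmm{start} -> hmm{start+1} -> ..."""
    commands = []
    for k in range(num_passes):
        src, dst = start + k, start + k + 1
        last = k == num_passes - 1
        cmd = ["HERest"]
        if not (last and text_output_on_last):
            cmd.append("-B")  # binary output; omitted on the last pass by default
        cmd += ["-A", "-D", "-T", "1", "-C", "config0", "-I", MLF,
                "-t", "250.0", "150.0", "3000.0", "-S", "train_list.list",
                "-H", f"hmm{src}/macros", "-H", f"hmm{src}/hmmdefs",
                "-M", f"hmm{dst}", "monophones0"]
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run each pass in order; check=True stops at the first failing pass."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(herest_commands()) runs the same four re-estimations; passing a list instead of a shell string avoids any quoting issues with the paths.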
