HTK Tutorials

TI Digits Short: Monophones - Adding sp

At this point in training we add a short pause model, sp, between words. Until now the only pause model was sil, which marks the boundaries of whole sentences or utterances. sp will instead mark the boundaries between individual words. sp and sil are essentially the same sound and differ mainly in the length of the pause. We therefore create sp by copying the sil model, renaming the copy, and tying its acoustic model to sil so the two share parameters. Note that sp is the only monophone modeled with 3 states instead of the usual 5 (see hmmdefs).

Procedure

  1. Begin by copying the contents of the folder hmm4 to the folder hmm5. All of the following edits are made to the hmmdefs file in hmm5.

    • Open the file hmmdefs in hmm5 and copy the definition of "sil". It is at the bottom of the file, starting at the line '~h "sil"' and continuing to the end of the file. **The definition of each phone always begins with '~h "phone_name"' and includes everything up to the next '~h "next_phone_name"'.** Paste the copy at the bottom of the file and rename "sil" to "sp".

    • In the "sp" definition, delete the first and last states that are listed, i.e. states 2 and 4. **The definition of each phone does not list states 1 or 5, because they are non-emitting "dummy" states.**

    • Renumber the remaining state of "sp" from state 3 to state 2. Since sp has 3 states, state 2 is its middle state and states 1 and 3 are the "dummy" states.

    • Change NUMSTATES to 3 instead of 5

    • Change "TRANSP 5" to "TRANSP 3", and use the following transition matrix:

        0.0 1.0 0.0
        0.0 0.9 0.1
        0.0 0.0 0.0
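The manual edits in step 1 can also be scripted. The Python sketch below builds an "sp" definition from the text of the "sil" definition. It keeps only sil's centre state, renumbers that state as state 2, and writes the 3-state transition matrix given above.

The sketch rests on some assumptions. It expects hmmdefs to be in HTK's text (ASCII) format, using the upper-case keywords HTK writes (<NUMSTATES>, <STATE>, <TRANSP>, <ENDHMM>). A file saved in binary form, for example by HERest -B, cannot be edited this way. The function name make_sp_from_sil is hypothetical and is not part of HTK.

```python
import re

# Hypothetical helper (not part of HTK): builds the text of a 3-state "sp"
# model from the ASCII text of a 5-state "sil" model, mirroring step 1.
SP_TRANSP = [
    "0.0 1.0 0.0",  # non-emitting entry state 1 always moves to state 2
    "0.0 0.9 0.1",  # state 2: stay with probability 0.9, move on with 0.1
    "0.0 0.0 0.0",  # non-emitting exit state 3 has no outgoing transitions
]


def make_sp_from_sil(sil_def: str) -> str:
    # Capture the body of sil's centre state (state 3): everything after the
    # "<STATE> 3" line up to the next "<STATE> 4" or "<TRANSP>" keyword.
    match = re.search(
        r"<STATE>\s+3\s*\n(.*?)(?=<STATE>\s+4|<TRANSP>)",
        sil_def,
        re.S | re.I,
    )
    if match is None:
        raise ValueError("no <STATE> 3 block found in the sil definition")
    centre_state = match.group(1).rstrip("\n")

    lines = [
        '~h "sp"',
        "<BEGINHMM>",
        "<NUMSTATES> 3",
        "<STATE> 2",  # sil's state 3 becomes sp's state 2
        centre_state,
        "<TRANSP> 3",
    ]
    lines += [" " + row for row in SP_TRANSP]
    lines.append("<ENDHMM>")
    return "\n".join(lines) + "\n"
```

To use the sketch, cut the text from '~h "sil"' through '<ENDHMM>' out of hmm5/hmmdefs and pass it to make_sp_from_sil. Append the returned text to the file, then check the result by eye before running HHEd.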

  2. Next we tie the middle state of sp (state 2) to the middle state of sil (state 3). We also add transitions between states 2 and 4 of sil and between states 1 and 3 of "sp". State tying is done with the HTK tool HHEd. The edit script sil.led tells HHEd which changes to make: HHEd reads the models in hmm5, applies the edits, and writes the result to hmm6. Because "sp" is now one of the phones, we use the phone list monophones1 instead of monophones0.

    • From the directory isip/exp/htk_tutorial/train, type the following (all on one line):
      HHEd -A -D -T 1 -H hmm5/macros -H hmm5/hmmdefs -M hmm6 sil.led monophones1
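This tutorial does not list the contents of sil.led. As an assumption, the sketch below writes an edit script modelled on the sil.hed example in the HTK Book. It uses HHEd's AT (add transition) and TI (tie items) commands. The probabilities 0.2 and 0.3 and the macro name silst come from that example and may differ from this tutorial's own file. The helper write_sil_led is hypothetical.

```python
from pathlib import Path

# Assumed contents of sil.led, modelled on the HTK Book's sil.hed example.
SIL_LED_COMMANDS = [
    "AT 2 4 0.2 {sil.transP}",              # sil: skip from state 2 to state 4
    "AT 4 2 0.2 {sil.transP}",              # sil: jump back from state 4 to state 2
    "AT 1 3 0.3 {sp.transP}",               # sp: entry-to-exit skip, so sp can be bypassed
    "TI silst {sil.state[3],sp.state[2]}",  # tie sil's centre state to sp's state 2
]


def write_sil_led(path: str = "sil.led") -> str:
    """Write the HHEd edit script to `path` and return its text."""
    text = "\n".join(SIL_LED_COMMANDS) + "\n"
    Path(path).write_text(text)
    return text
```

You could run write_sil_led() in the train directory before the HHEd command above to create sil.led. The transition from state 1 to state 3 turns sp into a "tee" model that can emit no frames at all. This lets adjacent words be joined with no pause between them.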


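  3. We have now created the definition for "sp" and tied its middle state to that of "sil". Next we run the EM algorithm (HERest's Baum-Welch re-estimation) again to re-estimate the models. Because we are now modeling "sp", we use monophones1 and train_trans_phones1.mlf. Each HERest run below is one re-estimation pass: it reads the models from one directory and writes the updated models to the next. Together, the four passes take the models from hmm6 through to hmm10.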

    • From the directory isip/exp/htk_tutorial/train, enter each of the following commands (each on a single line):
      HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1

      HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1

      HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1

      HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm9/macros -H hmm9/hmmdefs -M hmm10 monophones1
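The four HERest passes differ only in their input and output directories, so they can be generated in a loop. The Python sketch below builds the same command lines as above. When dry_run is False, it also creates each output directory and runs HERest through subprocess. It assumes HERest is on your PATH and that the script runs from isip/exp/htk_tutorial/train. The names herest_command and reestimate are hypothetical helpers.

```python
import os
import subprocess

MLF = "../data_preparation/trans/train_trans_phones1.mlf"


def herest_command(src: str, dst: str) -> list[str]:
    """One re-estimation pass: read models from src, write them to dst."""
    return [
        "HERest", "-B", "-A", "-D", "-T", "1", "-C", "config0",
        "-I", MLF, "-t", "250.0", "150.0", "3000.0",
        "-S", "train_list.list",
        "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
        "-M", dst, "monophones1",
    ]


def reestimate(first: int = 6, passes: int = 4, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) passes hmm{first} -> ... -> hmm{first + passes}."""
    commands = []
    for n in range(first, first + passes):
        src, dst = f"hmm{n}", f"hmm{n + 1}"
        cmd = herest_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            # Make sure the output directory exists before HERest writes to it.
            os.makedirs(dst, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop on the first failing pass
    return commands
```

With the default dry_run=True, reestimate() only returns the commands, so you can check them before calling reestimate(dry_run=False).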
