TI Digits Short: Monophones - Adding sp
At this point in training we want to add a short-pause model (sp) between words. Until now we have used a
single silence phone, sil, to mark the boundaries of whole sentences or utterances. Now sp will model
the boundaries between individual words. Since sp and sil are essentially the same (except for the length
of the pause), we will copy the model of sil, rename it sp, and tie the acoustic models together. Note
that sp is the only monophone modeled by 3 states instead of the usual 5 (see hmmdefs).
Procedure
-
Begin by copying the contents of the folder hmm4 to the folder hmm5. We are going to edit the
hmmdefs file in hmm5.
- Open the file hmmdefs found in hmm5. Copy the definition of "sil". This can be found at the bottom
of the file and starts from the line '~h "sil"' and continues until the end of the file. **The
definition of each phone always begins with '~h "phone_name"' and consists of everything until the
next '~h "next_phone_name"'.** Paste this at the bottom of the file, and rename "sil" as "sp".
- Delete the first and last emitting states of "sp" in the definition, i.e. states 2 and 4. **The
definition of each phone doesn't include states 1 or 5, since they are considered "dummy"
(non-emitting) states.**
- Change the remaining state for "sp" from state 3 to state 2 (since sp has 3 states,
state 2 is the middle state and states 1 and 3 are the "dummy" states).
- Change NUMSTATES from 5 to 3.
- Change "TRANSP 5" to "TRANSP 3", and use the following transition matrix:
0.0 1.0 0.0
0.0 0.9 0.1
0.0 0.0 0.0
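The manual edits above can also be sketched as a small script. This is only an illustration, not part of the tutorial: `make_sp` is a hypothetical helper name, and it assumes that hmmdefs is stored in HTK's text format with the tags written in upper case (`<STATE>`, `<TRANSP>`) at the start of a line. It copies the middle state of "sil" and wraps it in a 3-state "sp" definition with the transition matrix given above.

```shell
#!/bin/sh
# make_sp (hypothetical helper): print a 3-state "sp" definition built
# from state 3 of "sil" in a text-format hmmdefs file.
make_sp() {
    awk '
        /^~h "sil"/            { in_sil = 1; next }  # entering the sil model
        /^~h /                 { in_sil = 0 }        # any other model ends it
        in_sil && /^<STATE> 3/ { keep = 1; next }    # middle state starts here
        in_sil && /^<STATE> 4/ { keep = 0 }          # middle state ends here
        in_sil && keep         { body = body $0 "\n" }
        END {
            printf "~h \"sp\"\n<BEGINHMM>\n<NUMSTATES> 3\n<STATE> 2\n"
            printf "%s", body
            printf "<TRANSP> 3\n 0.0 1.0 0.0\n 0.0 0.9 0.1\n 0.0 0.0 0.0\n<ENDHMM>\n"
        }
    ' "$1"
}

# Possible usage (reads the untouched copy in hmm4, appends to hmm5):
#   mkdir -p hmm5 && cp hmm4/* hmm5/
#   make_sp hmm4/hmmdefs >> hmm5/hmmdefs
```

Compare the appended block by eye with the hand-edited version before continuing; if your hmmdefs is laid out differently, the patterns will need adjusting.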
-
Now we want to tie the middle states of sp (state 2) and sil (state 3), and also add transitions between
states 2 and 4 of sil and between states 1 and 3 of "sp". We use HTK's HHEd tool to tie states. The edit
script sil.led tells HHEd which states to tie; HHEd reads the models in hmm5 and stores the result in hmm6.
Since "sp" is now one of our phones, we use monophones1 instead of monophones0.
- From the directory isip/exp/htk_tutorial/train, type (all on one line):
HHEd -A -D -T 1 -H hmm5/macros -H hmm5/hmmdefs -M hmm6 sil.led monophones1
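For orientation, an edit script of this kind can look like the sketch below, which is modeled on the sil.hed example in the HTK Book. The tutorial's own sil.led is not reproduced here and may use different values; the file name sil.led.example is made up so that nothing existing is overwritten. The AT commands add the extra transitions (with the given probabilities) and the TI command ties the two middle states under the macro name silst.

```shell
#!/bin/sh
# Write an example HHEd edit script (assumed content, after the HTK Book).
# The output name sil.led.example is hypothetical.
cat > sil.led.example <<'EOF'
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
EOF
```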
-
Now that we've created the definition for "sp" and tied its middle state to "sil", we need to use the EM
algorithm again to re-estimate our models. We run four passes of HERest, producing hmm7 through hmm10.
Again, since we're now modeling "sp", we use monophones1 and also train_trans_phones1.mlf.
- From the directory isip/exp/htk_tutorial/train, type the following commands (each one on a single line):
HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1
HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1
HERest -B -A -D -T 1 -C config0 -I ../data_preparation/trans/train_trans_phones1.mlf -t 250.0 150.0 3000.0 -S train_list.list -H hmm9/macros -H hmm9/hmmdefs -M hmm10 monophones1
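Because the four passes differ only in the directory numbers, they can be generated with a loop instead of typed by hand. The sketch below only prints the commands so they can be checked first; `herest_passes` is a hypothetical helper name, and the `mkdir -p` lines rest on the assumption that the output directories do not exist yet and that HERest will not create them.

```shell
#!/bin/sh
# herest_passes (hypothetical helper): print the four re-estimation
# passes hmm6 -> hmm10 as shell commands, one per line.
herest_passes() {
    mlf=../data_preparation/trans/train_trans_phones1.mlf
    for src in 6 7 8 9; do
        dst=$((src + 1))
        # Assumption: the output directory has to exist before HERest runs.
        echo "mkdir -p hmm$dst"
        echo "HERest -B -A -D -T 1 -C config0 -I $mlf -t 250.0 150.0 3000.0" \
             "-S train_list.list -H hmm$src/macros -H hmm$src/hmmdefs" \
             "-M hmm$dst monophones1"
    done
}

herest_passes
```

Once the printed commands look right, they can be executed from isip/exp/htk_tutorial/train by piping the output into a shell, for example by saving the block as a script and running it as `sh script.sh | sh`.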