
5.4.3 Word Internal CD Models: State Tying

While context-dependent models allow for more effective modeling of speech, they also create computational problems because of the number of triphones that must be modeled. For example, a set of 46 phones yields 46³ = 97,336 triphones. Modeling them would require estimating 230 million parameters from 21 million training vectors. Many triphones, however, never appear in the training data. In fact, more than 70% are either never seen or seen only once. Many of these unseen triphones were already eliminated in the triphone generation step, which was based on the transcriptions of the training data.

Not every triphone context is acoustically distinct, because many phones affect their neighbors in similar ways. To cope with this data sparsity, decision trees can be used to tie states that are acoustically indistinguishable. This technique is called state tying. It also reduces the size of the acoustic model file, so the file loads faster. In the acoustic model file, every state is defined and a statistical model is assigned to each one. There is not always a one-to-one correspondence between states and statistical models: the same statistical model may be assigned to more than one state. In that case, the states are "tied". The procedure described below ties states that are indistinguishable.
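The triphone arithmetic and the idea of tying can be sketched in a few lines of Python. This is an illustrative sketch only. The triphone names use an assumed "left-center+right" notation. The state indices and the `state_to_model` dictionary are hypothetical and do not reflect the ISIP acoustic model file format. Tying is modeled simply as several states mapping to the same statistical model index.

```python
# Illustrative sketch: triphone counts and state tying as a many-to-one map.

NUM_PHONES = 46  # phone set size used in the text


def count_triphones(num_phones: int) -> int:
    """Every (left, center, right) phone combination is a distinct triphone."""
    return num_phones ** 3


# Hypothetical (triphone, state index) -> statistical model index.
# Triphone names use an assumed "left-center+right" notation.
state_to_model = {
    ("s-ih+k", 2): 0,
    ("z-ih+k", 2): 0,  # tied: shares model 0 with s-ih+k
    ("s-ih+t", 2): 1,
}


def tied_groups(mapping: dict) -> dict:
    """Return only the models that are shared by more than one state."""
    groups: dict = {}
    for state, model in mapping.items():
        groups.setdefault(model, []).append(state)
    return {model: states for model, states in groups.items() if len(states) > 1}


print(count_triphones(NUM_PHONES))  # 97336
print(len(set(state_to_model.values())), "models for", len(state_to_model), "states")
print(tied_groups(state_to_model))
```

In this toy map, three states share only two statistical models. The `ih` states with `s` and `z` left contexts are tied.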

From the directory

$ISIP_TUTORIAL/sections/s05/s05_04_p03/

Run the command:

isip_recognize -param params_state_tying.sof -verbose all

Expected Output:

Command: isip_recognize -param params_state_tying.sof -verbose all

Version: 1.23 (not released) 2003/05/21 23:10:45
  
  *** no audio database file was specified ***
  
  *** no symbol graph database file was specified ***
  
  *** no transcription database file was specified ***
  
  loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
  
  loading language model: $ISIP_TUTORIAL/models/winternal_phone_models/lm_winternal_jsgf_train.sof
  
  loading statistical model pool: $ISIP_TUTORIAL/models/winternal_phone_models/smp_winternal_train.sof

The parameter file for this step, params_state_tying.sof, contains a few important new parameters.

The algorithm parameter in this file is set to TRAIN_PARAMETER_TYING, which tells isip_recognize that its task is state tying. The implementation parameter, ML, stands for Maximum Likelihood: the recognizer uses phonetic decision trees to make maximum-likelihood decisions about which states to tie. The next two parameters, phonetic_ques_ans_file and phonetic_decisiontee_file, point to the decision tree files that are used to decide which states to tie.
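The tying decision can be illustrated with the widely used maximum-likelihood decision-tree clustering scheme. The sketch below is a simplification and is not the ISIP implementation. It makes several assumptions:

* Each state is summarized by one-dimensional Gaussian statistics.
* Only left-context questions are asked.
* The question names, toy statistics and `min_gain` stopping threshold are all hypothetical.

At each node, the sketch pools the states and tries every question. It keeps the split that most increases the log likelihood of the pooled data. States that end up in the same leaf share one model.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics of 1-D feature data assigned to one HMM state."""
    count: float     # number of frames (state occupancy)
    total: float     # sum of x
    total_sq: float  # sum of x^2

    def __add__(self, other: "Stats") -> "Stats":
        return Stats(self.count + other.count, self.total + other.total,
                     self.total_sq + other.total_sq)


def log_likelihood(s: Stats, var_floor: float = 1e-3) -> float:
    """Log likelihood of the pooled data under a single (floored) ML Gaussian."""
    mean = s.total / s.count
    ml_var = s.total_sq / s.count - mean * mean
    var = max(ml_var, var_floor)
    return -0.5 * s.count * (math.log(2 * math.pi * var) + ml_var / var)


def pool(states: dict, names: list) -> Stats:
    """Merge the statistics of the named states, as if they were tied."""
    merged = Stats(0.0, 0.0, 0.0)
    for name in names:
        merged = merged + states[name]
    return merged


def left_context(triphone: str) -> str:
    return triphone.split("-", 1)[0]  # assumes "left-center+right" names


def best_split(states: dict, questions: dict):
    """Return (question, gain, yes, no) for the best likelihood-increasing split."""
    names = list(states)
    parent = log_likelihood(pool(states, names))
    best = None
    for question, phone_set in questions.items():
        yes = [n for n in names if left_context(n) in phone_set]
        no = [n for n in names if left_context(n) not in phone_set]
        if not yes or not no:
            continue  # question does not partition these states
        gain = (log_likelihood(pool(states, yes))
                + log_likelihood(pool(states, no)) - parent)
        if best is None or gain > best[1]:
            best = (question, gain, yes, no)
    return best


def tie_states(states: dict, questions: dict, min_gain: float = 50.0) -> list:
    """Split recursively; the states in each leaf share one tied model."""
    split = best_split(states, questions)
    if split is None or split[1] < min_gain:
        return [sorted(states)]
    _, _, yes, no = split
    return (tie_states({n: states[n] for n in yes}, questions, min_gain)
            + tie_states({n: states[n] for n in no}, questions, min_gain))


# Toy statistics for state 2 of four "ih" triphones (illustrative numbers).
STATES = {
    "s-ih+k": Stats(100, 100.0, 110.0),   # mean 1.0, var 0.1
    "z-ih+k": Stats(80, 88.0, 104.8),     # mean 1.1, var 0.1
    "m-ih+k": Stats(120, 360.0, 1092.0),  # mean 3.0, var 0.1
    "n-ih+t": Stats(90, 288.0, 930.6),    # mean 3.2, var 0.1
}
# Hypothetical phonetic questions about the left context.
QUESTIONS = {
    "L_Fricative": {"s", "z", "f", "v"},
    "L_Nasal": {"m", "n", "ng"},
    "L_Voiced": {"z", "v", "m", "n", "ng", "b", "d", "g"},
}

print(tie_states(STATES, QUESTIONS))  # [['s-ih+k', 'z-ih+k'], ['m-ih+k', 'n-ih+t']]
```

With these toy numbers, the first split separates the `s`/`z` contexts from the `m`/`n` contexts. Splitting `s` from `z` gains too little likelihood to pass the threshold, so each pair ends up tied.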

Next, the tied states must be generalized to unseen contexts. During this step, the recognizer produces a new set of models covering all possible triphones. Because the training data may not contain every triphone combination, the test data may include triphones that do not appear in the trained language model. This step ensures that every necessary triphone is defined in the language model. From the same directory, run the command:

isip_recognize -param params_test.sof -verbose all

Expected Output:

Command: isip_recognize -param params_test.sof -verbose all

Version: 1.23 (not released) 2003/05/21 23:10:45
  
  *** no audio database file was specified ***
  
  *** no symbol graph database file was specified ***
  
  *** no transcription database file was specified ***
  
  loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
  
  loading language model: $ISIP_TUTORIAL/models/winternal_phone_models/lm_winternal_jsgf_tied.sof
  
  loading statistical model pool: $ISIP_TUTORIAL/models/winternal_phone_models/smp_winternal_tied.sof

Apart from the input model files, which now point to the tied models (lm_winternal_jsgf_tied.sof and smp_winternal_tied.sof), the only change in the parameter file for this step is the algorithm parameter. It is set to TEST_PARAMETER_TYING. Once the states have been tied and generalized, the models must be retrained. This step is covered in the next section.
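The generalization step can be pictured with the sketch below. It assumes, as is common with decision-tree tying, that a triphone unseen in training gets its model by answering the same phonetic questions used during tying until it reaches a leaf. The tree, question names, phone list and `find_model` function are hypothetical illustrations, not ISIP's data structures. Every left and right context has an answer to every question, so every possible triphone receives a tied model, including contexts never seen in training.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass
class Leaf:
    model: int  # index of a tied statistical model


@dataclass
class Node:
    question: str          # hypothetical question name
    side: str              # "L" asks about the left context, "R" the right
    phones: FrozenSet[str]
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def context(triphone: str, side: str) -> str:
    """Split an assumed "left-center+right" triphone name into its contexts."""
    left, rest = triphone.split("-", 1)
    return left if side == "L" else rest.split("+", 1)[1]


def find_model(tree: Tree, triphone: str) -> int:
    """Answer each question until a leaf is reached; return its tied model."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if context(triphone, node.side) in node.phones else node.no
    return node.model


# Toy tree for one center phone ("ih") and one state position.
TREE = Node("L_Fricative", "L", frozenset({"s", "z", "f", "v"}),
            yes=Leaf(0),
            no=Node("R_Stop", "R", frozenset({"p", "t", "k", "b", "d", "g"}),
                    yes=Leaf(1), no=Leaf(2)))

# Generalize: every left/right combination gets a model, seen or not.
PHONES = ["s", "z", "v", "m", "n", "k", "t", "l"]
all_models = {f"{left}-ih+{right}": find_model(TREE, f"{left}-ih+{right}")
              for left in PHONES for right in PHONES}

print(find_model(TREE, "v-ih+k"))  # 0
print(find_model(TREE, "m-ih+k"))  # 1
print(find_model(TREE, "m-ih+l"))  # 2
print(len(all_models))             # 64
```

Walking the tree costs one question per level, so covering every context is cheap even when the full triphone inventory is large.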
