5.4.3 Word Internal CD Models: State Tying

While context-dependent models allow for more effective modeling of speech, they also present computational problems because of the number of triphones that must be modeled. For example, a set of 46 phones yields 46 x 46 x 46 = 97,336 triphones. This requires 230 million parameters to be estimated from 21 million vectors. Many triphones, however, will go unseen in the training data. In fact, more than 70% are never seen or are seen only once. Many of these unseen triphones were eliminated in the triphone generation step based on the transcriptions of the training data.

Not every triphone context, however, is distinct. Many phones affect neighboring phones similarly. To address this, decision trees can be used to tie states that are acoustically indistinguishable. This decreases the size of the acoustic model file, which makes it faster to load. This technique is called state-tying.

In the acoustic model file, all of the states are defined and a statistical model is assigned to each. There isn't always a one-to-one correspondence between a state and a statistical model. That is, the same statistical model may be assigned to more than one state. In this case, the states are "tied". The procedure described below will tie states that are indistinguishable.

From the directory
Command: isip_recognize -param params_state_tying.sof -verbose all
Version: 1.23 (not released) 2003/05/21 23:10:45

*** no audio database file was specified ***
*** no symbol graph database file was specified ***
*** no transcription database file was specified ***

loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
loading language model: $ISIP_TUTORIAL/models/winternal_phone_models/lm_winternal_jsgf_train.sof
loading statistical model pool: $ISIP_TUTORIAL/models/winternal_phone_models/smp_winternal_train.sof

The parameter file for this step, params_state_tying.sof, contains a few new significant parameters:
implementation = "ML";
phonetic_ques_ans_file = "$ISIP_TUTORIAL/examples/sections/s05/s05_05_p03/ques_ans_tidigits.sof";
phonetic_decisiontree_file = "$ISIP_TUTORIAL/examples/sections/s05/s05_05_p03/decision_tree.sof";

Next, the tied states must be generalized to unseen contexts. During this step, the recognizer will produce a new set of models containing all possible triphones. Since the training data may not contain all possible triphone combinations, there may be triphones in the test data that do not appear in the language model. This step will make sure that all necessary triphones are defined in the language model. From the same directory, run the command:
Command: isip_recognize -param params_test.sof -verbose all
Version: 1.23 (not released) 2003/05/21 23:10:45

*** no audio database file was specified ***
*** no symbol graph database file was specified ***
*** no transcription database file was specified ***

loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
loading language model: $ISIP_TUTORIAL/models/winternal_phone_models/lm_winternal_jsgf_tied.sof
loading statistical model pool: $ISIP_TUTORIAL/models/winternal_phone_models/smp_winternal_tied.sof

The only change in the parameter file for this step is the algorithm parameter, which is changed to TEST_PARAMETER_TYING.

Once the states have been tied and generalized, the models must be trained again. This step is covered in the next section.
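The idea behind tying and generalization can be illustrated with a small Python sketch. This is not how isip_recognize implements it: the phone set, the two phonetic questions, and the function names below are all hypothetical, the questions are applied as a fixed lookup rather than chosen by a likelihood-based decision tree, and a single model is shared per triphone rather than per HMM state. It only shows why contexts that answer every question the same way can share one statistical model, and why any triphone, seen or unseen, can then be assigned a model.

```python
from itertools import product

# Hypothetical toy phone set; the tutorial's real set has 46 phones.
PHONES = ["s", "z", "t", "ih", "iy"]

# Hypothetical phonetic questions: name -> phones that answer "yes".
QUESTIONS = {
    "is_fricative": {"s", "z"},
    "is_vowel": {"ih", "iy"},
}


def triphone_count(num_phones):
    """Number of left-center-right combinations for a phone set."""
    return num_phones ** 3


def tie_key(left, center, right):
    """Build a key from the center phone and the question answers.

    Contexts that give identical answers to every question are treated
    as indistinguishable and therefore share one model.
    """
    answers = tuple(
        (left in members, right in members) for members in QUESTIONS.values()
    )
    return (center, answers)


def tie_states(phones):
    """Assign a shared model index to every possible triphone."""
    model_index = {}
    assignment = {}
    for left, center, right in product(phones, repeat=3):
        key = tie_key(left, center, right)
        # Reuse the index if this key was seen before, else allocate one.
        model_index.setdefault(key, len(model_index))
        assignment[f"{left}-{center}+{right}"] = model_index[key]
    return assignment


assignment = tie_states(PHONES)
print(triphone_count(46))             # 97336, as quoted for 46 phones
print(len(assignment))                # 125 triphones in the toy set
print(len(set(assignment.values())))  # 45 distinct models after tying
```

In this toy set, s-ih+t and z-ih+t receive the same model index because s and z answer both questions identically, while t-ih+t receives a different one.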