5.2.3 Word Models: Single-Path Silence Training
The first step in the reestimation process is single-path silence
training. The transcriptions of the training data do not mark
silence explicitly. Instead, the recognizer automatically inserts
silence into the transcriptions during training. For this step,
silence is inserted at the beginning and end of each utterance
transcription. The acoustic unit that models this silence contains
three states, as shown in the figure to the right.
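The effect of this insertion on a transcription can be pictured with a minimal Python sketch. This is an illustration only, not the recognizer's internal code: the function name `add_boundary_silence` and the symbol name `sil` are hypothetical, chosen for this example.

```python
# Hypothetical silence symbol; the recognizer's actual symbol name may differ.
SILENCE = "sil"


def add_boundary_silence(transcription, silence=SILENCE):
    """Return the word sequence with silence added at both utterance edges.

    No silence is placed between words; that corresponds to the short
    pauses (sp), which this training step does not handle.
    """
    words = transcription.split()
    return [silence] + words + [silence]


if __name__ == "__main__":
    # "ONE TWO" is the transcription of utterance ae_12a in the training list.
    print(add_boundary_silence("ONE TWO"))
```

For the transcription ONE TWO, the sketch yields a sequence with the silence symbol before ONE and after TWO, and nothing between the two words.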
To begin silence training, go to the directory:
$ISIP_TUTORIAL/sections/s05/s05_02_p03/
The only file in this directory is the parameter file,
params_sil.sof.
The parameter list shows that four passes of the Baum-Welch
algorithm will be applied to the training data. Four passes should be
sufficient for the models to converge at this step. Note also that the
transcription database does not contain transcriptions for short pauses
(sp) between words; short-pause training is discussed next. The other
parameters should look familiar.
Now, run the command:
isip_recognize -param params_sil.sof -list $ISIP_TUTORIAL/databases/lists/identifiers_train.sof -verbose brief
Expected Output:
Command: isip_recognize -parameter_file params_sil.sof -list $ISIP_TUTORIAL/databases/lists/identifiers_train.sof -verbose brief
Version: 1.23 (not released) 2003/05/21 23:10:45
loading audio database: $ISIP_TUTORIAL/databases/db/tidigits_audio_db.sof
*** no symbol graph database file was specified ***
loading transcription database: $ISIP_TUTORIAL/databases/db/tidigits_trans_word_db.sof
loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
loading language model: $ISIP_TUTORIAL/models/lm_word_digraph_init.sof
loading statistical model pool: $ISIP_TUTORIAL/models/smp_word_init.sof
*** no configuration file was specified ***
starting iteration: 0
processing file 1 (ae_12a): $ISIP_TUTORIAL/databases/sof_8k/train/ae_12a.sof
retrieving annotation graph for identifier: ae_12a, level: word
transcription: ONE TWO
average utterance probability: -82.316417631140695, number of frames: 110
processing file 2 (ae_1a): $ISIP_TUTORIAL/databases/sof_8k/train/ae_1a.sof
retrieving annotation graph for identifier: ae_1a, level: word
transcription: ONE
average utterance probability: -78.783352320168049, number of frames: 87
...
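When the output is long, the per-utterance lines can be condensed into a short summary to check that training ran over the expected data. The Python sketch below is not part of the ISIP toolkit. It assumes the output has been saved as text and that each utterance is reported in exactly the format shown above; the function name `summarize_log` is hypothetical.

```python
import re

# Matches lines such as:
#   average utterance probability: -82.316417631140695, number of frames: 110
# The pattern is an assumption based on the output shown in this section.
LINE_RE = re.compile(
    r"average utterance probability: (-?\d+(?:\.\d+)?), number of frames: (\d+)"
)


def summarize_log(log_text):
    """Collect the per-utterance scores and frame counts from a saved log."""
    scores = []
    frames = []
    for match in LINE_RE.finditer(log_text):
        scores.append(float(match.group(1)))
        frames.append(int(match.group(2)))
    if not scores:
        return None  # no utterance lines found
    return {
        "utterances": len(scores),
        "total_frames": sum(frames),
        # Plain mean over utterances; no weighting by frame count.
        "mean_probability": sum(scores) / len(scores),
    }


if __name__ == "__main__":
    # The two utterance lines shown in the expected output above.
    sample = (
        "average utterance probability: -82.316417631140695, number of frames: 110\n"
        "average utterance probability: -78.783352320168049, number of frames: 87\n"
    )
    print(summarize_log(sample))
```

Comparing the mean value between runs gives a rough indication of whether the models are still changing, although the recognizer's own reporting remains the authoritative measure.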
Now, we are ready to move on to multi-path silence training.