
5.2.3 Word Models: Single-Path Silence Training

The first step in the reestimation process is single-path silence training. The transcriptions of the training data do not mark silence explicitly. Instead, the recognizer inserts silence into the transcriptions automatically during training. For this step, silence is inserted at the beginning and end of each utterance transcription. The acoustic unit that models this silence has three states, as shown in the figure to the right.
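
The insertion described above can be illustrated with a minimal Python sketch. This is not the recognizer's own code: the symbol name SILENCE and the function add_utterance_silence are hypothetical, chosen only to show where the silence units end up relative to the words.

```python
from typing import List

# Hypothetical label: the tutorial text does not name the silence symbol.
SILENCE = "SILENCE"


def add_utterance_silence(words: List[str], silence: str = SILENCE) -> List[str]:
    """Return the word sequence with silence added at the start and end only.

    No silence is placed between words, matching the single-path step,
    where short pauses between words are not yet modeled.
    """
    return [silence, *words, silence]


# The transcription "ONE TWO" taken from the expected output of this step.
print(add_utterance_silence("ONE TWO".split()))
# prints ['SILENCE', 'ONE', 'TWO', 'SILENCE']
```

The original list is left unmodified, so the stored transcription stays free of silence markers and the insertion happens only at training time, as the text describes.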

To begin silence training, go to the directory:

$ISIP_TUTORIAL/sections/s05/s05_02_p03/

The only file in this directory is the parameter file, params_sil.sof. From the list of parameters, notice that four passes of the Baum-Welch algorithm will be applied to the training data. Four passes should be sufficient for the models to converge at this step. Also notice that the transcription database does not contain transcriptions for short pauses (sp) between words. Short-pause training will be discussed next. The other parameters should look familiar. Now, run the command:

isip_recognize -param params_sil.sof -list $ISIP_TUTORIAL/databases/lists/identifiers_train.sof -verbose brief

Expected Output:

Command: isip_recognize -parameter_file params_sil.sof -list $ISIP_TUTORIAL/databases/lists/identifiers_train.sof -verbose brief
Version: 1.23 (not released) 2003/05/21 23:10:45
  
  loading audio database: $ISIP_TUTORIAL/databases/db/tidigits_audio_db.sof
  
  *** no symbol graph database file was specified ***
  
  loading transcription database: $ISIP_TUTORIAL/databases/db/tidigits_trans_word_db.sof
  
  loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
  
  loading language model: $ISIP_TUTORIAL/models/lm_word_digraph_init.sof
  
  loading statistical model pool: $ISIP_TUTORIAL/models/smp_word_init.sof
  
  *** no configuration file was specified ***
  
  starting iteration: 0
  
  processing file 1 (ae_12a): $ISIP_TUTORIAL/databases/sof_8k/train/ae_12a.sof
  
  retrieving annotation graph for identifier: ae_12a, level: word
  
  transcription: ONE TWO 
  
  average utterance probability: -82.316417631140695, number of frames: 110
  
  processing file 2 (ae_1a): $ISIP_TUTORIAL/databases/sof_8k/train/ae_1a.sof
  
  retrieving annotation graph for identifier: ae_1a, level: word
  
  transcription: ONE 
  
  average utterance probability: -78.783352320168049, number of frames: 87
  
  ...
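
To check whether the passes are converging, it can help to summarize the scores in this output per iteration. The Python sketch below is a hypothetical helper, not part of the ISIP tools. It assumes that the log prints a "starting iteration: N" line before each pass and one "average utterance probability" line per utterance, as in the excerpt above. Under that assumption, a per-iteration mean that stops changing from one pass to the next suggests the models have settled.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Patterns based on the log lines shown in the expected output above.
ITER_RE = re.compile(r"starting iteration:\s*(\d+)")
PROB_RE = re.compile(
    r"average utterance probability:\s*(-?\d+(?:\.\d+)?),"
    r"\s*number of frames:\s*(\d+)"
)


def summarize_log(log_text: str) -> Dict[int, Tuple[float, int]]:
    """Map each iteration to (mean reported utterance score, total frames)."""
    scores: Dict[int, List[float]] = defaultdict(list)
    frames: Dict[int, int] = defaultdict(int)
    iteration = 0  # assumed default if no iteration line has been seen yet
    for line in log_text.splitlines():
        match = ITER_RE.search(line)
        if match:
            iteration = int(match.group(1))
            continue
        match = PROB_RE.search(line)
        if match:
            scores[iteration].append(float(match.group(1)))
            frames[iteration] += int(match.group(2))
    return {it: (sum(vals) / len(vals), frames[it]) for it, vals in scores.items()}


# The two utterances shown in the expected output of this step.
sample = """
  starting iteration: 0
  average utterance probability: -82.316417631140695, number of frames: 110
  average utterance probability: -78.783352320168049, number of frames: 87
"""

for it, (mean_score, n_frames) in sorted(summarize_log(sample).items()):
    print(f"iteration {it}: mean score {mean_score:.3f} over {n_frames} frames")
```

The helper averages the reported values as they appear and does not weight them by frame count, since the excerpt does not say how the recognizer normalizes each utterance score.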

Now, we are ready to move on to multi-path silence training.
