/ Acoustic / Fundamentals / Production / Tutorials / Software / Home

5.2.4 Word Models: Multi-Path Silence Training

In the previous step, silence was automatically added at the beggining and end of each utterance transcription and the models were trained. Now, we want to train the models further, but account for the silence between each word in the utterance. We call this process multi-path silence training. This word-internal silence is sometimes (but not always) shorter than the silence at the beginning and ending of an utterance, so we want to use an acoustic unit that better models these short pauses. The figure to the right illustrates the accoustic model used for word-internal silence. States 1, 2, and 3 model long durations of silence, while state 4 models short durations. Once again, the recognizer automatically inserts the silence between each word in an utterance transcriptions. This saves us from having include silence symbols in out transcription database and also saves us the trouble of using different transcriptions for the single and multi-path silence training steps.

The parameter file for this step is slightly different than the one we used for single-path silence training. Go to the directory:

$ISIP_TUTORIAL/sections/s05/s05_02_p04/

and look at the parameter file, params_sp.sof. You'll notice that most of the parameters are the same, but that a configuration parameter has been added. This parameter points to a configuration file in the same directory called config.sof. This configuration file contains yet another parameter called search_non_speech_internal_symbols that is assigned the SILENCE symbol. This parameter tells the recognizer to add the SILENCE symbol to the language model as a non-speech internal symbol, or more specifically, a symbol that the recognizer will assume exists between two words in an utterance.

From this directory, run the command:

isip_recognize -param params_sp.sof -list $ISIP_TUTORIA./databases/lists/identifiers_train.sof -verbose brief

Expected Output:

  loading audio database: $ISIP_TUTORIA./databases/db/tidigits_audio_db.sof

  *** no symbol graph database file was specified ***

  loading transcription database: $ISIP_TUTORIA./databases/db/tidigits_trans_wo\
rd_db.sof

  loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof

  loading language model: $ISIP_TUTORIAL/models/word_models/lm_word_jsgf_init.s\
of

  loading statistical model pool: $ISIP_TUTORIAL/models/word_models/smp_word_in\
it.sof

  *** no configuration file was specified ***

  starting iteration: 0

  processing file 1 (ae_12a): $ISIP_TUTORIA./databases/sof_8k/train/ae_12a.sof

  retrieving annotation graph for identifier: ae_12a, level: word

  transcription: ONE TWO

  average utterance probability: -82.462353512546912, number of frames: 110

  processing file 2 (ae_1a): $ISIP_TUTORIA./databases/sof_8k/train/ae_1a.sof

  retrieving annotation graph for identifier: ae_1a, level: word

  transcription: ONE

  ...

This step completes the silence training process. Next, we'll explore a training technique that can be used to improve our models called mixture splitting.

Glossary / Help / Support / Site Map / Contact Us / ISIP Home