4.2.5 Network Decoding: Cross-Word Context-Dependent Phones:

Another type of phone is the cross-word context-dependent phone. The word-internal context-dependent phones in the last section model the context of a phone within its word, but they ignore any context across word boundaries. As a result, the first and last phones of each word in the last section were di-phones, since no context preceded or followed them within the word. In this section, we look at cross-word context-dependent phones, which take their context from across word boundaries within a sentence.
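To make the difference concrete, here is a minimal Python sketch that expands a word sequence into context-dependent phone labels both ways. The phone symbols, the HTK-style `left-center+right` label notation, and the use of `sil` as the context at the utterance edges are assumptions for illustration only; they are not the naming scheme used in the ISIP model files.

```python
from typing import List

# Labels use an assumed HTK-style "left-center+right" notation.
SIL = "sil"  # assumed silence label used as context at the utterance edges


def word_internal(words: List[List[str]]) -> List[str]:
    """Context never crosses a word boundary, so edge phones become di-phones."""
    out = []
    for phones in words:
        for i, p in enumerate(phones):
            left = phones[i - 1] + "-" if i > 0 else ""
            right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
            out.append(f"{left}{p}{right}")
    return out


def cross_word(words: List[List[str]]) -> List[str]:
    """Context is taken from neighboring words, so every phone is a triphone."""
    flat = [p for phones in words for p in phones]
    padded = [SIL] + flat + [SIL]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]
```

For the hypothetical phone strings `w ah n` and `t uw`, word-internal expansion yields `w+ah w-ah+n ah-n t+uw t-uw`, while cross-word expansion yields `sil-w+ah w-ah+n ah-n+t n-t+uw t-uw+sil`. Only the phones at the word boundaries change.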

The experiment below decodes a list of test utterances using cross-word context-dependent phones. It might take several minutes to load the acoustic models, so please be patient. First, go to the experiment directory:

cd $ISIP_TUTORIAL/sections/s04/s04_02_p05/

and run the following command:

isip_recognize -parameter_file params_decode.sof -list $ISIP_TUTORIAL/databases/lists/identifiers_test.sof -verbose ALL

This will produce the following output:

Command: isip_recognize -parameter_file params_decode.sof -list /ftp/pub/projects/speech/software/tutorials/production/fundamentals/current/examples/databases/lists/identifiers_test.sof -verbose ALL
Version: 1.23 (not released) 2003/05/21 23:10:45
  
  loading audio database: $ISIP_TUTORIAL/databases/db/tidigits_audio_db_test.sof
  
  *** no symbol graph database file was specified ***
  
  *** no transcription database file was specified ***
  
  loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
  
  loading language model: $ISIP_TUTORIAL/models/xword_phone_models/compare/lm_xword_jsgf_8mix.sof
  
  loading statistical model pool: $ISIP_TUTORIAL/models/xword_phone_models/compare/smp_xword_8mix.sof
  
  *** no configuration file was specified ***
  
  opening the output file: $ISIP_TUTORIAL/sections/s04/s04_02_p05/results.out
  
  processing file 1 (ah_111a): $ISIP_TUTORIAL/databases/sof_8k/test/ah_111a.sof
    
    hyp:    ONE ONE ONE 
    score:  -9122.6484375   frames: 138
  
  processing file 2 (ah_1a): $ISIP_TUTORIAL/databases/sof_8k/test/ah_1a.sof
    
    hyp:    ONE 
    score:  -5187.28173828125   frames: 79

    .....
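When many utterances are decoded, it can help to collect the hypothesis, score, and frame count of each one from the console output. The sketch below assumes the console output has been captured to text and that it has exactly the layout of the excerpt above; other verbosity levels, tool versions, or the results.out file may be formatted differently. The per-frame score is our own normalization for comparing utterances of different lengths, not a value the decoder reports.

```python
import re
from typing import Dict, List

# Patterns written against the verbose excerpt above (an assumption).
FILE_RE = re.compile(r"processing file (\d+) \(([^)]+)\)")
HYP_RE = re.compile(r"hyp:\s*(.*?)\s*$")
SCORE_RE = re.compile(r"score:\s*(-?[\d.]+)\s+frames:\s*(\d+)")


def parse_log(text: str) -> List[Dict]:
    """Collect one record per 'processing file' block in a verbose log."""
    results: List[Dict] = []
    for line in text.splitlines():
        m = FILE_RE.search(line)
        if m:
            results.append({"index": int(m.group(1)), "id": m.group(2)})
            continue
        if not results:
            continue  # skip header lines that precede the first utterance
        m = HYP_RE.search(line)
        if m:
            results[-1]["hyp"] = m.group(1).split()
            continue
        m = SCORE_RE.search(line)
        if m:
            score, frames = float(m.group(1)), int(m.group(2))
            results[-1].update(
                score=score,
                frames=frames,
                # Our own length normalization, not a decoder output.
                score_per_frame=score / frames if frames else None,
            )
    return results
```

Applied to the two utterances shown above, this returns one record for `ah_111a` (hypothesis `ONE ONE ONE`, 138 frames) and one for `ah_1a` (hypothesis `ONE`, 79 frames).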

As you probably noticed, this experiment took much longer than some of the previous experiments. Most of the extra time was spent loading the acoustic models: the acoustic model file for this experiment is quite large because cross-word context greatly increases the number of distinct phone models. In practical experiments, a large list of utterances is decoded in a single run. Every utterance uses the same acoustic models, so the models are loaded only once rather than before each utterance.
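The benefit of loading the models once can be estimated with simple arithmetic: the average cost per utterance is the load time spread over the batch plus the decoding time of one utterance. The timings in this sketch are hypothetical placeholders, not measurements of isip_recognize.

```python
def per_utterance_cost(load_s: float, decode_s: float, n_utts: int) -> float:
    """Average wall-clock seconds per utterance when models are loaded once."""
    if n_utts <= 0:
        raise ValueError("n_utts must be positive")
    return (load_s + decode_s * n_utts) / n_utts


if __name__ == "__main__":
    # Hypothetical timings: 3 minutes to load, 2 seconds to decode each utterance.
    load_s, decode_s = 180.0, 2.0
    for n in (1, 10, 1000):
        print(n, round(per_utterance_cost(load_s, decode_s, n), 2))
```

With these made-up numbers, the average drops from 182 seconds for a single utterance to 20 seconds for 10 utterances and about 2.2 seconds for 1000, approaching the pure decoding time as the batch grows.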

Cross-word context-dependent phones differ from word-internal phones because they consider the phones of surrounding words as well as the surrounding phones within a word. The image to the right illustrates a cross-word context-dependent language model. This language model has only one word, zero, with two pronunciations. As you can see at the phone level, all of the phones are triphones. Since there is just one word in this model, the only possibilities are a zero followed by another zero, or a zero followed by silence. The first and last triphones of each pronunciation account for these possibilities. For a language model with many words, the phone level becomes extremely complex, because every word-initial and word-final phone must be expanded into a separate triphone for each phone that can precede or follow it in a neighboring word.
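The growth in complexity can be sketched by counting the word-boundary triphones a network needs. This Python sketch assumes a fully connected grammar (any word may follow any other, and silence may precede or follow any word), an assumed `sil` silence label, and hypothetical pronunciations; the phones inside each word are the same as in the word-internal case and are not counted.

```python
from itertools import product
from typing import Dict, List, Set

SIL = "sil"  # assumed label for the silence model


def boundary_triphones(lexicon: Dict[str, List[List[str]]]) -> Set[str]:
    """Word-initial and word-final triphones when any word may follow any other."""
    prons = [p for variants in lexicon.values() for p in variants]
    lefts = {p[-1] for p in prons} | {SIL}   # phones that can end the preceding word
    rights = {p[0] for p in prons} | {SIL}   # phones that can begin the following word
    tris: Set[str] = set()
    for p in prons:
        if len(p) == 1:
            # A one-phone word takes both of its contexts from neighboring words.
            for left, right in product(lefts, rights):
                tris.add(f"{left}-{p[0]}+{right}")
            continue
        for left in lefts:
            tris.add(f"{left}-{p[0]}+{p[1]}")
        for right in rights:
            tris.add(f"{p[-2]}-{p[-1]}+{right}")
    return tris


# Hypothetical pronunciations, for illustration only.
ZERO_ONLY = {"zero": [["z", "iy", "r", "ow"], ["z", "ih", "r", "ow"]]}
THREE_WORDS = {**ZERO_ONLY, "one": [["w", "ah", "n"]], "two": [["t", "uw"]]}

if __name__ == "__main__":
    print(len(boundary_triphones(ZERO_ONLY)))
    print(len(boundary_triphones(THREE_WORDS)))
```

With these made-up pronunciations, the one-word model needs 6 boundary triphones (such as `sil-z+iy` and `r-ow+z`), while adding just two more words raises the count to 28, because each new word adds both new pronunciations and new neighboring phones.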
