4.2.5 Network Decoding:
Cross-Word Context-Dependent Phones:
Another type of phone is the cross-word context-dependent phone. The
word-internal context-dependent phones in the last section consider
the context surrounding a phone within a word, but they do not
consider the context across word boundaries. In the last section, the
first and last phones of a word were diphones, since no context
preceded the first phone or followed the last. In this section, we
will look at cross-word context-dependent phones, which also examine
the context across word boundaries within a sentence.
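The difference between the two phone types can be sketched in a few
lines of Python. This is only an illustration, not part of the ISIP
toolkit: the left-center+right label format, the function names, the
"sil" boundary symbol, and the pronunciation "w ah n" for ONE are all
assumptions made for the example.

```python
# Illustrative sketch only; none of these names come from the ISIP toolkit.

def label(left, center, right):
    """Build a context label such as 'w-ah+n' (assumed notation)."""
    text = center
    if left is not None:
        text = f"{left}-{text}"
    if right is not None:
        text = f"{text}+{right}"
    return text


def word_internal(words, lexicon):
    """Expand each word on its own; context never crosses a word boundary."""
    units = []
    for word in words:
        phones = lexicon[word]
        for i, phone in enumerate(phones):
            left = phones[i - 1] if i > 0 else None
            right = phones[i + 1] if i < len(phones) - 1 else None
            units.append(label(left, phone, right))
    return units


def cross_word(words, lexicon, boundary="sil"):
    """Expand the whole sentence; context is taken across word boundaries."""
    phones = [phone for word in words for phone in lexicon[word]]
    padded = [boundary] + phones + [boundary]
    return [
        label(padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Hypothetical pronunciation, assumed for this example.
LEXICON = {"ONE": ["w", "ah", "n"]}

print(word_internal(["ONE", "ONE"], LEXICON))
print(cross_word(["ONE", "ONE"], LEXICON))
```

In the word-internal expansion the first and last phone of each word
are diphones (w+ah and ah-n), while in the cross-word expansion every
phone is a triphone whose context may come from the neighboring word
or from silence.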
The experiment below decodes a single utterance using cross-word
context-dependent phones. It might take several minutes to load the
acoustic models, so please be patient. Go to the directory
$ISIP_TUTORIAL/sections/s04/s04_02_p05/
cd $ISIP_TUTORIAL/sections/s04/s04_02_p05/
and run the following command:
isip_recognize -parameter_file params_decode.sof -list $ISIP_TUTORIAL/sections/s04/s04_02_p05 -verbose ALL
This will produce the following output:
Command: isip_recognize -parameter_file params_decode.sof -list /ftp/pu./projects/speech/software/tutorials/production/ fundamentals/current/example./databases/lists/identifiers_test.sof -verbose ALL
Version: 1.23 (not released) 2003/05/21 23:10:45
loading audio database: $ISIP_TUTORIA./databases/db/tidigits_audio_db_test.sof
*** no symbol graph database file was specified ***
*** no transcription database file was specified ***
loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
loading language model: $ISIP_TUTORIAL/models/xword_phone_models/compare/lm_xword_jsgf_8mix.sof
loading statistical model pool: $ISIP_TUTORIAL/models/xword_phone_models/compare/smp_xword_8mix.sof
*** no configuration file was specified ***
opening the output file: $ISIP_TUTORIAL/sections/s04/s04_02_p05/results.out
processing file 1 (ah_111a): $ISIP_TUTORIA./databases/sof_8k/test/ah_111a.sof
hyp: ONE ONE ONE
score: -9122.6484375 frames: 138
processing file 2 (ah_1a): $ISIP_TUTORIA./databases/sof_8k/test/ah_1a.sof
hyp: ONE
score: -5187.28173828125 frames: 79
.....
As you probably noticed, this experiment took much longer than some of
the previous experiments. Most of that extra time was spent loading the
acoustic models. The acoustic model file for this experiment is quite
large because the number of phone models has increased. In practical
experiments, a large list of utterances is decoded. Every utterance
uses the same acoustic models, so the models do not have to be
reloaded before each utterance.
Cross-word context-dependent phones differ from word-internal phones
because they consider the surrounding words as well as the surrounding
phones. The image to the right illustrates a cross-word
context-dependent language model. This language model has only one
word, zero, with two pronunciations. As you can see at the phone
level, all of the phones are triphones. Since there is just one word
in this model, the only possible sequences are a zero followed by
another zero, or a zero followed by silence. The first and last
triphones of the word account for these possibilities. For a language
model consisting of many words, the phone level becomes extremely
complex.
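To get a feel for how quickly the phone level grows, the sketch below
enumerates the triphones needed when any word may be followed by any
other word or by silence. It is a rough illustration under stated
assumptions, not the toolkit's own procedure: the function name, the
"sil" symbol, the label format, the use of silence as a possible left
context, and the pronunciations for ZERO and ONE are all hypothetical.

```python
from itertools import product


def boundary_triphones(lexicon, silence="sil"):
    """Collect every triphone that can occur when any word may be
    preceded or followed by any word in the lexicon or by silence."""
    prons = [pron for variants in lexicon.values() for pron in variants]

    # Contexts that can appear across a word boundary.
    left_contexts = sorted({pron[-1] for pron in prons} | {silence})
    right_contexts = sorted({pron[0] for pron in prons} | {silence})

    triphones = set()
    for phones in prons:
        last = len(phones) - 1
        for i, center in enumerate(phones):
            # Inside a word the context is fixed; at a word edge it
            # depends on which word (or silence) is adjacent.
            lefts = [phones[i - 1]] if i > 0 else left_contexts
            rights = [phones[i + 1]] if i < last else right_contexts
            for left, right in product(lefts, rights):
                triphones.add(f"{left}-{center}+{right}")
    return triphones


# Hypothetical pronunciations, assumed for this example.
ZERO_ONLY = {"ZERO": [["z", "iy", "r", "ow"], ["z", "ih", "r", "ow"]]}
ZERO_AND_ONE = dict(ZERO_ONLY, ONE=[["w", "ah", "n"]])

print(len(boundary_triphones(ZERO_ONLY)))
print(len(boundary_triphones(ZERO_AND_ONE)))
```

With these assumed pronunciations, the single-word model needs 10
distinct triphones, and adding just one more word doubles that to 20,
because every word-final and word-initial phone gains a variant for
each new boundary context.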