4.2.4 Network Decoding: Recognition Using Word-Internal Context-Dependent Phones
Section 4.2.4: Recognition Using Context-Dependent Phone Models

The previous section introduced phone-based speech recognition and discussed context-independent phones. This section introduces context-dependent phones and explains how they differ from context-independent phone models.

The experiment below decodes a single utterance using context-dependent phones. Go to the directory $ISIP_TUTORIAL/sections/s04/s04_02_p04/.

    cd $ISIP_TUTORIAL/sections/s04/s04_02_p04/
and run the following command:

    isip_recognize -parameter_file params_decode.sof -list $ISIP_TUTORIA./databases/lists/identifiers_test.sof -verbose ALL
This will produce the following output:

    Command: isip_recognize -parameter_file params_decode.sof -list /ftp/pu./projects/speech/software/tutorials/production/fundamentals/current/example./databases/lists/identifiers_test.sof -verbose ALL
    Version: 1.23 (not released) 2003/05/21 23:10:45

    loading audio database: $ISIP_TUTORIA./databases/db/tidigits_audio_db_test.sof
    *** no symbol graph database file was specified ***
    *** no transcription database file was specified ***
    loading front-end: $ISIP_TUTORIAL/recipes/frontend.sof
    loading language model: $ISIP_TUTORIAL/models/winternal_phone_models/compare/lm_winternal_jsgf_8mix.sof
    loading statistical model pool: $ISIP_TUTORIAL/models/winternal_phone_models/compare/smp_winternal_8mix.sof
    *** no configuration file was specified ***
    opening the output file: $ISIP_TUTORIAL/sections/s04/s04_02_p04/results.out

    processing file 1 (ah_111a): $ISIP_TUTORIA./databases/sof_8k/test/ah_111a.sof
    hyp: ONE ONE ONE
    score: -9122.6484375 frames: 138

    processing file 2 (ah_1a): $ISIP_TUTORIA./databases/sof_8k/test/ah_1a.sof
    hyp: ONE
    score: -5187.28173828125 frames: 79
    .....
Notice that the context_mode parameter in the parameter file is set to SYMBOL_INTERNAL, which tells the recognizer that the models being used are word-internal models. The remaining parameters are the same as in the previous recognition experiments.
Unlike a context-independent phone, a context-dependent phone takes its surrounding context into account. In other words, the system models how each phone is affected by its neighboring phones. Consider the phone 'iy'. In a context-dependent recognition system, this phone takes the form: <left context - iy - right context>. This form is called a triphone because it has three parts. The phone 'iy' might have any other phone as its left or right context. Consequently, the number of possible phones grows tremendously. In English, there are about 46 context-independent phones. For a context-dependent system, the number of possible triphones becomes 46 x 45 x 45 (93,150). Fortunately, this is only an upper bound. In practice, training a recognition system eliminates most of these triphones.
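
The arithmetic above can be checked with a short Python sketch. It is only an illustration, not part of the ISIP toolkit: the function names and the four-phone toy inventory are hypothetical, and the count follows the text's assumption of 45 possible phones for each context.

```python
from itertools import product

NUM_PHONES = 46  # approximate number of English phones quoted in the text


def triphone_upper_bound(num_phones):
    """Upper bound used in the text: any center phone, and
    (num_phones - 1) choices for each of the two contexts."""
    return num_phones * (num_phones - 1) * (num_phones - 1)


def triphones_for(center, phones):
    """List every (left, center, right) triple for one center phone,
    drawing both contexts from the other phones in the inventory."""
    others = [p for p in phones if p != center]
    return [(left, center, right) for left, right in product(others, repeat=2)]


# Hypothetical toy inventory, used only to keep the enumeration small.
toy_phones = ["iy", "z", "r", "ow"]

print(triphone_upper_bound(NUM_PHONES))      # 93150
print(len(triphones_for("iy", toy_phones)))  # 9 (3 left x 3 right contexts)
```

Even with four phones, a single center phone already has nine triphone variants, which is why the inventory grows so quickly at 46 phones.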

The increased number of phones makes the phone level of the language model more complex. The picture to the right illustrates the language model for a context-dependent recognition system. This language model contains only one word, zero, with two pronunciations. The first and last context-dependent phones of each pronunciation consist of only two parts, because a word-internal context-dependent system does not cross word boundaries when determining contexts. This type of phone is called a diphone. In the next section, we will see a type of recognition in which the context before and after the word is also considered.
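
The word-internal expansion described above can be sketched in Python as follows. This is a minimal illustration rather than the recognizer's actual implementation: the function name, the `left-center+right` naming convention, and the pronunciation `z iy r ow` for "zero" are all assumptions made for this example.

```python
def word_internal_units(phones):
    """Expand one pronunciation into word-internal context-dependent units.

    Contexts never cross the word boundary, so the first and last phones
    receive only one context (diphones) and the interior phones receive
    two (triphones). Names use an assumed 'left-center+right' convention.
    """
    units = []
    last = len(phones) - 1
    for i, center in enumerate(phones):
        name = center
        if i > 0:
            # A left context exists only inside the word.
            name = f"{phones[i - 1]}-{name}"
        if i < last:
            # A right context exists only inside the word.
            name = f"{name}+{phones[i + 1]}"
        units.append(name)
    return units


# Hypothetical pronunciation of "zero", chosen only for illustration.
print(word_internal_units(["z", "iy", "r", "ow"]))
# ['z+iy', 'z-iy+r', 'iy-r+ow', 'r-ow']
```

The first and last units are diphones and the two interior units are triphones, matching the structure described for the word-internal language model.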


   