Unlike context-independent phones, each context-dependent phone takes
into account its surrounding context. In other words, the system
determines how each phone is affected by surrounding phones. Consider
the phone 'iy'. In a context-dependent recognition system, this phone
will be in the form: <left context - iy - right context>.
This form is called a triphone since there are three parts. The phone
'iy' might have any other phone for its left and right context.
Consequently, the number of possible phones grows tremendously.
In English, there are about 46 context-independent phones. For a
context-dependent system, the number of possible triphones becomes
46 x 45 x 45 = 93,150. Fortunately, this is an upper bound. In
practice, the training of a recognition system will eliminate most
of these triphones.
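
The upper bound above can be checked with a few lines of Python. This is
only a sketch of the arithmetic: it assumes the figure of 46 phones quoted
in the text and that each side context is drawn from the 45 other phones.
The variable names are illustrative.

```python
# Assumption: 46 context-independent phones, the figure quoted in the text.
NUM_PHONES = 46

# Each center phone may take any *other* phone as its left or right context.
contexts_per_side = NUM_PHONES - 1

# center choices x left-context choices x right-context choices
upper_bound = NUM_PHONES * contexts_per_side * contexts_per_side
print(upper_bound)  # 93150
```
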
The increased number of phones causes the phone level of the language
model to become more complex. The picture to the right illustrates
the language model for a context-dependent recognition system. This
language model contains only one word, zero, and two pronunciations.
The first and last context-dependent phones consist of only two
parts because, in a word-internal context-dependent system, the
contexts do not cross word boundaries.
This type of phone is called a diphone. In the next
section, we will see a type of recognition where the context before
and after the word is considered.
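
A minimal sketch of word-internal expansion may make the diphone case
concrete. Everything here is an assumption for illustration: the function
name word_internal_expand is hypothetical, the two pronunciations of
"zero" are plausible examples rather than the ones in the figure, and the
labels use the common left-center+right notation instead of the
<left context - phone - right context> form shown earlier.

```python
def word_internal_expand(phones):
    """Expand one word's phone sequence into word-internal
    context-dependent labels written as left-center+right.

    Contexts never cross the word boundary, so the first phone has no
    left context and the last phone has no right context (diphones).
    """
    units = []
    for i, center in enumerate(phones):
        label = center
        if i > 0:
            # A left neighbor exists inside the word.
            label = phones[i - 1] + "-" + label
        if i < len(phones) - 1:
            # A right neighbor exists inside the word.
            label = label + "+" + phones[i + 1]
        units.append(label)
    return units


# Hypothetical pronunciations of "zero", used only for illustration.
pronunciations = [["z", "iy", "r", "ow"], ["z", "ih", "r", "ow"]]
for phones in pronunciations:
    print(word_internal_expand(phones))
```

For the first pronunciation this yields z+iy, z-iy+r, iy-r+ow, r-ow: the
two interior phones are triphones, while the first and last are diphones.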